Voyage records in the Trans-Atlantic Database include the ports where enslaved people embarked and disembarked, but not the day-to-day geographic coordinates of each voyage. For voyages with no matching record in the CLIWOC database, we used differential equations and a recurrent neural network to predict voyage paths.
Our motivation for modeling with differential equations was the missing day-to-day geographical data in the Trans-Atlantic Slave Trade Database. With differential equations, we expected to obtain a function that takes the initial longitude and latitude as input and returns the full voyage path. Estimating wind speed and ship speed from the CLIWOC database and using the linear speed formula, we established a differential equation with Vw as wind speed and Vs as ship speed (including actual ship speed, current resistance, and air resistance).
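The idea of producing a whole path from one starting position can be illustrated with a forward Euler integration. The sketch below is not the equation used in this project: the `velocity` callback, the kilometre-per-day units, and the spherical-Earth conversion to degrees are all assumptions made for illustration.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed spherical Earth


def euler_path(lon0, lat0, velocity, days, dt=1.0):
    """Integrate a position forward in time with forward Euler steps.

    `velocity(lon, lat, t)` is a hypothetical callback returning the
    (eastward, northward) speed in km/day at the given position and time.
    """
    lon, lat = lon0, lat0
    path = [(lon, lat)]
    for k in range(int(round(days / dt))):
        v_east, v_north = velocity(lon, lat, k * dt)
        # Convert the distance travelled in this step to degrees.
        dlat = math.degrees(v_north * dt / EARTH_RADIUS_KM)
        dlon = math.degrees(
            v_east * dt / (EARTH_RADIUS_KM * math.cos(math.radians(lat)))
        )
        lon, lat = lon + dlon, lat + dlat
        path.append((lon, lat))
    return path


# Usage: a constant northward drift of 100 km/day for 3 days.
path = euler_path(0.0, 0.0, lambda lon, lat, t: (0.0, 100.0), days=3)
```

Because every step builds on the previous one, any error in the estimated speeds accumulates along the path.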
Because of inaccurate inputs and the complexities of each voyage, this differential equation failed to predict the voyage paths accurately. With unexpected weather and many other interfering factors, a voyage path cannot be predicted accurately from a single pair of inputs. To make the predictions more accurate, we needed to analyze the day-to-day geographical data of voyages with complete geographical records and apply the patterns in these data to voyages with missing data. This led us to another prediction method, the recurrent neural network.
Recurrent neural network (RNN)
Each path is time-series data, since the time interval influences the next position of the ship. Long short-term memory (LSTM) is an RNN architecture well suited to processing time-series data. We used it to learn from the cleaned voyage paths obtained from the CLIWOC database and to make predictions for voyages in the Trans-Atlantic Database.
For each path of length n, the input at each step has 5 dimensions: the current longitude and latitude, Δt (the number of days elapsed), and the longitude and latitude of the endpoint. Figure 1 shows our network architecture. The initial input consists of the start point, Δt, and the endpoint; the predicted longitude and latitude then become part of the next input.
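The input layout and the feedback of predictions described above can be sketched as follows. This is a minimal illustration, not the trained network: `toy_step` is a hypothetical stand-in for the LSTM (it keeps no hidden state and simply moves halfway toward the endpoint), and the function names are assumptions.

```python
def make_input(lon, lat, dt, end_lon, end_lat):
    """Build the 5-dimensional input: current position, dt, endpoint."""
    return [lon, lat, dt, end_lon, end_lat]


def rollout(step_fn, start, end, n_steps, dt=1.0):
    """Feed each predicted position back in as part of the next input."""
    lon, lat = start
    path = [start]
    for _ in range(n_steps):
        x = make_input(lon, lat, dt, end[0], end[1])
        lon, lat = step_fn(x)  # the prediction becomes the next position
        path.append((lon, lat))
    return path


def toy_step(x):
    """Hypothetical stand-in for the model: move halfway to the endpoint."""
    lon, lat, _dt, end_lon, end_lat = x
    return (lon + 0.5 * (end_lon - lon), lat + 0.5 * (end_lat - lat))


path = rollout(toy_step, start=(0.0, 0.0), end=(8.0, 4.0), n_steps=3)
```

A real LSTM would also carry its hidden state from one step to the next; only the flow of inputs and outputs is shown here.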
We introduce another parameter, which controls the proportion of the existing data we want to use as input. Its value is between 0 and 1. We include it because, if every input after the first one uses only the predicted longitude and latitude, the cumulative error becomes high. When the parameter is large, we have a high probability of using the recorded longitude and latitude as input, which corrects the error made before this point. When it is small, the predicted longitude and latitude are more likely to be used as input. On the training set, we gradually decrease the value of the parameter so that more predicted values are used. On the test set, we simply set it to 0. Other algorithms such as ODE-Net can also help to deal with this problem.
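The choice between recorded and predicted positions described above can be sketched as a per-step random draw. This is a simplified illustration under stated assumptions: the name `ratio` for the parameter is hypothetical, and the predicted path is taken as given, whereas in actual training each prediction would depend on the inputs chosen before it.

```python
import random


def mix_inputs(true_path, predicted_path, ratio, rng):
    """For each step, pick the recorded or the predicted position.

    `ratio` (hypothetical name) is the probability of using the
    recorded position; 0 means predictions only.
    """
    mixed = []
    for true_pos, pred_pos in zip(true_path, predicted_path):
        use_true = rng.random() < ratio  # rng.random() lies in [0, 1)
        mixed.append(true_pos if use_true else pred_pos)
    return mixed


rng = random.Random(0)  # seeded for reproducibility
recorded = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0)]
predicted = [(0.1, 0.1), (1.2, 0.9), (2.3, 1.8)]
test_time_inputs = mix_inputs(recorded, predicted, ratio=0.0, rng=rng)
```

With `ratio=0.0` the draw never selects a recorded position, which matches the test-time setting in which only predictions are fed back.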
Training and Testing
We collected 855 voyage paths from the CLIWOC database and used 80% of them as the training set and the remaining 20% as the test set. We decayed the parameter that controls the proportion of existing data used as input from 1 to 0.1, and the test error was about 9.04.
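The split and the decay can be sketched as below. The shuffle seed, the number of epochs, and the linear shape of the decay are assumptions for illustration; the text specifies only the 80/20 proportions and the range from 1 to 0.1.

```python
import random


def split_paths(paths, train_fraction=0.8, seed=0):
    """Shuffle the paths and split them into training and test sets."""
    shuffled = list(paths)
    random.Random(seed).shuffle(shuffled)
    n_train = round(train_fraction * len(shuffled))
    return shuffled[:n_train], shuffled[n_train:]


def linear_decay(start, end, n_epochs):
    """Assumed linear schedule from `start` to `end` over `n_epochs`."""
    if n_epochs == 1:
        return [start]
    step = (end - start) / (n_epochs - 1)
    return [start + step * epoch for epoch in range(n_epochs)]


# Usage with placeholder path IDs standing in for the 855 voyage paths.
train, test = split_paths(range(855))
schedule = linear_decay(1.0, 0.1, n_epochs=10)
```

With 855 paths, this split yields 684 training paths and 171 test paths.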
We used the trained model to predict 2,729 trans-Atlantic voyage paths; this subset has specific geographic coordinates for the start and end points as well as the number of days spent in the Middle Passage. For simplicity, we set Δt to 1. Figure 2(a) shows 2,164 predicted voyage paths whose end locations are in the northern hemisphere. Figure 2(b) visualizes 36 predicted paths, all of which form a reasonably smooth line between the start and end points.