Now that we have reduced the dimensionality of our problem and selected the most important features, we can build a model that predicts the transfer fee of a given player.

For our model, we will use a neural network. A neural network is a machine learning model that attempts to mimic, at a very basic level, the human brain. It consists of layers of neurons, and through training, the model learns the weights between the neurons in consecutive layers. After training is complete, we can feed the network new samples, and it will use the weights it learned during training to predict the output for each input (Path Mind).
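To make the idea of layers and weights concrete, here is a minimal sketch of a forward pass through a tiny network in plain Python. The layer sizes, the weight values, and the names `relu`, `dense`, and `forward` are made up for illustration; a trained network would learn its weights from data rather than have them written by hand.

```python
def relu(values):
    """Rectified linear unit: negative activations become zero."""
    return [max(0.0, v) for v in values]


def dense(inputs, weights, biases):
    """One layer: each neuron is a weighted sum of the inputs plus a bias."""
    return [
        sum(w * x for w, x in zip(neuron_weights, inputs)) + b
        for neuron_weights, b in zip(weights, biases)
    ]


def forward(inputs, layers):
    """Pass the inputs through every layer, applying ReLU on all but the last."""
    activations = inputs
    for i, (weights, biases) in enumerate(layers):
        activations = dense(activations, weights, biases)
        if i < len(layers) - 1:
            activations = relu(activations)
    return activations


# Hypothetical weights: 2 inputs -> 2 hidden neurons -> 1 output.
layers = [
    ([[0.5, -1.0], [1.0, 1.0]], [0.0, 0.0]),  # hidden layer
    ([[2.0, 1.0]], [0.5]),                    # output layer
]

output = forward([1.0, 2.0], layers)
print(output)
```

Training amounts to adjusting the numbers inside `layers` until the outputs match the known transfer fees as closely as possible.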

The first step in creating the model is to split the data into training data and testing data. The training data allows the model to learn the correct parameters so that it can be used on future data (Springboard). The testing data is not included in the training data, so it essentially acts as new data for our model. Since we know the true output for the test data, we can use it to see how accurate our model is.

Next, we want to normalize our data. We are using a general-purpose machine learning algorithm, and the features we feed it may have wildly different scales or units. Thus, to improve accuracy and reduce training time, we rescale every feature so that it has zero mean and unit variance.
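As a rough illustration of what this rescaling does, the sketch below standardizes a single feature by hand. The feature values and the helper names `fit_scaler` and `transform` are hypothetical; scikit-learn's `StandardScaler` does the same thing for every column at once. The statistics are learned from the training values only and then reused for the test values.

```python
from statistics import mean, pstdev


def fit_scaler(column):
    """Learn the mean and population standard deviation of one feature."""
    return mean(column), pstdev(column)


def transform(column, mu, sigma):
    """Rescale values using previously learned statistics."""
    return [(x - mu) / sigma for x in column]


# Hypothetical feature values (e.g. minutes played), not taken from our data.
train_minutes = [900.0, 1800.0, 2700.0]
test_minutes = [1800.0, 3600.0]

# The statistics come from the training data only...
mu, sigma = fit_scaler(train_minutes)

# ...and are then applied unchanged to both sets.
train_scaled = transform(train_minutes, mu, sigma)
test_scaled = transform(test_minutes, mu, sigma)
print(train_scaled, test_scaled)
```

After the transformation, the training values have a mean of zero and a standard deviation of one, so no single feature dominates simply because of its units.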

Finally, we train the model by fitting it to our training data set. After this completes, we can check the score of our model (the coefficient of determination, R², for a regressor) and use the model to predict the transfer fees of our test set. The following code performs all of these outlined steps (Springboard):

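from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.neural_network import MLPRegressor

# Hold out part of the data for testing
X_train, X_test, y_train, y_test = train_test_split(X, y)

# Fit the scaler on the training data only, then apply it to both sets
scaler = StandardScaler()
scaler.fit(X_train)
X_train = scaler.transform(X_train)
X_test = scaler.transform(X_test)

# Three hidden layers of 25 neurons each
mlp = MLPRegressor(hidden_layer_sizes=(25,25,25), max_iter=5000).fit(X_train, y_train)
# R^2 score on the training data
print(mlp.score(X_train, y_train))

# Predict transfer fees for the held-out test set
prediction = mlp.predict(X_test)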
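For a regressor such as `MLPRegressor`, `score` returns the coefficient of determination, R². The sketch below computes that quantity by hand to show what the printed number means. The function name `r_squared` and the fee values are hypothetical and are not results from our model.

```python
def r_squared(actual, predicted):
    """Coefficient of determination: 1 - residual SS / total SS."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


# Hypothetical fees in millions of euros, for illustration only.
actual = [10.0, 20.0, 30.0]

perfect = r_squared(actual, actual)                 # every prediction exact
baseline = r_squared(actual, [20.0, 20.0, 20.0])    # always predicting the mean
print(perfect, baseline)
```

A score of 1 means every prediction is exact, and a score of 0 means the model does no better than always predicting the average fee. Because the listing above scores the training data, calling `mlp.score(X_test, y_test)` as well would likely give a more honest picture of how the model handles unseen transfers.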

Our test set was limited to just 5 transfers. Due to the limited availability of freely accessible transfer data, we had to restrict our analysis and model to the Premier League in the 2018-2019 season. After removing all transfers that were loans or where the player did not play any match minutes, we were left with just 28 transfers. Of these, 23 were used to train the model and 5 were used to test it. The following chart shows the results of the test set predictions:

As shown in this chart, our model was quite accurate in predicting the transfer fee of Eden Hazard, but it underpriced all other transfers by about half.
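The 23/5 split described above can be reproduced with an explicit `test_size`. This is a sketch under assumptions: the arrays below are placeholder data standing in for the 28 transfers, and `random_state=0` is an arbitrary choice made only so that the shuffle is repeatable.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder data standing in for the 28 transfers: 28 rows, 3 made-up features.
X = np.arange(28 * 3, dtype=float).reshape(28, 3)
y = np.arange(28, dtype=float)

# An integer test_size is the absolute number of test samples;
# random_state fixes the shuffle so the split is reproducible.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=5, random_state=0
)

print(X_train.shape, X_test.shape)
```

Without `test_size`, scikit-learn holds out a quarter of the rows by default, so passing the count explicitly is one way to guarantee exactly 5 test transfers.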

The primary weakness of our model is its simplicity. It only considers game statistics, and not the multitude of other factors that go into signing a player: fandom, marketing potential, merchandise sales, and more. The goal of acquiring a player through the transfer market is not simply to pay for goals and assists. It could be to grow the brand or the club in a multitude of other ways. Furthermore, machine learning models benefit greatly from a large training data set. The more examples that the model sees, the better it can learn an accurate relationship between game statistics and transfer fee.

That being said, our model suggests that Eden Hazard was very likely worth the full 90 million euros spent on him. Lukaku, on the other hand, was overpriced. The productivity gained from acquiring Lukaku is not worth his transfer fee, although the club may not regret the transfer for the other reasons outlined above.


A Beginner’s Guide to Neural Networks in Python