FIFA 20 Data

As thoroughly outlined in the Feature Selection landing page, developing a predictive method for the Transfer Fee of a player is an intensive process. The most impactful features in determining a player’s worth included the following:


This regression identified the most valuable statistics on the transfer market; however, statistics often tell an incomplete story of a soccer match. See this blog post for more on The Difficulty of Statistically Analyzing Match Performance.

For a more data-driven approach, the 36 player attributes incorporated into FIFA 20 can be used as a counterpart to real-life professional soccer statistics. A player’s overall rating is a weighted average of these 36 metrics, with the weights depending on the player’s position.
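
To make the position-weighted rating concrete, here is a minimal sketch. The weights and the three-attribute subset are hypothetical placeholders chosen for illustration; the actual FIFA 20 weightings are not reproduced in this post.

```python
# Hypothetical relative weights for a striker (illustrative only, not FIFA's real values)
STRIKER_WEIGHTS = {"Finishing": 5, "Positioning": 3, "Ball_Control": 2}


def weighted_rating(attributes, weights):
    """Return the weighted average of the attributes named in `weights`."""
    total_weight = sum(weights.values())
    weighted_sum = sum(attributes[name] * weight for name, weight in weights.items())
    return weighted_sum / total_weight


# Made-up player; Marking is ignored because the striker weights omit it
player = {"Finishing": 90, "Positioning": 85, "Ball_Control": 80, "Marking": 30}
striker_rating = weighted_rating(player, STRIKER_WEIGHTS)
print(striker_rating)
```

A different position would simply pass a different weights dictionary, which is why the same player can receive different ratings depending on where he or she lines up.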

(Another interesting component of this report was the position-wise regression of transfer fees. These predictive models, built in RStudio, identified which statistics are most lucrative for each position. For example, a forward’s transfer fee can be predicted with the equation:

Transfer Fee (Million £) = 8.78 * Goals + 7.84 * Assists - 0.027 * Minutes Played + 8.37

The forwards’ regression was the most significant of the three, but the midfielders’ results were interesting in that assists were valued significantly higher than goals. Finally, the defenders’ results indicated an inverse correlation between goals conceded and transfer fees, which is also very reasonable.)
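
As a quick worked example, the forward equation above can be wrapped in a small function. The function name and the sample stat line are made up for illustration; only the coefficients come from the regression.

```python
def predict_forward_fee(goals, assists, minutes_played):
    """Predicted transfer fee in millions of pounds, using the forward equation above."""
    return 8.78 * goals + 7.84 * assists - 0.027 * minutes_played + 8.37


# Hypothetical season line, not a real player
example_fee = predict_forward_fee(goals=10, assists=5, minutes_played=2500)
print(f"{example_fee:.2f}")  # 87.8 + 39.2 - 67.5 + 8.37 = 67.87
```

Note the negative coefficient on minutes played: for the same goals and assists, the model values a forward who needed fewer minutes to produce them.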


In addition to determining a player’s overall rating, these 36 attributes can also serve as the data set for predicting a player’s worth. The same machine learning techniques that were applied to the official soccer statistics can be applied to the FIFA data.

The following code was used to print out the 20 features identified by univariate selection:

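import pandas as pd
from sklearn.feature_selection import SelectKBest, chi2

# X (FIFA attribute columns) and y (target) are assumed to be defined earlier
bestfeatures = SelectKBest(score_func=chi2, k=10)
fit = bestfeatures.fit(X, y)
dfscores = pd.DataFrame(fit.scores_)
dfcolumns = pd.DataFrame(X.columns)

# scores_ covers every column regardless of k, so the top 20 can still be listed
featureScores = pd.concat([dfcolumns, dfscores], axis=1)
featureScores.columns = ['Specs', 'Score']  # naming the dataframe columns
print(featureScores.nlargest(20, 'Score'))  # print 20 best features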


                 Specs        Score
11         Gk Reflexes  2566.316481
9            Finishing  1829.106330
18             Marking  1721.851206
26      Sliding_Tackle  1571.904722
32             Volleys  1559.240133
29     Standing_Tackle  1520.755150
17          Long_Shots  1365.197172
10  Free_Kick_Accuracy  1280.670941
21         Positioning  1227.818970
8            Dribbling  1227.013862
14       Interceptions  1148.642781
6             Crossing  1011.847402
7                Curve   986.739920
25          Shot_Power   882.620074
20           Penalties   840.008643
12    Heading_Accuracy   836.486168
5         Ball_Control   793.621343
31              Vision   746.930722
24       Short_Passing   653.089402
2           Aggression   637.132533
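
For readers who want to try univariate selection without the original data set, here is a self-contained sketch on synthetic ratings. Everything in it is an assumption made for illustration: the three attribute columns, the made-up fee, and especially the binning of the fee into three tiers. The chi-squared test expects non-negative features and a class-like target, and this post does not state how the transfer fee was encoded for it.

```python
import numpy as np
import pandas as pd
from sklearn.feature_selection import SelectKBest, chi2

rng = np.random.default_rng(0)
n = 500

# Synthetic attribute ratings (not real FIFA data)
X = pd.DataFrame({
    "Finishing": rng.integers(20, 95, n),
    "Marking": rng.integers(20, 95, n),
    "Vision": rng.integers(20, 95, n),
})

# Made-up fee driven by Finishing, then binned into three tiers (an assumption)
fee = 0.5 * X["Finishing"] + rng.normal(0, 5, n)
y = pd.qcut(fee, q=3, labels=False)

selector = SelectKBest(score_func=chi2, k=2).fit(X, y)
feature_scores = pd.DataFrame({"Specs": X.columns, "Score": selector.scores_})
print(feature_scores.nlargest(2, "Score"))
```

Because only Finishing feeds the synthetic fee, it should come out with the largest score, which is a handy sanity check before running the same steps on real data.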

Next, tree-based feature importance was used to visualize which features most strongly determine transfer fees:

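from sklearn.ensemble import ExtraTreesClassifier
model = ExtraTreesClassifier()
model.fit(X, y)
print(model.feature_importances_)  # built-in feature_importances_ attribute of tree-based classifiers
# plot graph of feature importances for better visualization
feat_importances = pd.Series(model.feature_importances_, index=X.columns)
feat_importances.nlargest(10).plot(kind='barh')  # plot call reconstructed; the top-10 cutoff is an assumption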
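
The same idea can be tried end to end on synthetic data. This is only a sketch: the columns, the made-up fee, the three fee tiers, and the fixed `random_state` are all assumptions for reproducibility, not part of the original analysis.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import ExtraTreesClassifier

rng = np.random.default_rng(1)
n = 500

# Synthetic attribute ratings (not real FIFA data)
X_demo = pd.DataFrame({
    "Finishing": rng.integers(20, 95, n),
    "Marking": rng.integers(20, 95, n),
    "Vision": rng.integers(20, 95, n),
})

# Made-up fee tiers that depend only on Finishing
fee_demo = 0.5 * X_demo["Finishing"] + rng.normal(0, 3, n)
y_demo = pd.qcut(fee_demo, q=3, labels=False)

demo_model = ExtraTreesClassifier(n_estimators=100, random_state=0)
demo_model.fit(X_demo, y_demo)

# Importances are normalized to sum to 1 across all features
demo_importances = pd.Series(demo_model.feature_importances_, index=X_demo.columns)
print(demo_importances.sort_values(ascending=False))
```

Unlike the chi-squared scores, these importances are relative shares, so they are easier to compare across features but say nothing about the direction of the effect.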



Finally, a correlation heat map was created to identify not only how the attributes relate to the transfer fee of a player, but also how the attributes relate to one another.

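import seaborn as sns
corrmat = df.corr()  # pairwise correlations between all columns of the merged data
top_corr_features = corrmat.index
sns.heatmap(df[top_corr_features].corr())  # rendering call reconstructed (an assumption); the original was lost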
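
The numbers behind a heat map like this can be inspected directly. Below is a minimal sketch on synthetic data, where the column names and the way the columns are generated are assumptions for illustration: Volleys is built to track Finishing, Marking is independent, and the fee depends only on Finishing.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 300
finishing = rng.integers(20, 95, n).astype(float)

# Synthetic merged table of attributes and fees (not real data)
demo_df = pd.DataFrame({
    "Finishing": finishing,
    "Volleys": finishing + rng.normal(0, 8, n),
    "Marking": rng.integers(20, 95, n).astype(float),
    "Transfer_Fee": 0.5 * finishing + rng.normal(0, 5, n),
})

demo_corrmat = demo_df.corr()

# How each attribute relates to the fee, strongest first
fee_corr = demo_corrmat["Transfer_Fee"].drop("Transfer_Fee").sort_values(ascending=False)
print(fee_corr)

# How two attacking attributes relate to one another
print(demo_corrmat.loc["Finishing", "Volleys"])
```

Reading one column of the matrix answers the first question (which attributes move with the fee), while the off-diagonal attribute pairs answer the second (which attributes move together).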


A high-quality version of this map is available for download here.



It is important to note that these FIFA metrics were merged with the players’ real-life transfer data, so the analysis looks for relationships between virtual players’ attributes and their actual monetary worth. While this may seem inconsequential in practice, it is an interesting idea to explore. The analysis of the real-world statistics found goals and assists to be the most valuable; likewise, the FIFA data determined that “finishing” (the attribute most in line with goal-scoring) was by far the most significant, followed by other attacking stats such as free-kick accuracy, volleys, and crossing. The heat map also shows that many of these offensive attributes correlate highly with one another. These findings reinforce the narrative that forwards and attacking midfielders command the highest fees on the transfer market. It is then up to individual managers to decide whether to splurge on expensive talent or seek out cheaper options through the farm system.