As thoroughly outlined in the Feature Selection landing page, developing a predictive method for the Transfer Fee of a player is an intensive process. The most impactful features in determining a player’s worth included the following:
goals_involved_per_90_overall, assists_per_90_overall, goals_per_90_overall
This regression identified the most valuable statistics on the transfer market; however, statistics often tell an incomplete story of a soccer match. See this blog post for more information on The Difficulty of Statistically Analyzing Match Performance.
In search of a more data-driven approach, the 36 player attributes incorporated into FIFA 20 can be used as a foil for real-life professional soccer statistics. Depending on a player’s position, his or her weighted average rating is calculated based on a unique distribution of these 36 metrics.
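The position-dependent weighting can be illustrated with a minimal sketch. The weights and attribute values below are hypothetical placeholders, not the actual FIFA 20 coefficients, and only a handful of the 36 attributes are shown.

```python
# Hypothetical position weights (NOT the real FIFA 20 coefficients).
# Each position spreads its weight over a different subset of attributes.
POSITION_WEIGHTS = {
    "ST": {"Finishing": 0.5, "Positioning": 0.3, "Shot_Power": 0.2},
    "CB": {"Marking": 0.4, "Standing_Tackle": 0.4, "Heading_Accuracy": 0.2},
}


def overall_rating(attributes, position):
    """Return the weighted average of a player's attributes for a position."""
    weights = POSITION_WEIGHTS[position]
    total_weight = sum(weights.values())
    weighted_sum = sum(attributes[name] * w for name, w in weights.items())
    return weighted_sum / total_weight


# Made-up attribute values for a single illustrative player.
player = {
    "Finishing": 90, "Positioning": 80, "Shot_Power": 70,
    "Marking": 30, "Standing_Tackle": 40, "Heading_Accuracy": 60,
}
print(round(overall_rating(player, "ST"), 1))
print(round(overall_rating(player, "CB"), 1))
```

The same player receives a very different rating as a striker than as a center back, which is why the attribute distribution has to be chosen per position.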
(Another interesting component of this report was the position-wise regression of transfer fees. These predictive models, fitted in RStudio, identified certain statistics as more lucrative depending on the position of the player earning them. For example, a forward’s transfer fee can be predicted with the equation:
Transfer Fee (Million £) = 8.78 * Goals + 7.84 * Assists - 0.027 * Minutes Played + 8.37
This was the most significant regression, but the midfielders’ results were interesting in that assists were valued significantly higher than goals. Finally, the defenders’ results indicated an inverse correlation between goals conceded and transfer fees, which is also very reasonable.)
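As a worked example, the forward regression quoted above can be wrapped in a small function. The function name and the input values are illustrative assumptions; the post does not state whether goals and assists are season totals or per-90 figures, so season totals are assumed here.

```python
def forward_transfer_fee(goals, assists, minutes_played):
    """Predicted fee in million GBP, using the forward coefficients quoted above."""
    return 8.78 * goals + 7.84 * assists - 0.027 * minutes_played + 8.37


# Made-up season totals, purely for illustration.
fee = forward_transfer_fee(goals=10, assists=5, minutes_played=2000)
print(f"Predicted fee: £{fee:.2f}M")
```

Because the coefficient on minutes played is negative, two forwards with identical goals and assists are separated by playing time: the one who needed fewer minutes is predicted to cost more.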
In addition to calculating a player’s overall rating, these 36 attributes can also serve as the data set behind predicting a player’s worth. The very Machine Learning principles that were applied to the official soccer statistics can be used in the same ways for the FIFA data.
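The snippets in this post operate on a feature matrix X, a target y, and a combined data frame df. A minimal sketch of how such objects might be built from the FIFA attributes and the transfer records is shown here; the rows are made up, and the column names "Name" and "Transfer_Fee" are assumptions rather than the names used in the original data.

```python
import pandas as pd

# Made-up rows; the column names "Name" and "Transfer_Fee" are assumptions.
fifa = pd.DataFrame({
    "Name": ["Player A", "Player B", "Player C"],
    "Finishing": [88, 45, 70],
    "Marking": [30, 85, 55],
})
transfers = pd.DataFrame({
    "Name": ["Player A", "Player B"],
    "Transfer_Fee": [60.0, 25.0],  # million GBP
})

# Keep only players that appear in both sources.
df = fifa.merge(transfers, on="Name", how="inner")

X = df.drop(columns=["Name", "Transfer_Fee"])  # attribute columns only
y = df["Transfer_Fee"]                         # target to predict
print(X.shape, y.shape)
```

An inner join drops any player without a recorded transfer, so every remaining row has both attributes and a fee.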
The following code was used to print out the 20 features identified by univariate selection:
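import pandas as pd
from sklearn.feature_selection import SelectKBest, chi2

# Score every feature against the target with the chi-squared test.
# fit.scores_ covers all columns, so k does not limit the printout below.
bestfeatures = SelectKBest(score_func=chi2, k=10)
fit = bestfeatures.fit(X, y)
dfscores = pd.DataFrame(fit.scores_)
dfcolumns = pd.DataFrame(X.columns)
featureScores = pd.concat([dfcolumns, dfscores], axis=1)
featureScores.columns = ['Specs', 'Score']  # naming the dataframe columns
print(featureScores.nlargest(20, 'Score'))  # print 20 best features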
    Specs                      Score
11  Gk Reflexes          2566.316481
9   Finishing            1829.106330
18  Marking              1721.851206
26  Sliding_Tackle       1571.904722
32  Volleys              1559.240133
29  Standing_Tackle      1520.755150
17  Long_Shots           1365.197172
10  Free_Kick_Accuracy   1280.670941
21  Positioning          1227.818970
8   Dribbling            1227.013862
14  Interceptions        1148.642781
6   Crossing             1011.847402
7   Curve                 986.739920
25  Shot_Power            882.620074
20  Penalties             840.008643
12  Heading_Accuracy      836.486168
5   Ball_Control          793.621343
31  Vision                746.930722
24  Short_Passing         653.089402
2   Aggression            637.132533
Next, feature importance was computed with a tree-based model to visualize which attributes contribute most to predicting transfer fees:
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.ensemble import ExtraTreesClassifier

model = ExtraTreesClassifier()
model.fit(X, y)
# Tree-based classifiers expose a built-in feature_importances_ attribute.
print(model.feature_importances_)
# Plot the ten largest importances for easier comparison.
feat_importances = pd.Series(model.feature_importances_, index=X.columns)
feat_importances.nlargest(10).plot(kind='barh')
plt.show()
Finally, a correlation heat map was created to identify not only how the attributes relate to the transfer fee of a player, but also how the attributes relate to one another.
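import matplotlib.pyplot as plt
import seaborn as sns

# Pairwise correlations between all columns of df.
corrmat = df.corr()
top_corr_features = corrmat.index
plt.figure(figsize=(100, 100))
# Annotate each cell with its correlation value.
g = sns.heatmap(df[top_corr_features].corr(), annot=True, cmap="RdYlGn")
plt.show()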
A high-quality version of this map is available for download here.
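With 36 attributes, the heat map is dense, so it can help to read off the transfer-fee column of the correlation matrix directly. The sketch below uses made-up data, and "Transfer_Fee" is an assumed name for the target column.

```python
import pandas as pd

# Made-up data; "Transfer_Fee" is an assumed name for the target column.
df = pd.DataFrame({
    "Finishing": [90, 80, 60, 40],
    "Marking": [30, 35, 60, 80],
    "Transfer_Fee": [70.0, 55.0, 30.0, 20.0],
})

corrmat = df.corr()
# Correlation of each attribute with the fee, strongest positive first.
fee_corr = (
    corrmat["Transfer_Fee"]
    .drop("Transfer_Fee")
    .sort_values(ascending=False)
)
print(fee_corr)
```

Sorting this single column gives a ranked list that is easier to compare against the univariate scores than the full matrix.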
It is important to note that these FIFA metrics were merged with the players’ real-life transfer data. As a result, this analysis sought out correlations between virtual players’ attributes and their actual monetary worth. While this may seem inconsequential in practice, it is a very interesting idea to explore. The analysis of the real-world statistics found goals and assists to be the most valuable; likewise, in the FIFA data, “finishing” (the attribute most in line with goal-scoring) was the highest-scoring outfield attribute in the univariate selection, and other attacking attributes such as free-kick accuracy, volleys, and crossing also ranked in the top 20. It can also be seen in the heat map that many of these offensive attributes have a high correlation with one another. These findings reinforce the narrative that forwards and attacking midfielders command the highest fees on the transfer market. It is then up to the individual managers to decide whether to splurge on expensive talent or seek out cheaper options through the farm system.