Soccer is one of the most difficult sports to analyze from a statistical perspective. Over the past decade, data analytics has taken the sports world by storm. This is largely because sports fans have become more accustomed to advanced data analysis, and the number of tools available to analysts has exploded in the last 10 years. From the book Moneyball to mobile apps that track your basketball shooting form, technology and data collection have become an important tool in every sports fan’s bag of arguments. Despite this, soccer has been one of the slowest sports to gain a large following among stats analysts. But there’s a good reason for that: soccer is far more difficult to analyze statistically than other sports.
Fundamentally, analyzing soccer is hard for two reasons: it is a low-scoring sport, and there just aren’t many data categories to look at for a soccer player or game. This is in huge contrast with baseball, which has so many statistical categories that it is impossible to know them all. As a result, baseball was one of the first sports to adopt statistical analysis, an approach popularized by the book and movie Moneyball. As any soccer fan knows, the low-scoring nature of the sport often results in matches where a team’s performance does not align with the scoreline. One team could perform significantly better and dominate, yet the match could still end in a scoreless draw. Even knowing this, the problem remains difficult. Because soccer is a more team-oriented sport than baseball or basketball, there aren’t many metrics we can use to actually quantify success.
Let’s look at how FiveThirtyEight, one of the most popular data analytics websites, makes its soccer predictions. Their blog post describes the metrics they use and how they compute club soccer rankings and predictions. They publish an SPI rating (Soccer Power Index) for every club team, as well as match performances, match predictions, season predictions, and relative league strengths. In this post we’ll look at how they quantify match performances and discuss a few possibilities for improvement.
As explained in their blog post, FiveThirtyEight uses 3 main metrics for match performance: adjusted goals, shot-based expected goals, and non-shot expected goals. Essentially, these are 3 measures of either how many goals the team scored or how many it should have scored. The site uses an average of these metrics for the team’s offensive performance, and an average of the opposing team’s metrics for the team’s defensive performance. Their blog post gives more detail on how each of the 3 metrics is computed.
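The averaging step described above can be sketched in a few lines of Python. This is only an illustration of the idea, not FiveThirtyEight’s actual code: the function name `composite_score` and all of the match numbers are made up for the example.

```python
from statistics import mean


def composite_score(adjusted_goals, shot_xg, non_shot_xg):
    """Average the three goal-based metrics into one match score."""
    return mean([adjusted_goals, shot_xg, non_shot_xg])


# Hypothetical numbers for one match (not real data).
home = {"adjusted_goals": 1.0, "shot_xg": 2.1, "non_shot_xg": 1.7}
away = {"adjusted_goals": 1.0, "shot_xg": 0.6, "non_shot_xg": 0.8}

# Offense: what the home team produced.
home_offense = composite_score(**home)
# Defense: what the home team allowed, i.e. the opponent's metrics.
home_defense = composite_score(**away)

print(f"home offense: {home_offense:.2f}, home defense: {home_defense:.2f}")
```

In this made-up match the scoreline is level, but the averaged metrics still separate the two teams because the expected-goals numbers differ.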
Although these 3 metrics do a good job of adjusting the goals or potential goals based on match conditions, a team’s overall offensive performance is not solely encapsulated by goals, and its overall defensive performance is not solely encapsulated by preventing goals. We can improve the SPI by considering time of possession and 1 new metric each for offense and defense: passes completed and tackles won. We include time of possession to address the fundamental problem with soccer statistics mentioned earlier: dominating performances that the scoreline doesn’t reflect. In these situations, we would expect the dominating team to control time of possession despite not being able to put the ball in the back of the net. To account for time of possession, we weight the SPI offensive and defensive scores by it. A higher time of possession would increase a team’s offensive SPI, and a lower time of possession would decrease a team’s defensive SPI.
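The post does not give a formula for this weighting, so the following is just one possible sketch. It assumes a linear multiplier centered on 50% possession and applies the same multiplier to both scores, which is one literal reading of the last two sentences above. The names `possession_weight` and `weight_scores`, the sensitivity `alpha`, and the example numbers are all hypothetical.

```python
def possession_weight(possession, alpha=0.5):
    """Multiplier equal to 1.0 at 50% possession.

    possession is a fraction between 0 and 1.
    alpha is a hypothetical sensitivity, not a fitted value.
    """
    if not 0.0 <= possession <= 1.0:
        raise ValueError("possession must be a fraction between 0 and 1")
    return 1.0 + alpha * (possession - 0.5)


def weight_scores(offense, defense, possession, alpha=0.5):
    """Scale both match scores by the same possession multiplier."""
    w = possession_weight(possession, alpha)
    return offense * w, defense * w


# Hypothetical dominant performance: 70% possession.
offense, defense = weight_scores(1.6, 0.8, possession=0.70)
print(f"weighted offense: {offense:.2f}, weighted defense: {defense:.2f}")

# Hypothetical team under pressure: 30% possession lowers both scores.
low_off, low_def = weight_scores(1.6, 0.8, possession=0.30)
print(f"weighted offense: {low_off:.2f}, weighted defense: {low_def:.2f}")
```

With more than half of the ball the multiplier is above 1, and with less than half it is below 1. How strongly possession should move the scores (the value of `alpha`) would have to be tuned against real match data.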
We now include 1 additional offensive metric: passes completed. While the number of passes completed will likely correlate with time of possession and goals scored, it captures the key idea of playing successful team soccer. Including this metric covers another class of performances where match conditions may not favor a high time of possession or many goals (e.g. high-pressure, fast-paced matches). Although the aim is an objective metric of match performance, a subjective look back at the best offensive teams and performances shows that they tend to move the ball accurately and quickly (think Barcelona or Brazil). To adjust for a team’s style, we can compute this metric relative to the team’s average. Thus, if a team completes more passes than its average over the last season, its offensive performance in that game was likely better than usual, and we increase its SPI offensive metric.
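As a rough sketch of the relative-to-average idea, the snippet below compares a match’s completed passes to the team’s season average and nudges the offensive score accordingly. The post does not say how large the adjustment should be, so the linear form, the weight `beta`, the function names, and the numbers are all assumptions made for illustration.

```python
def relative_to_average(match_value, season_average):
    """Ratio of a match statistic to the team's own season average."""
    if season_average <= 0:
        raise ValueError("season_average must be positive")
    return match_value / season_average


def adjust_offense(offense, passes_completed, avg_passes, beta=0.25):
    """Raise or lower the offensive score by relative passing output.

    beta is a hypothetical weight controlling how much passing matters.
    """
    ratio = relative_to_average(passes_completed, avg_passes)
    return offense * (1.0 + beta * (ratio - 1.0))


# Hypothetical team that averages 500 completed passes per match.
above = adjust_offense(1.6, passes_completed=550, avg_passes=500)
below = adjust_offense(1.6, passes_completed=400, avg_passes=500)
print(f"above-average passing: {above:.2f}, below-average passing: {below:.2f}")
```

Because the ratio is taken against the team’s own average, a direct, counter-attacking side is not penalized for completing fewer passes than a possession-heavy side. It is only compared with its own usual output.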
Similarly, we include 1 additional defensive metric: tackles won. This article, in an analysis of defensive statistics in the MLS, finds that tackles won is the statistic most correlated with stopping goals and winning games. Thus, for the same reasons we include passes completed in the offensive metric, we include tackles won in the defensive SPI metric. Intuitively, we want to separate offensive performance from defensive performance as best we can. If a team is not performing offensively, its defense will face an onslaught. The team will have a very low time of possession, and the opposition will have a lot of opportunities, shots, and goals. But this doesn’t necessarily mean that its defense is any worse than the other team’s. The result was mostly a consequence of lackluster offense. Imagine a team that is being dominated but only gives up 1 or 2 goals. It’s likely that this additional metric, tackles won, is high, and it will help adjust the defensive performance and mitigate the effect of the low offensive performance. Finally, just like with passes completed, we compute this metric relative to the team’s average over the last season.
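The dominated-team example can be made concrete with a small sketch. It assumes the defensive score is read as a goals-against style number where lower is better, so an above-average tackle count pulls it down. That reading, the weight `gamma`, the function name, and the numbers are assumptions for illustration, not part of FiveThirtyEight’s published method.

```python
def adjust_defense(defense, tackles_won, avg_tackles, gamma=0.25):
    """Adjust a defensive score by tackles won relative to season average.

    Assumes defense is a goals-against style score (lower is better).
    gamma is a hypothetical weight controlling how much tackling matters.
    """
    if avg_tackles <= 0:
        raise ValueError("avg_tackles must be positive")
    ratio = tackles_won / avg_tackles
    # More tackles than usual shrinks the score; clamp so it never goes negative.
    factor = max(1.0 - gamma * (ratio - 1.0), 0.0)
    return defense * factor


# Hypothetical dominated team: opponent metrics average 2.0,
# but the team won 24 tackles against a season average of 16.
adjusted = adjust_defense(2.0, tackles_won=24, avg_tackles=16)
print(f"raw defense: 2.00, adjusted defense: {adjusted:.2f}")
```

Here the defense gets partial credit for working hard under pressure, so the poor raw number is attributed less to the back line and more to the lack of possession.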
Despite the inherent difficulty of measuring soccer performance, work is being done in academia and on popular sports blogs like FiveThirtyEight. With more sensors and GPS trackers being used today, the amount of data available to researchers is growing rapidly. FiveThirtyEight’s match performance metric, SPI, takes into consideration and adjusts for various match conditions. However, it focuses too narrowly on goals scored and goals against. By including time of possession, passes completed, and tackles won, we can get a much better objective picture of how a team performed both offensively and defensively in a given match.
Sources:
https://www.americansocceranalysis.com/home/2014/04/28/individual-defensive-statistics-which-ones-matter-and-top-10-mls-defenders
Some interesting soccer papers from the MIT Sloan Sports Analytics Conference covering comprehensive team performance, goalkeeping, and player mentality, respectively:
http://www.sloansportsconference.com/wp-content/uploads/2019/02/Decomposing-the-Immeasurable-Sport.pdf
http://www.sloansportsconference.com/wp-content/uploads/2019/02/Data-Driven-Goalkeeper-Evaluation-Framework-1.pdf
http://www.sloansportsconference.com/wp-content/uploads/2019/02/Choke-or-Shine-Quantifying-Soccer-Players-Abilities-to-Perform-Under-Mental-Pressure.pdf
It seems as though many of the contributions being made in the literature have to do with coming up with metrics to measure different aspects of the game besides traditional statistics such as goals, passes, and tackles.