In collecting and analyzing our data, we often had to make assumptions about how countries should be treated. This study does not attempt to settle questions of borders or sovereignty, so we made assumptions that simplify the analysis and allow broad, general trends in women’s soccer success to be seen.
One major assumption concerned how to treat the teams from the United Kingdom. England, Northern Ireland, Scotland and Wales all compete separately and are ranked in the top 100 by FIFA (although they compete together at the Olympics). Ireland also competes separately and is ranked in the top 100, but it is a sovereign state rather than part of the United Kingdom. Most of the indexes that we found, however, report the United Kingdom as a single nation. We decided to use the scores assigned to the United Kingdom for each of its four teams on every metric except population. For example, each team was assigned an Economic Freedom score of 78.9, the score given to the United Kingdom by the report that we used. We think that the nations within the UK generally resemble one another, and while there are certainly small differences between them, the trends in these categories (level of education, gender gap, etc.) should be largely the same from nation to nation. Furthermore, this allowed us to include all of these teams in our analysis, rather than being forced to exclude four of the top 100 teams, since no data is available on the individual nations in the categories that we were concerned with.
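The rule described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the procedure actually used in the study: the names UK_TEAMS, extend_uk_metrics, uk_scores and team_populations are hypothetical, and the only value taken from the text is the Economic Freedom score of 78.9. Populations are left empty here because the text does not report them.

```python
# Hypothetical sketch: copy UK-level scores to each UK team,
# but keep population team-specific.
UK_TEAMS = ["England", "Northern Ireland", "Scotland", "Wales"]  # assumed list


def extend_uk_metrics(uk_scores, team_populations, not_shared=("population",)):
    """Return one row of metrics per UK team.

    Every metric in uk_scores is copied to each team except those named in
    not_shared; population is looked up per team instead.
    """
    rows = {}
    for team in UK_TEAMS:
        row = {k: v for k, v in uk_scores.items() if k not in not_shared}
        # None marks a population figure that has not been collected yet.
        row["population"] = team_populations.get(team)
        rows[team] = row
    return rows


# 78.9 is the Economic Freedom score quoted in the text for the United Kingdom.
uk_scores = {"economic_freedom": 78.9, "population": None}
team_populations = {}  # per-team figures would be filled in from a separate source
rows = extend_uk_metrics(uk_scores, team_populations)
```

Keeping the list of non-shared metrics as a parameter makes the exception for population explicit and easy to extend if another metric turns out to be available per team.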
On the other hand, we chose not to extend metrics from the United States to Guam or from China to Chinese Taipei and Hong Kong (all of these teams are ranked in the top 100). Without examining the question in depth, we judged that these relationships are not as close as those within the United Kingdom, and we were not comfortable treating these states as similar enough to share the same values in the aforementioned categories. Because of this, we excluded these teams from our analysis whenever they did not appear individually in the studies that we used.
Additionally, we had to decide what to do when different variables had information available for different countries. For example, the Women’s Rights Index had information on only 74 of the countries, whereas we were able to find population data for all 100 countries, and the Gender Gap Index included 95 of the 100 countries that we were interested in. Having a different set of countries behind each variable could skew comparisons between variables. However, we decided to use whichever countries had available information in each category, even if that meant different countries for some variables. The goal of the project was to determine which variables positively correlate with women’s soccer success, and we believe that to do this, it is best to include whatever information is available, even if that could lead to slight skews from variable to variable.
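The approach described above, in which each variable is analyzed with whichever countries have data for it, is sometimes called available-case analysis or pairwise deletion. A minimal sketch follows. It is not the code used in the study: the function names, the dictionary layout and the toy values are all hypothetical, and the Pearson coefficient is assumed as the correlation measure, which the text does not specify.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def available_case_correlations(outcome, predictors):
    """Correlate each predictor with the outcome using only the countries
    for which that predictor has a value.

    Returns {predictor_name: (number_of_countries_used, correlation)}.
    """
    results = {}
    for name, values in predictors.items():
        shared = [c for c in outcome if values.get(c) is not None]
        xs = [values[c] for c in shared]
        ys = [outcome[c] for c in shared]
        results[name] = (len(shared), pearson(xs, ys))
    return results


# Toy values for four placeholder countries; not real data.
outcome = {"A": 1.0, "B": 2.0, "C": 3.0, "D": 4.0}
predictors = {
    "index_full": {"A": 2.0, "B": 4.0, "C": 6.0, "D": 8.0},
    "index_partial": {"A": 3.0, "C": 2.0, "D": 1.0},  # no value for country B
}
results = available_case_correlations(outcome, predictors)
for name, (n, r) in results.items():
    print(f"{name}: n={n}, r={r:.3f}")
```

Reporting the number of countries next to each coefficient keeps the differing samples visible, so a correlation based on 74 countries is not read as directly comparable to one based on 100.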
In addition to these assumptions, we identified some other factors that could lead to error in our report, or that should at least be noted when considering the validity of the results. One area of particular concern is the arbitrariness of the rankings. All of the rankings and indexes that we used involve some degree of arbitrariness, but the FIFA rankings seem especially prone to it. They can vary widely within a short span of time, and they depend on the results of individual matches, which can themselves turn on chance. This is exacerbated by the fact that a team often plays very few international matches over a long period. However, this was the best metric that we could find for measuring the success of a nation’s soccer team.
Another potential cause of error is that we analyzed only the top 100 women’s soccer teams. Due to time constraints and the labor-intensive nature of collecting data, we chose to focus on just the top 100 ranked teams, comparing them along all of the factors discussed in this report. While we do not expect this to significantly alter the results, it is important to note that countries outside of the top 100 are not factored into the analysis, so our conclusions apply only to the teams that were included.
Additionally, there is some concern that the data may appear more significant than it is. First of all, the data shows only correlation and never proves causation. A metric may also appear to be correlated with success when it is really just a proxy for another factor. This is of particular concern given that many of our variables positively correlate with one another. We attempt to address this by evaluating interaction terms, and by noting that any results must be considered carefully before conclusions are drawn.
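To make the idea of an interaction term concrete, here is its standard form. The section does not state the model that was fitted, so this assumes an ordinary linear regression with two predictors, and all symbols are illustrative rather than taken from the study.

```latex
y_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \beta_3 \, x_{1i} x_{2i} + \varepsilon_i
```

Under this assumption, y_i would be the success measure for team i, x_{1i} and x_{2i} would be two of the metrics, and \beta_3 would describe how the association between one metric and success changes with the level of the other. A nonzero \beta_3 still describes association only, not causation.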
Despite all of the simplifying assumptions and potential causes of error, we still feel confident about the data that we collected. The goal of the project was to determine which factors are associated with the success of women’s national teams, and we believe that our data is well suited to drawing conclusions about this.