Football managers like to say it’s a game of two halves; eleven men versus eleven men where anything can happen. But if that’s the case, why are teams like Brazil and Germany consistent winners of tournaments? What makes teams like Spain, Italy and France continual favourites?
We believe the answer lies in data. With enough data points and enough analysis of those data points, it is perhaps possible to predict the winner of the World Cup based on data alone.
We decided to give it a go, but only on a small scale. We’ll explain our process below, and you can play with the data yourself to draw your own conclusions.
Six things we found in World Cup historic data
1. The winner may come from outside Europe, but…
Up until 2000, the only team to have won a World Cup outside its own continent was Brazil. They won in Sweden in 1958 and again in the USA in 1994 – although Brazil is effectively in the same hemisphere as the USA, practically the same time zone, and they only won on penalties after a bore draw final, thanks to Roberto Baggio uncharacteristically missing a penalty. In the 21st century, three of the last four World Cups have been won by teams outside of their own continent – Germany won in Brazil, Spain won in South Africa and Brazil won in Japan and South Korea.
But, of all the times the competition has been hosted in Europe, only once has a non-European country won.
European teams are on a roll – the two most recent wins outside a team’s own continent were theirs (Spain in 2010 and Germany in 2014).
2. Germany are not on a good run
2014 World Cup winners Germany have not enjoyed a great build-up to this year’s World Cup in Russia, winning only once in their last six games – and that was against Saudi Arabia. Previous campaigns in the lead-up to a World Cup have gone much more smoothly for the Germans. Will a lacklustre run of games have a negative effect on the reigning champions, or is it going to be business as usual for Joachim Löw?
3. Argentina has the best goals per cap record
Using data on all the caps across each 23-man squad, and all their international goals, we can calculate the average goals per cap across each team. Using this metric, Argentina’s current squad has the best goal-scoring record over time, whereas Egypt has the worst.
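As a rough sketch of how this metric could be computed, the Python below takes the mean of each player’s goals-per-cap ratio across a squad. The squad figures are invented for illustration, and treating uncapped players as 0.0 is our assumption rather than something the original data set specifies.

```python
from statistics import mean

# Hypothetical squad data: (player, caps, international goals).
# These figures are invented for illustration, not real squad records.
squads = {
    "Team A": [("Player 1", 100, 50), ("Player 2", 20, 2), ("Player 3", 0, 0)],
    "Team B": [("Player 4", 40, 4), ("Player 5", 10, 0)],
}


def goals_per_cap(caps, goals):
    """Goals per cap for one player; uncapped players count as 0.0 (assumption)."""
    return goals / caps if caps else 0.0


def team_goals_per_cap(players):
    """Mean of the per-player goals-per-cap ratios for one squad."""
    return mean(goals_per_cap(caps, goals) for _, caps, goals in players)


# Score every team, then rank from best to worst goals-per-cap record.
team_scores = {team: team_goals_per_cap(players) for team, players in squads.items()}
ranking = sorted(team_scores, key=team_scores.get, reverse=True)

for team in ranking:
    print(f"{team}: {team_scores[team]:.3f}")
```

An alternative would be to divide a squad’s total goals by its total caps, which gives more weight to experienced players; the two approaches can rank teams differently.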
4. 74% of the players at this year’s World Cup play their club football in Europe
This includes four from Brazil who play at Manchester City together. Manchester City have supplied the most players for this year’s tournament with 16 of their squad travelling to Russia.
5. Hosts Russia are the lowest-ranked national team at this World Cup
Russia’s goal-scoring average at World Cups has been declining, and the hosts are currently 70th in the FIFA/Coca-Cola World Rankings. Will a home advantage give Russia a competitive edge this time?
6. Iceland has a home advantage in Russia
Three of Iceland’s team play club football in Russia, as do three from Iran, two from Sweden and one from Poland. We gave proximity points to each player, based on their home club’s proximity to Russia. This helps us calculate a home advantage score for each team.
What data would help us analyse the World Cup winner?
To build a model capable of accurately predicting the winner, we would need big data: the result of every match played by every team, not only at World Cup Finals, but also in qualification games, friendlies and matches at other tournaments.
We would need to take into account things like location of those games, how far the teams travelled, how far into the season the games took place, perhaps the ranking of the opponents.
We would also need big data relating to all the players in all the games. Which players were in the games? How was their form at the time? How had those players performed historically before every match?
The location of a World Cup Finals tournament could also affect team performance – how far they travel to each match, how far they are away from home, how many days there are between matches.
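One way to put a number on “how far they travel” is the great-circle distance between two venues. The sketch below uses the haversine formula; the coordinates are approximate city-centre values that we have assumed for two of the 2018 host cities, and a straight-line distance is only a stand-in for a real travel route.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two latitude/longitude points."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlam = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


# Approximate city-centre coordinates (assumed values, for illustration only).
MOSCOW = (55.7558, 37.6173)
SOCHI = (43.6028, 39.7342)

distance = haversine_km(*MOSCOW, *SOCHI)
print(f"Moscow to Sochi: about {distance:.0f} km")
```

Summing these distances over a team’s fixture list would give a simple travel-load figure to compare between teams.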
Having every available data point would require some algorithmic analysis, and machine learning to develop models for predicting outcomes. We’d ideally test the results of our model’s predictions, so we could train it to get better over time.
Historic World Cup analysis – comparing the teams
We don’t have all that data, and we don’t have time to train a model to the extent we would like, but nevertheless we decided to do some analysis on a smaller scale and have some fun with data visualisation and insight.
We chose to focus only on historic matches played in World Cup Finals by the teams competing in this year’s World Cup. This means we have some dark horses. Iceland and Panama have never qualified before, so they have no history of matches played. Logic would suggest that a first timer is unlikely to win anyway – other than Uruguay, who won the first ever tournament in 1930 on home soil.
Here is the process we followed for this experiment
- We listed the results for each game played at previous World Cup Finals by teams competing in this year’s tournament. Iceland and Panama have not played in a World Cup Finals tournament prior to 2018.
- We listed all 23-man squads for each competing team, along with their caps and goals scored for their country, as well as their age and the club they play for.
- For each player, we calculated the average goals scored per cap, so that we could calculate this average for the whole team. This creates a unifying comparison metric for comparing team experience with goals scored over time.
- For each player, we created a proximity score based on how close to Russia they play club football. If they play in Russia, they get 20 points; if they play in a country neighbouring Russia, they get 15. Clubs elsewhere in Europe or close to Russia’s time zone score 10 points; clubs much further away, such as those in the Americas, score five; and clubs on the other side of the world score two.
- We also created a proximity score based on the continent that each player plays in.
- Proximity scores for all players are then added together to calculate a home advantage rating for each national team.
- Using these data sets, we then created multiple charts that let us view stats based on where players play, how much experience they have, the team’s history at previous finals and more.
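The proximity scoring in the steps above can be sketched in a few lines of Python. The point values come from the list; the tier names and the country-to-tier lookup are hypothetical, since the exact assignments we used are not reproduced here, and the example squad is invented.

```python
# Points per tier, taken from the scoring rules described above.
TIER_POINTS = {
    "russia": 20,
    "neighbour": 15,
    "europe_or_nearby": 10,
    "far": 5,
    "other_side": 2,
}

# Hypothetical lookup from club country to tier (illustrative assumptions only).
COUNTRY_TIER = {
    "Russia": "russia",
    "Ukraine": "neighbour",
    "England": "europe_or_nearby",
    "Spain": "europe_or_nearby",
    "USA": "far",
    "Australia": "other_side",
}


def proximity_score(club_country):
    """Points for one player; raises KeyError if the country is not in the lookup."""
    return TIER_POINTS[COUNTRY_TIER[club_country]]


def home_advantage(club_countries):
    """Sum the proximity scores of every player in a squad."""
    return sum(proximity_score(country) for country in club_countries)


# Invented four-player example: two based in Russia, one in England, one in the USA.
example_squad = ["Russia", "Russia", "England", "USA"]
print(home_advantage(example_squad))  # 20 + 20 + 10 + 5 = 55
```

Because the rating is a plain sum, every squad needs the same number of players (23 here) for the totals to be comparable between teams.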
See if you can predict the winner
Use our interactive report to see if you can predict a winner based on the available data.