
Can data predict the winner of the World Cup?

Categories: Data & analytics

Football managers like to say it’s a game of two halves; eleven men versus eleven men where anything can happen. But if that’s the case, why are teams like Brazil and Germany consistent winners of tournaments? What makes teams like Spain, Italy and France continual favourites?

We believe the answer lies in data. With enough data points, and enough analysis of them, it may be possible to predict the winner of the World Cup from data alone.

We decided to give it a go, but only on a small scale. We’ll explain our process below, and you can play with the data yourself to draw your own conclusions.

Six things we found in historical World Cup data

1. The winner may come from outside Europe, but…

Up until 2000, only South American teams had won a World Cup outside their own continent, and only Brazil had done it outside the Americas, in Sweden in 1958. South American sides also won in North America (Brazil in Mexico in 1970 and in the USA in 1994, Argentina in Mexico in 1986), but the Americas are effectively in the same hemisphere and broadly the same time zones. And in 1994 Brazil only won on penalties after a goalless final, thanks to Roberto Baggio uncharacteristically missing his. In the 21st century, three of the last four World Cups have been won by teams outside their own continent: Germany won in Brazil, Spain won in South Africa and Brazil won in Japan.

But of all the World Cups hosted in Europe, only one has been won by a non-European country: Brazil, in Sweden in 1958.

Chart: European winners of World Cups

European teams are on a roll: they are the ones who have recently been winning outside their own continent (Spain and Germany, as mentioned above).

2. Germany are not on a good run

2014 World Cup winners Germany have not enjoyed a great build-up to this year’s World Cup in Russia, winning only once in their last six games, and that was against Saudi Arabia. Their previous World Cup build-ups went much more smoothly. Will this lacklustre run of games have a negative effect on the reigning champions, or will it be business as usual for Joachim Löw?

Chart: Germany’s World Cup build-up

3. Argentina has the best goals-per-cap record

Using the caps and international goals of every player in each 23-man squad, we can calculate an average goals-per-cap figure for each team. On this metric, Argentina’s current squad has the best goal-scoring record over time, and Egypt’s has the worst.
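The calculation is simple, but there is a choice in how player figures roll up into a team figure: average each player’s ratio, or divide the squad’s total goals by its total caps. Here is a minimal Python sketch of both options, assuming either could be meant. The `Player` class, the function names and the three-player squad are hypothetical, for illustration only (not real 2018 data).

```python
from __future__ import annotations

from dataclasses import dataclass
from statistics import mean


@dataclass
class Player:
    name: str
    caps: int   # international appearances
    goals: int  # international goals


def goals_per_cap(player: Player) -> float:
    """A player's international goals divided by caps (0.0 for an uncapped player)."""
    return player.goals / player.caps if player.caps else 0.0


def team_goals_per_cap(squad: list[Player]) -> float:
    """Option 1 (assumed): the mean of each player's own goals-per-cap ratio."""
    return mean(goals_per_cap(p) for p in squad)


def pooled_goals_per_cap(squad: list[Player]) -> float:
    """Option 2 (assumed): total squad goals divided by total squad caps."""
    total_caps = sum(p.caps for p in squad)
    return sum(p.goals for p in squad) / total_caps if total_caps else 0.0


# Hypothetical three-player squad, for illustration only.
squad = [
    Player("Striker A", caps=80, goals=40),
    Player("Midfielder B", caps=50, goals=5),
    Player("Defender C", caps=20, goals=0),
]
print(round(team_goals_per_cap(squad), 3), round(pooled_goals_per_cap(squad), 3))
```

The two options can rank squads differently: averaging ratios lets a newcomer with one goal in two caps count as much as a veteran, while the pooled figure weights each player by experience.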

Chart: World Cup average goals per cap

4. 74% of the players at this year’s World Cup play their club football in Europe

This includes four Brazilians who play together at Manchester City. Manchester City have supplied more players to this year’s tournament than any other club, with 16 of their squad travelling to Russia.
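Both figures fall out of the squad list described in step 2 of our process. A minimal Python sketch, assuming each player row records their club and the continent it plays in; the rows and club names below are hypothetical placeholders, not the real squads.

```python
from collections import Counter

# Hypothetical squad rows: (player, club, continent the club plays in).
# The real list would have 32 squads x 23 players = 736 rows.
players = [
    ("Player 1", "Club A", "Europe"),
    ("Player 2", "Club A", "Europe"),
    ("Player 3", "Club B", "South America"),
    ("Player 4", "Club C", "Europe"),
]

# Share of players whose club is in Europe.
european = sum(1 for _, _, continent in players if continent == "Europe")
share = european / len(players)

# The club that supplies the most players to the tournament.
top_club, top_count = Counter(club for _, club, _ in players).most_common(1)[0]
print(f"{share:.0%} at European clubs; most players from {top_club} ({top_count})")
```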

Chart: World Cup players at European clubs

5. Hosts Russia are the lowest-ranked national team at this World Cup

Russia’s goal-scoring average at World Cups has been declining, and they are currently 70th in the FIFA/Coca-Cola World Rankings. Will home advantage give them a competitive edge this time?
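One simple way to check whether an average is "declining" is to compute goals per game at each tournament and fit a least-squares trend line through them. A minimal Python sketch; the function names are hypothetical and the figures are illustrative placeholders, not our dataset.

```python
from __future__ import annotations


def goals_per_game(goals: int, games: int) -> float:
    """Average goals scored per match at one tournament."""
    return goals / games


def ols_slope(xs: list[float], ys: list[float]) -> float:
    """Ordinary least-squares slope of ys against xs; negative means a declining trend."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    num = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    den = sum((x - x_bar) ** 2 for x in xs)
    return num / den


# Hypothetical (goals_scored, games_played) for a team's successive World Cup appearances.
appearances = [(6, 3), (6, 4), (3, 2), (3, 3)]
averages = [goals_per_game(g, n) for g, n in appearances]
slope = ols_slope(list(range(1, len(averages) + 1)), averages)
print(averages, round(slope, 2))
```

With only a handful of appearances, one big win can swing the trend, so the slope is better read as a description than as a forecast.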

Chart: Russia’s goal record at World Cups

6. Iceland has a home advantage in Russia

Three of Iceland’s squad play club football in Russia, as do three of Iran’s, two of Sweden’s and one of Poland’s. We gave each player proximity points based on how close their club is to Russia, which lets us calculate a home advantage score for each team.

Chart: World Cup home advantage scores

What data would help us analyse the World Cup winner?

To build a model capable of accurately predicting the winner, we would need big data: the result of every match played by every team, not only at World Cup Finals but also in qualifiers, friendlies and other tournaments.

We would need to take into account things like the location of those games, how far the teams travelled, how far into the season the games took place and perhaps the ranking of the opponents.

We would also need big data on all the players in all those games. Which players took part? What was their form at the time? How had they performed historically before each match?

The location of a World Cup Finals tournament could also affect team performance – how far they travel to each match, how far they are away from home, how many days there are between matches.

Making use of every available data point would require algorithmic analysis and machine learning to develop models that predict outcomes. Ideally, we would test our model’s predictions against real results, so we could train it to get better over time.
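As an illustration of that "predict, test, update" loop, here is a minimal Python sketch using an Elo-style rating, a standard technique for rating teams from match results. It is a swapped-in example, not a model we built: each prediction is made before the result is known, scored with a Brier-style squared error, and only then used to update the ratings. The team names, matches, starting rating of 1500 and update factor `K = 32` are all assumptions for illustration.

```python
from __future__ import annotations

from collections import defaultdict

K = 32.0  # assumed update factor: how far one result moves a rating


def expected_score(rating_a: float, rating_b: float) -> float:
    """Elo expectation for team A against team B (a win counts 1, a draw 0.5)."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))


def play(ratings: dict[str, float], team_a: str, team_b: str, score_a: float) -> float:
    """Record one result (score_a: 1 = A won, 0.5 = draw, 0 = A lost).

    Returns the prediction made *before* the update, so it can be scored
    honestly, then moves both ratings towards the observed result.
    """
    prediction = expected_score(ratings[team_a], ratings[team_b])
    change = K * (score_a - prediction)
    ratings[team_a] += change
    ratings[team_b] -= change  # zero-sum: B loses what A gains
    return prediction


def brier_score(predictions: list[float], outcomes: list[float]) -> float:
    """Mean squared error of the predictions against results; lower is better."""
    return sum((p - o) ** 2 for p, o in zip(predictions, outcomes)) / len(predictions)


ratings = defaultdict(lambda: 1500.0)  # every team starts with the same rating
matches = [("Team A", "Team B", 1.0), ("Team B", "Team C", 0.5), ("Team A", "Team C", 1.0)]
predictions = [play(ratings, a, b, s) for a, b, s in matches]
print(dict(ratings), brier_score(predictions, [s for _, _, s in matches]))
```

A real model would feed in far more of the factors listed above (travel, form, opposition strength), but the loop of scoring each prediction before learning from the result stays the same.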

Historic World Cup analysis – comparing the teams

We don’t have all that data, and we don’t have time to train a model to the extent we would like, but nevertheless we decided to do some analysis on a smaller scale and have some fun with data visualisation and insight.

We chose to focus only on historical matches played at World Cup Finals by the teams competing in this year’s World Cup. This means we have some dark horses: Iceland and Panama have never qualified before, so they have no history of matches played. Logic would suggest that a first-timer is unlikely to win anyway; the exceptions are Uruguay, who won the first ever tournament on home soil in 1930, and Italy, who won on their debut as hosts in 1934.

Here is the process we followed for this experiment

  1. We listed the results for each game played at previous World Cup Finals by teams competing in this year’s tournament. Iceland and Panama have not played in a World Cup Finals tournament prior to 2018.
  2. We listed all 23-man squads for each competing team, along with their caps and goals scored for their country, as well as their age and the club they play for.
  3. For each player, we calculated the average goals scored per cap, so that we could calculate this average for the whole team. This gives a single metric for comparing teams’ experience with the goals they have scored over time.
  4. For each player, we created a proximity score based on how close to Russia they play their club football. Players at clubs in Russia get 20 points and those in a neighbouring country get 15. Clubs elsewhere in Europe or close to Russia’s time zone earn 10 points, clubs much further away, such as in the Americas, earn five, and clubs on the other side of the world earn two.
  5. We also created a proximity score based on the continent that each player plays in.
  6. The proximity scores of each team’s players are then added together to give that team’s home advantage rating.
  7. Using these data sets, we then created multiple charts that let us view stats based on where players play, how much experience they have, the team’s history at previous finals and more.
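Steps 4 and 6 can be sketched in Python as a lookup from club country to a proximity tier, followed by a sum over each squad. The tier points come from step 4; the country-to-tier table, function names and squads are hypothetical and deliberately tiny, and a real version would need a tier for every country a player’s club is in.

```python
# Points per proximity tier, as described in step 4.
TIER_POINTS = {
    "russia": 20,
    "neighbour": 15,
    "europe_or_near": 10,
    "far": 5,
    "other_side": 2,
}

# Hypothetical country-to-tier table; an unmapped country raises KeyError
# so that gaps in the table are noticed rather than silently scored.
COUNTRY_TIER = {
    "Russia": "russia",
    "Ukraine": "neighbour",  # shares a border with Russia
    "England": "europe_or_near",
    "USA": "far",
    "Australia": "other_side",
}


def proximity_score(club_country: str) -> int:
    """Proximity points for one player, based on where their club is."""
    return TIER_POINTS[COUNTRY_TIER[club_country]]


def home_advantage(squads: dict) -> dict:
    """Map each team to the sum of its players' proximity points (step 6)."""
    return {team: sum(proximity_score(c) for c in clubs) for team, clubs in squads.items()}


# Hypothetical squads, listed by each player's club country.
squads = {"Team X": ["Russia", "Russia", "England"], "Team Y": ["USA", "Australia"]}
scores = home_advantage(squads)
print(scores)
```

Because every real squad has 23 players, summing scores ranks teams the same way averaging them would.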

See if you can predict the winner

Use our interactive report to see if you can predict a winner based on the available data.

Interactive report: World Cup Power BI report from Vertical Leap

 

George’s journey to becoming an SEO Specialist began when he became fascinated by how people think in a digital context. His career in digital marketing started in-house at various companies, where he studied different markets and worked on all aspects of the discipline, including SEO, social media, PPC and strategy development.
