Machine Learning Predicts FIFA World Cup Winner

The quadrennial FIFA World Cup has kicked off, and national pride and competition have once again peaked on the global stage. Hosted by Russia, the 2018 World Cup showcases 32 of the world’s finest football-playing teams. A week in, we have witnessed some historic, thrilling matches and unexpected results. Naturally, the question of who will win the prestigious tournament is of significant interest.

One way to assess how the World Cup might unfold, and who might win it, is to look at the odds published by statisticians. These researchers analyze comprehensive databases of past tournament results to quantify the probability of each outcome of any possible match. Combining the odds of different analysts from different countries produces a fairly clear prediction of which country has the highest chance of winning. This approach indicates that Brazil is the favorite to lift the 2018 World Cup title, with a probability of about 17%, followed by Germany and Spain, both at around 13%.
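As a rough illustration of how odds from several sources could be combined, here is a minimal Python sketch. The decimal odds, the three sources, and the four-team field are all invented for the example, and the averaging rule is an assumption rather than the procedure any analyst actually used.

```python
from statistics import fmean

# Hypothetical decimal odds from three invented sources (toy four-team field).
ODDS_BY_SOURCE = [
    {"Brazil": 3.0, "Germany": 4.0, "Spain": 4.5, "France": 6.0},
    {"Brazil": 3.2, "Germany": 3.8, "Spain": 4.2, "France": 6.5},
    {"Brazil": 2.9, "Germany": 4.1, "Spain": 4.4, "France": 5.8},
]

def implied_probabilities(odds):
    """Turn decimal odds into probabilities that sum to 1."""
    raw = {team: 1.0 / price for team, price in odds.items()}
    total = sum(raw.values())  # normalise so the field adds up to 100%
    return {team: p / total for team, p in raw.items()}

def consensus(odds_by_source):
    """Average each team's implied probability across all sources."""
    per_source = [implied_probabilities(odds) for odds in odds_by_source]
    return {team: fmean(p[team] for p in per_source) for team in per_source[0]}

combined = consensus(ODDS_BY_SOURCE)
for team, prob in sorted(combined.items(), key=lambda item: -item[1]):
    print(f"{team}: {prob:.1%}")
```

Shorter odds translate into a higher implied probability, so the team with the lowest odds in every source ends up at the top of the combined list.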

However, researchers in Germany have recently developed a machine-learning system that they believe has the potential to surpass the accuracy of traditional statistical models in estimating each team's chances of winning the World Cup. Dr. Andreas Groll, a professor of statistics at the Technical University of Dortmund, and his colleagues combined machine learning with conventional statistics, using a technique called the random-forest approach, to predict a winner. And according to their model, it’s not Brazil.

The random-forest approach has emerged in recent years as an effective way to analyze large data sets while avoiding the drawbacks of other data-mining methods. Simply put, the model builds many decision trees, each trained on a random sample of the data and considering a random subset of the factors at each branch, and then averages their outcomes. This technique has several key advantages, one being that it reveals which factors are most influential in determining the outcome. Among the factors fed into Groll’s program for the World Cup model were a nation’s GDP, FIFA’s international team rankings, home advantage, and the number of veteran/star players. Groll and his team used exactly this approach to model the 2018 FIFA World Cup; the model picks out Spain as the most likely champion, with a probability of about 18%.
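To make the idea concrete, here is a deliberately tiny random forest written in plain Python. It is only a sketch: each "tree" is a single split (a stump), the match data is synthetic, the feature names are hypothetical, and counting how often a feature is chosen is a crude stand-in for a proper importance measure. None of it reproduces Groll's actual model.

```python
import random
import statistics

FEATURES = ["fifa_rank_gap", "gdp_gap", "home_advantage"]  # hypothetical names

def make_synthetic_matches(n, rng):
    """Invented data: goal difference depends mostly on the ranking gap."""
    rows, targets = [], []
    for _ in range(n):
        row = [rng.uniform(-1, 1), rng.uniform(-1, 1), rng.choice([0.0, 1.0])]
        targets.append(2.0 * row[0] + 0.3 * row[2] + rng.gauss(0, 0.3))
        rows.append(row)
    return rows, targets

def sse(values):
    """Sum of squared errors around the mean."""
    mean = statistics.fmean(values)
    return sum((v - mean) ** 2 for v in values)

def fit_stump(rows, targets, feature_ids):
    """Find the best single split among the allowed features."""
    best = None
    for f in feature_ids:
        for threshold in sorted({r[f] for r in rows}):
            left = [t for r, t in zip(rows, targets) if r[f] <= threshold]
            right = [t for r, t in zip(rows, targets) if r[f] > threshold]
            if not left or not right:
                continue  # a split must put data on both sides
            score = sse(left) + sse(right)
            if best is None or score < best[0]:
                best = (score, f, threshold,
                        statistics.fmean(left), statistics.fmean(right))
    return best

def fit_forest(rows, targets, n_trees, max_features, rng):
    """Train each stump on a bootstrap sample and a random feature subset."""
    forest, n = [], len(rows)
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]             # bootstrap sample
        feats = rng.sample(range(len(rows[0])), max_features)  # random features
        stump = fit_stump([rows[i] for i in idx], [targets[i] for i in idx], feats)
        if stump is not None:
            forest.append(stump)
    return forest

def predict(forest, row):
    """Average the predictions of all stumps."""
    return statistics.fmean(
        left if row[f] <= thr else right for _, f, thr, left, right in forest
    )

def split_counts(forest):
    """Count how often each feature was picked as the split variable."""
    counts = {name: 0 for name in FEATURES}
    for stump in forest:
        counts[FEATURES[stump[1]]] += 1
    return counts

rng = random.Random(0)
rows, targets = make_synthetic_matches(80, rng)
forest = fit_forest(rows, targets, n_trees=60, max_features=2, rng=rng)
print(split_counts(forest))
print(predict(forest, [0.9, 0.0, 0.0]), predict(forest, [-0.9, 0.0, 0.0]))
```

Because the synthetic outcome is driven mainly by the ranking gap, that feature is chosen for most splits, which is the sense in which a forest can reveal the influential factors.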


The random-forest method also allows the entire tournament to be simulated, and interestingly, this produces a different prediction of the winner. Groll and his colleagues simulated the whole tournament over 100,000 times and found that, in the most probable course of the tournament, Germany defends its World Cup title.
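A toy simulation shows how the most likely champion and the most likely path through the bracket can point to different teams. The sketch below is in Python; the four teams, their strengths, the bracket, and the rule that win probability is proportional to strength are all assumptions made for the illustration and are not taken from Groll's work.

```python
import random
from collections import Counter

# Hypothetical strengths and bracket; not taken from Groll's model.
STRENGTH = {"A": 10.0, "B": 9.0, "C": 8.0, "D": 1.0}
BRACKET = ["A", "B", "C", "D"]  # A plays B, C plays D, winners meet in the final

def win_probability(team, opponent):
    """Assumed rule: chance of winning is proportional to strength."""
    return STRENGTH[team] / (STRENGTH[team] + STRENGTH[opponent])

def play_round(teams, pick_winner):
    """Pair up neighbours in the bracket and keep one winner per pair."""
    return [pick_winner(teams[i], teams[i + 1]) for i in range(0, len(teams), 2)]

def run_tournament(pick_winner):
    """Play rounds until a single champion remains."""
    teams = list(BRACKET)
    while len(teams) > 1:
        teams = play_round(teams, pick_winner)
    return teams[0]

def simulate(n_runs, seed=0):
    """Monte Carlo: replay the tournament with random match outcomes."""
    rng = random.Random(seed)

    def random_winner(a, b):
        return a if rng.random() < win_probability(a, b) else b

    return Counter(run_tournament(random_winner) for _ in range(n_runs))

def favorite(a, b):
    """Deterministic rule: the more likely winner always advances."""
    return a if win_probability(a, b) >= 0.5 else b

N_RUNS = 100_000
counts = simulate(N_RUNS)
for team, wins in counts.most_common():
    print(f"{team}: {wins / N_RUNS:.3f}")
print("Champion if every favorite advances:", run_tournament(favorite))
```

With these made-up numbers, team C wins most often across the simulations (roughly 41% against 31% for A) because its semifinal is easy, yet A is the champion when the favorite advances in every match. The bracket structure alone is enough to make the two answers disagree.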

Chart of the Prediction

Groll remarks, “According to the most probable tournament course, instead of the Spanish the German team would win the World Cup.” Because of the enormous number of possible combinations of games, even the bracket above remains very unlikely.

This application of machine learning and statistical modeling is a prime example of how technology and engineering can have an extensive impact on the sports sector. Indeed, if the technology is continually developed, corrected, and refined, it may even be able to predict the bracket and winner of the FIFA World Cup with 99% accuracy in the coming decades.

The World Cup Final is on July 15, 2018




Omkar Bakshi

Omkar Bakshi is a junior at Cupertino High School. A tech enthusiast, he keeps up with the latest innovations and STEM advancements. He plans to study business and open a tech startup in the future. He is the NYTJ Director of Content.