Predicting the 2020 Ballon d’Or Winner using historical data


By Ryan Reid

France Football’s decision not to award a 2020 Ballon d’Or has stirred considerable controversy. Many fans feel that 2020 standout Robert Lewandowski was robbed of the honour, while others argue that Lionel Messi put up an admirable defence of the award.

As we approach 2020’s The Best FIFA Football Awards, held on the same day on which the Ballon d’Or is typically announced, I set out to determine which factors tend to define a Ballon d’Or winner.

The objective is to create a model that effectively predicts who would’ve won this year’s Ballon d’Or, while also looking at some of the most competitive and controversial Ballon d’Or awards of past years.

Building the model

To pick Ballon d’Or winners out of a field made up of the best of the best, I used a combination of Gaussian Naïve Bayes classification and ranking models in R.

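A Naïve Bayes model applies Bayes’ theorem to estimate the posterior probability of an outcome (here, winning the award) given a player’s attributes, under the simplifying assumption that those attributes are independent of one another once the outcome is known.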
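To make the idea concrete, here is a minimal from-scratch sketch of a Gaussian Naïve Bayes posterior. It is written in Python rather than the R used for the original analysis, and it is not the author's code: the function names, the variance floor, and the two toy features (a Golden Boot flag and a goal count) are all assumptions made for illustration.

```python
import math
from collections import defaultdict

def fit_gaussian_nb(rows, labels, var_floor=1e-3):
    """Estimate the prior and per-feature mean/variance for each class."""
    grouped = defaultdict(list)
    for row, label in zip(rows, labels):
        grouped[label].append(row)
    model = {}
    for label, members in grouped.items():
        n = len(members)
        columns = list(zip(*members))
        means = [sum(col) / n for col in columns]
        # The variance floor (an assumption) avoids division by zero
        # when a feature is constant within a class.
        variances = [
            max(sum((x - m) ** 2 for x in col) / n, var_floor)
            for col, m in zip(columns, means)
        ]
        model[label] = (n / len(rows), means, variances)
    return model

def posterior(model, row):
    """Return P(class | row) for every class using Bayes' theorem."""
    log_scores = {}
    for label, (prior, means, variances) in model.items():
        log_p = math.log(prior)
        for x, m, v in zip(row, means, variances):
            # Log of the Gaussian density for one feature.
            log_p += -0.5 * math.log(2 * math.pi * v) - (x - m) ** 2 / (2 * v)
        log_scores[label] = log_p
    top = max(log_scores.values())
    unnormalised = {k: math.exp(s - top) for k, s in log_scores.items()}
    total = sum(unnormalised.values())
    return {k: u / total for k, u in unnormalised.items()}

# Hypothetical toy data: [won_golden_boot, goals]. Not real player records.
rows = [[1, 50], [1, 45], [0, 40], [0, 20], [0, 25], [1, 30], [0, 15]]
labels = ["winner", "winner", "winner", "other", "other", "other", "other"]

model = fit_gaussian_nb(rows, labels)
print(posterior(model, [1, 48]))
```

The posterior values always sum to one across the classes, so a player is "classified as a winner" whenever the winner class takes more than half of that probability.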

In using Naïve Bayes to predict the Ballon d’Or winners, I first used the classification method to determine whether each player was more likely than not to win the Ballon d’Or, based on how closely their season resembled those of past winners.

The model was trained on the top ten finishers from 2010 to 2019, excluding whichever year I wished to test. I repeated this process for every year between 2010 and 2019 to narrow the model inputs down to the 20 most effective predictors of winning the Ballon d’Or, while also testing for and eliminating any variables that correlated strongly with other independent variables.
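The hold-one-year-out procedure and the correlation check can be sketched as follows. This is an illustration in Python, not the original R code. The record layout, the function names, the 0.8 correlation cut-off, and the sample feature values are all assumptions, since the article does not state them.

```python
import math

def leave_one_year_out(records, year_key="year"):
    """Yield (held_out_year, train, test), holding out one year at a time."""
    years = sorted({r[year_key] for r in records})
    for held_out in years:
        train = [r for r in records if r[year_key] != held_out]
        test = [r for r in records if r[year_key] == held_out]
        yield held_out, train, test

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0  # a constant column carries no correlation signal
    return cov / (sx * sy)

def drop_correlated(columns, threshold=0.8):
    """Greedily keep a feature only if it is not strongly correlated
    with any feature already kept. The threshold is an assumed value."""
    kept = []
    for name in columns:
        if all(abs(pearson(columns[name], columns[k])) < threshold for k in kept):
            kept.append(name)
    return kept

# Placeholder records: ten top-ten finishers for each year from 2010 to 2019.
records = [
    {"year": year, "player": f"player_{year}_{i}"}
    for year in range(2010, 2020)
    for i in range(10)
]
folds = list(leave_one_year_out(records))

# Hypothetical feature columns, purely to exercise the correlation filter.
columns = {
    "goals": [10, 20, 30, 40],
    "shots": [21, 40, 62, 79],
    "assists": [5, 1, 4, 2],
}
print(drop_correlated(columns))
```

Each fold trains on nine years (90 players) and tests on the remaining ten, so every year from 2010 to 2019 is predicted by a model that never saw it.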

Finally, I ran the model on the newly entered 2020 data to predict this year’s winner from among the 11 finalists for the FIFA “The Best” Award.

In years where the model classified multiple players, or none, as likely to win the Ballon d’Or, I re-ran it using a ranking Naïve Bayes method rather than a classification one.
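The decision to fall back to ranking can be expressed as a simple check. The sketch below is in Python with hypothetical function names. The 50% cut-off matches the threshold the article uses when it describes players being "classified as a winner", and the probabilities are the ones the article reports for its three-way case and for 2018.

```python
def classified_winners(probabilities, threshold=0.5):
    """Names of the players whose winner probability exceeds the threshold."""
    return [name for name, p in probabilities.items() if p > threshold]

def needs_ranking(probabilities, threshold=0.5):
    """True when the classifier flags no player or several players as winner."""
    return len(classified_winners(probabilities, threshold)) != 1

# Probabilities quoted in the article for the year with three flagged players.
three_way = {"Messi": 0.9978, "Iniesta": 0.9944, "Ronaldo": 0.6812}
# Probabilities quoted in the article for 2018.
year_2018 = {"Ronaldo": 0.76, "Modric": 0.07}

print(needs_ranking(three_way))   # three players exceed 50%
print(needs_ranking(year_2018))   # exactly one player exceeds 50%
```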

The ranking model used the same input data as the classification model; its objective, however, was to determine the probability of one player finishing ahead of another rather than each player’s outright similarity to past winners.
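The article does not spell out how the pairwise probabilities were computed, so the following Python sketch substitutes a Bradley-Terry-style ratio as a stand-in. Treating each player's winner probability as a strength score is an assumption, as are the function names. It should be read as one plausible way to turn scores into a ranking, not as the method actually used.

```python
from itertools import combinations

def pairwise_ahead(score_a, score_b):
    """Bradley-Terry-style probability that player A finishes ahead of B."""
    total = score_a + score_b
    if total == 0:
        return 0.5  # no information either way
    return score_a / total

def rank_players(scores):
    """Order players by their expected number of head-to-head wins."""
    expected_wins = {name: 0.0 for name in scores}
    for a, b in combinations(scores, 2):
        p = pairwise_ahead(scores[a], scores[b])
        expected_wins[a] += p
        expected_wins[b] += 1.0 - p
    return sorted(expected_wins, key=expected_wins.get, reverse=True)

# Winner probabilities quoted in the article, used here as assumed strengths.
scores = {"Ronaldo": 0.6812, "Iniesta": 0.9944, "Messi": 0.9978}
print(rank_players(scores))
```

With this particular stand-in the final order simply follows the input scores. A ranking model fitted directly on pairwise outcomes could order players differently.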

The result of the model is a predicted ranking of the top 10 finishers each year.

The most important factors for Ballon d’Or winners

The model uses 20 factors to predict the Ballon d’Or winner. These factors include a combination of the following:

  • Individual honours (Team of the Year, Player of the Year, Best in Position)

  • Team success (International, Domestic, Champions League)

  • Statistics (Goals, Assists, Tackles)

Of these factors, five in particular continually emerged as having significant value in predicting the winners:

Champions League Golden Boot (90% of Winners vs 3.3% of Top 10 Finishers): Of the past 13 Ballon d’Or trophies awarded, 12 went to that season’s Champions League Golden Boot winner. The trophy generally indicates excellent individual performance against high-level competition as well as team success in the Champions League, since a player must progress far in the competition to win it.

UEFA Player of the Year (70% of Winners vs 3.3% of Top 10 Finishers): Receiving the UEFA Player of the Year award is usually an excellent start to winning the Ballon d’Or. However, because this award weights Champions League performance so heavily, there are still several instances in which its winner did not go on to win the Ballon d’Or. Examples include Andrés Iniesta in 2012, Franck Ribéry in 2013, and Virgil van Dijk in 2019.

Champions League Semi-Finalist (100% of Winners vs 48.9% of Top 10 Finishers): Interestingly, when I tested each stage of the Champions League individually, reaching the semi-finals showed the strongest correlation with winning the Ballon d’Or. A run to the semi-finals is usually enough to gain the media’s attention while producing memorable performances against top competition.

UEFA Team of the Year (100% of Winners vs 46.7% of Top 10 Finishers): Once again, recognition from UEFA is valued relatively highly, as it indicates team success, individual success, and media recognition. Since the team’s introduction in 2001, Michael Owen (2001) is the only Ballon d’Or winner not to have been named to it.

League Top Player (70% of Winners vs 20% of Top 10 Finishers): Winning your domestic league’s Top Player award typically indicates you were the best player on one of the best teams. While success in Europe is valued more, domestic competitions still hold value in determining the winner.
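One rough way to compare the five factors is to divide the share of winners holding each honour by the share of other top ten finishers holding it. The Python sketch below does that arithmetic with the percentages quoted above. It assumes the second figure in each pair describes the non-winning top ten finishers (3.3%, for instance, is consistent with 3 of 90 such players), and the resulting ratios are only a reading of the published figures, not the model's internal weights.

```python
# (share of winners, share of other top ten finishers), as quoted in the article.
rates = {
    "Champions League Golden Boot": (0.90, 0.033),
    "UEFA Player of the Year": (0.70, 0.033),
    "Champions League Semi-Finalist": (1.00, 0.489),
    "UEFA Team of the Year": (1.00, 0.467),
    "League Top Player": (0.70, 0.20),
}

def likelihood_ratio(p_winner, p_other):
    """How many times more common the honour is among winners."""
    return p_winner / p_other

ratios = {name: likelihood_ratio(w, o) for name, (w, o) in rates.items()}

for name, ratio in sorted(ratios.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {ratio:.1f}x more common among winners")
```

Read this way, the Golden Boot and UEFA Player of the Year separate winners from the field far more sharply than the semi-final and Team of the Year factors, which nearly half of the other finishers also achieve.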

One interesting absence from this list is individual statistics, likely because the award is selected by the media. Team success ensures widespread coverage of your performances, while individual awards indicate that you already hold influence in the minds of governing bodies, players, coaches, and the media.

On the other hand, scoring goals (except in the Champions League) carries little weight, because competition levels differ vastly across leagues and because goals in domestic competition do not always translate into team success.

Further, international competitions are not valued as highly, likely because of the difficulty I had in weighting the Euro Cup, World Cup and Copa América above the less competitive AFCON and Asian Cup, and because these tournaments are held infrequently, which led to smaller sample sizes. If I were to improve the model in the future, this is one area I would explore.

Exploring past Ballon d’Or winners



Arguably one of the most controversial Ballon d’Or awards ever, 2018 saw Luka Modric awarded the trophy over Real Madrid teammate Cristiano Ronaldo, due to the essential role Modric played in Croatia’s run to the World Cup final, where he won the tournament’s best player honours.

With 2018 held out as the test year, this was the only year in which the model’s predicted winner did not match the actual winner. Instead, the model predicted Ronaldo to win the Ballon d’Or with 76% probability, versus Modric, who received a 7% chance.

We can attribute this to Ronaldo’s goal-scoring in the Champions League. However, this result could also be due in part to the difficulty of measuring the importance of international competitions, as mentioned previously. The model still ranked the two players first and second, which helps explain why the 2018 Ballon d’Or remains such a controversial topic for discussion.



2012 was the year that saw the most players classified as Ballon d’Or winners. Messi, Iniesta and Ronaldo all received over 50% probability of being classified as a winner by my model. Messi received a 99.78% probability of winning, with Iniesta close behind at 99.44% and Ronaldo at 68.12%.

All three players featured in the UEFA and La Liga Teams of the season, and while Ronaldo won La Liga with Real Madrid, Messi ultimately won Player of the Year for the league. In the Champions League, both Real Madrid and Barcelona were eliminated in the semi-finals, with Messi winning the Golden Boot for the competition.

Iniesta, however, saw tremendous international success as Spain won the 2012 Euro Cup. He also took home the Player of the Tournament and UEFA Player of the Year awards for his performances.

Ultimately, Messi was given the nod for the award by both my model and France Football.

Most deserving Ballon d’Or winners of past years

The following five players were given the highest probability of winning the Ballon d’Or between 2010 and 2019 by my model.


The 2020 Ballon d’Or Winner is…


Robert Lewandowski.

Although my classification model gave him only a 69% probability of being the Ballon d’Or winner, he is far and away the most likely player to win the award compared with the competition.

This season, Lewandowski’s Bayern Munich won the coveted treble (Champions League, Bundesliga, DFB-Pokal). Lewandowski was the top goal scorer in each of those competitions, with an impressive 15 goals in 10 Champions League games, the third-highest tally of all time.

Additionally, Lewandowski more than attracted the media’s attention, winning the Bundesliga Player of the Year and UEFA Men’s Player of the Year. As his fans and teammates argued, he quite clearly was robbed of his first Ballon d’Or honours with the award’s cancellation.

However, with Lewandowski having already qualified for the Champions League’s knockout stage, and currently sitting second in the Bundesliga with 16 goals in 15 matches across all competitions, could 2021 be his year?

Honourable mentions go out to Lionel Messi and Neymar, who were the distant runners-up in the model’s predictions.

Statistics retrieved from Transfermarkt and WhoScored

Cover photo credited to Getty Images, headshots retrieved from FUTBIN, FOX Sports

4 thoughts on “Predicting the 2020 Ballon d’Or Winner using historical data”

  1. Marcus Sorvanis

    Hi, I’m doing a school maths project on predicting the Ballon d’Or winner, and was wondering how you weighted your variables and applied your model to achieve your predictions?

  2. Hello, I’m also making a project for predicting the Ballon d’Or winners with Python. I was wondering if you could share your project with me too. Or do you maybe have a Git repository with this project?

