The State of Soccer Analytics
Relative to other major sports, soccer lags behind in its acceptance of analytics. Soccer is an extremely traditional sport that is usually reluctant to change, so this should not come as a huge surprise. While some clubs are ignoring analytics, others are using it as a competitive advantage, and in some cases it’s really working.
In a game as fluid as soccer, it is difficult to understand the game objectively amidst differing opinions from players, fans, coaching staff and the media alike. However, the recent growth of analytics in soccer provides an element of objectivity. It introduces new measures of predictability that encourage analysis, in an area where it is currently lacking.
Another reason soccer analytics lags behind in the public eye is the rarity and inaccessibility of the data, not to mention the complexity and quantity of data required to fully capture value in an open-play sport with infinite game outcomes. The company that holds the monopoly on advanced soccer data is called Opta, and it tracks every game in every major soccer league around the world. Since there are a lot of games to cover worldwide, lots of things to track, and only a few groups doing it, it’s not hard to see why this data is easy to monopolize. As a result, the data is either difficult to scrape from the web or too expensive for personal use: a license for a single league’s worth of data is believed to cost in the four-digit range per year, although this varies by use and is not confirmed by Opta. Public soccer data analysis is therefore difficult, but not impossible.
There are still other ways though! Sites like WhoScored and Squawka offer simple game stats for teams and players, although they are not exportable with traditional methods. For MLS specifically, American Soccer Analysis offers many features to get your fix for advanced stats, which will be highlighted throughout the article. These concepts can be used as evaluation tools, to confirm the eye-test, or to just enhance the viewing experience of the game.
How Teams are Using Analytics
Although statistical analysis is not new to soccer, where pass counts, pass completions and shots taken, for example, are often recorded, such stats only provide information about certain events in the game and lack further insight. Soccer analytics helps clubs gain insight into a potential player’s performance based on data collected from past performances. These advancements enable coaches and managers to use this data to plan more effective training programs, team selections, and game strategies.
Analytics can be broken down into technical and physical categories. The physical aspects account for distance covered, intensity, the number of accelerations and decelerations, and jumps and landings. This data is most often used to monitor individual training loads, which helps minimize injuries. The Seattle Sounders of Major League Soccer mainly focus on sports science along with physical analytics to ensure players are at their physical peaks and to prevent injuries.
Technical analytics, on the other hand, act as a tool to help players and coaches quantitatively assess individual and team-based performances. This information is used to improve both individual and team performances and to design successful strategies for upcoming games. These mechanisms can also help predict the outcomes of games, create new game strategies, determine the market value of a player and connect players to brands and sponsorship opportunities. Devin Pleuler, Senior Manager of Analytics at Toronto Football Club, explains the importance of analytics in Major League Soccer: “The players are on a salary cap but the analytics department is not so it’s a way you can set yourselves apart in a relatively cheap manner”. Analytics helps us quantify individual in-game events to provide an understanding of the probability of success, often evaluated by estimating goal scoring potential. It assigns values to the events (events being each stat category) to help better understand and coordinate tactics and systems. Coaches and managers can use this data to tailor tactical systems for upcoming games that are backed by objective information, translating to higher success rates on the field.
It’s no surprise, then, that in a game where analytics is finally starting to carve out a place for itself, the two teams using it most heavily in MLS have ended up in back-to-back MLS Cup finals against each other. Fun tidbit: when these two teams first competed in the MLS Cup final, TFC’s Senior Manager of Analytics challenged the Sounders’ Director of Analytics, Ravi Ramineni, to a friendly wager:
— Devin Pleuler (@devinpleuler) December 5, 2016
No word on whether Devin actually gave up his calculator or not, as TFC did end up losing that round. If he did, perhaps he got it back the next year when TFC was victorious over the Sounders.
Expected Goals (xG)
The most popular and most cited advanced metric in soccer analytics is Expected Goals (xG). Generally, expected goals is the number of goals a player would be expected to score, based on the quality of their chances. There are many models attempting to capture this, some better than others, but none are perfect. The two main inputs found in most, if not all, xG models are where the shot took place and how the shot was taken.
The ‘where’ of the shot refers to both the distance and angle of the shot. Logically, the further away a player is from goal, the less likely their shot is to result in a goal. This is reflected in the statistic, as shots from distance generally have a lower xG than close ones. American Soccer Analysis’s model also considers how much of the goal mouth is available to shoot at. The closer a player is to the goal line, the less goal mouth is directly exposed to them, so a sharper angle results in a lower xG.
Determining how the shot was taken is slightly more complicated, as it is composed of the manner in which the physical shot is taken as well as the lead-up play to the shot. Higher probabilities are awarded to shots taken with the player’s foot rather than the head, simply because a shot taken with the foot is statistically more likely to score than a header. The build-up play before the shot also affects the xG rating. For example, a shot taken from 10 yards on a counter attack will be awarded a higher xG than the exact same shot resulting from a corner. The reason for this is the time and space the player is allowed. Typically, on a fast break a player has more space and is able to get off their preferred shot, whereas on a corner the eighteen-yard box is very crowded, so players are rushed to shoot and the chance of the ball being deflected is much higher.
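To make the ‘where’ and ‘how’ inputs concrete, here is a minimal toy sketch of an xG function. It is not American Soccer Analysis’s model or anyone else’s: the logistic form, the function names and every coefficient are assumptions invented for illustration, whereas a real model would fit its weights to thousands of historical shots. It only shows how distance, goal-mouth angle, body part and build-up play could each push the probability up or down.

```python
import math

# Every number below is made up for illustration; a real xG model
# would estimate these weights from historical shot data.
INTERCEPT = -1.0
PER_YARD = -0.12        # each extra yard from goal lowers the odds
PER_RADIAN = 1.5        # seeing more of the goal mouth raises them
HEADER_PENALTY = -0.8   # headers score less often than shots with the foot
COUNTER_BONUS = 0.4     # fast breaks give the shooter time and space
CORNER_PENALTY = -0.3   # crowded box, rushed and deflected shots

GOAL_WIDTH_YARDS = 8.0  # a regulation goal is 8 yards wide


def goal_mouth_angle(x, y):
    """Angle in radians between the two posts as seen from the shot.

    x is the distance from the goal line and y is the sideways offset
    from the centre of the goal, both in yards.
    """
    half = GOAL_WIDTH_YARDS / 2
    return math.atan2(y + half, x) - math.atan2(y - half, x)


def toy_xg(x, y, header=False, situation="open_play"):
    """Return a toy scoring probability between 0 and 1."""
    distance = math.hypot(x, y)
    score = INTERCEPT + PER_YARD * distance + PER_RADIAN * goal_mouth_angle(x, y)
    if header:
        score += HEADER_PENALTY
    if situation == "counter":
        score += COUNTER_BONUS
    elif situation == "corner":
        score += CORNER_PENALTY
    # The logistic function squeezes any score into a probability.
    return 1 / (1 + math.exp(-score))


if __name__ == "__main__":
    print("10 yards, counter attack:", round(toy_xg(10, 0, situation="counter"), 2))
    print("10 yards, from a corner: ", round(toy_xg(10, 0, situation="corner"), 2))
    print("tight angle by the line: ", round(toy_xg(1, 12), 2))
```

The same 10-yard shot rates higher on a counter than from a corner, and a shot from near the goal line rates lower than one from the same distance in front of goal, which is the behaviour described above.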
What Can xG Tell Us?
xG can reasonably tell us how often a player gets into a good spot to score and makes themselves available for good chances. Comparing their expected goals to their actual goals gives you an indicator of a player’s finishing ability, and of whether they’ve benefited from good or bad luck. Think of it this way: if a player misses a sitter in front of the net by skying it over the bar, this type of shot from that location might be expected to go in (making this up) 95% of the time. This player’s goal count would be zero, but their xG count would be 0.95. The player got into a good position to score, but finished poorly. If they kept this up, there would be a large gap between the two numbers and this player could be deemed a poor finisher.
On the other hand, let’s say two players in two different games take the same shot (which is deemed to be a 50% shot, or 0.5 xG) against two goalies standing in the same spot. One goalie dives across and makes an incredible save, while the other falls just short. The player who did not score is penalized in goals for unluckily going up against a better goalie, which is out of their control. Factors outside a player’s control can open a gap between their goals and their xG in the short term, but the two tend to move closer together over a larger sample, where luck has less of an effect.
On AmericanSoccerAnalysis.com, you can find constantly updated MLS xG counts by game, player, and team. On Twitter, @11tegen11 tweets out game maps of the xG accumulated by each team in the game, and gives the odds of each team winning based on their xG count. This is a great way to identify which teams really got the better chances, but ran into some bad luck or good goaltending. His charts typically look like this:
Each scoring chance is denoted by the bar moving higher. The larger the rise of the bar, the higher the xG of the scoring chance, which means the more likely the team is to score. In this case, it can be seen that Jeisson Vargas scored on a ~0.1 xG chance, meaning he would be expected to score on that chance once every ten tries. The final xG counts were 1.27 for Montreal and 0.96 for Toronto, leading to the conclusion that it was a fairly even game that could have gone either way. This can also be seen in the match odds near the top left (which look like a French flag for this game). What these mean is that in games where one team put up ~1.27 xG and the other put up ~0.96, the team with the higher xG would be expected to win 43% of the time, draw 30% of the time, and lose 28% of the time. TFC can consider themselves slightly unlucky to come out of this game without a point.
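The goals-versus-xG comparison is simple arithmetic, and a short sketch may help. The `Shot` record and the `finishing_summary` helper below are hypothetical names, and the three shots are the made-up examples from the text (the missed 0.95 sitter and the two 0.5 shots), not real data.

```python
from dataclasses import dataclass


@dataclass
class Shot:
    player: str
    xg: float    # chance quality assigned by some xG model
    goal: bool   # whether the shot actually went in


def finishing_summary(shots):
    """Total goals, total xG and the gap between them for each player."""
    totals = {}
    for shot in shots:
        goals, xg = totals.get(shot.player, (0, 0.0))
        totals[shot.player] = (goals + int(shot.goal), xg + shot.xg)
    return {
        player: {
            "goals": goals,
            "xg": round(xg, 2),
            "goals_minus_xg": round(goals - xg, 2),
        }
        for player, (goals, xg) in totals.items()
    }


if __name__ == "__main__":
    shots = [
        Shot("missed sitter", 0.95, False),
        Shot("beat the goalie", 0.5, True),
        Shot("robbed by a save", 0.5, False),
    ]
    for player, row in finishing_summary(shots).items():
        print(player, row)
```

On a single shot the gap says very little: the two 0.5 xG shooters come out at +0.5 and -0.5 for identical chances. The number only starts to say something about finishing once it is summed over many shots.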
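One rough way to turn two xG totals into win, draw and loss odds is to treat each team’s goal count as an independent Poisson variable whose mean is its xG. This is an assumption made for the sketch, not necessarily how the odds on the chart were produced (a simulation over each individual shot would use more information), so its output should not be expected to match the chart exactly. The function names are hypothetical.

```python
import math


def poisson_pmf(k, mean):
    """Probability of exactly k goals when the average is `mean`."""
    return math.exp(-mean) * mean ** k / math.factorial(k)


def match_odds(xg_a, xg_b, max_goals=10):
    """Return (win, draw, loss) probabilities from team A's point of view.

    Assumes the two goal counts are independent Poisson variables and
    ignores scorelines with more than `max_goals` goals for either team.
    """
    win = draw = loss = 0.0
    for a in range(max_goals + 1):
        p_a = poisson_pmf(a, xg_a)
        for b in range(max_goals + 1):
            p = p_a * poisson_pmf(b, xg_b)
            if a > b:
                win += p
            elif a == b:
                draw += p
            else:
                loss += p
    return win, draw, loss


if __name__ == "__main__":
    # xG totals from the Montreal vs Toronto example in the text.
    win, draw, loss = match_odds(1.27, 0.96)
    print(f"Montreal win {win:.0%}, draw {draw:.0%}, Toronto win {loss:.0%}")
```

Even with a clear xG edge, the side with the higher total is far from certain to win, which is why a game like this one can reasonably be described as having gone either way.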
Expected Assists (xA) and Key Passes
xG is the most common tool to analyze how dangerous an attacker is. However, it doesn’t take into account how effective a passer is. That is why the stat ‘expected assists’, or xA, was created. Expected assists is designed to give credit to the player who creates a chance, not just the player who takes it. It does this by assigning the xG rating of the chance to the passer in the form of xA. Therefore, if a through ball leads to a chance with an xG rating of 0.4, the player who played the pass would be assigned an xA rating of 0.4.
Adding to the playmaking measurement is key passes. Key passes are defined as “the final pass or pass-cum-shot leading to the recipient of the ball shooting”. The beauty of this stat comes from its simplicity. As long as the receiving player shoots the ball, the passer is awarded a key pass regardless of the result of the shot. It is therefore quite easy to track and look out for during a game, and it gives the viewer a decent sense of which players create chances. However, the simplicity of key passes is also their downfall. Because every key pass is awarded the same rating of 1, the stat does not account for the type of chance created. A three-yard pass leading to a shot that goes ten yards wide is worth the same as a through ball leading to a tap-in. Unlike xA, key passes do not differentiate between chances, so they are less effective at measuring a passer’s overall creativity.
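The difference between the two stats shows up clearly when both are computed from the same list of shots. In this sketch the input format (a passer name paired with the xG of the resulting shot, or `None` for an unassisted shot), the function name and the sample numbers are all assumptions made for illustration.

```python
from collections import defaultdict


def playmaking_totals(shots):
    """Return (key_passes, xa) dictionaries keyed by passer.

    `shots` is an iterable of (passer, xg) pairs, where passer is None
    when nobody set up the shot.
    """
    key_passes = defaultdict(int)
    xa = defaultdict(float)
    for passer, xg in shots:
        if passer is None:
            continue  # unassisted shot: no passer to credit
        key_passes[passer] += 1  # every key pass counts as exactly 1
        xa[passer] += xg         # xA inherits the xG of the chance created
    return dict(key_passes), dict(xa)


if __name__ == "__main__":
    shots = [
        ("short sideways pass", 0.02),     # led to a hopeful shot from distance
        ("through ball to a tap-in", 0.4),
        (None, 0.1),                       # solo effort
    ]
    key_passes, xa = playmaking_totals(shots)
    print(key_passes)
    print(xa)
```

Both passers end up with one key pass each, but their xA differs by a factor of twenty, which is exactly the distinction the key pass count cannot make.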
Player Comparison (Radars)
One of the most useful and easy-to-interpret tools (mostly) available to the public community is the player radar. Due to the data constraints outlined earlier, it’s not so easy for everyone to make them, but there are thankfully a few people on Twitter who post them on a consistent basis, which has essentially created a database of them there. Here’s an example of a player radar created by Ted Knutson (@mixedknuts) for Sebastian Giovinco in the 2016 season:
It might look like there’s a lot going on there, but it’s actually quite simple. Eleven stats are highlighted above, chosen according to the player’s position (in this case, forward). Each is presented on a per-90 basis, so everyone is judged on the same scale. The closer each stat is to the outer edge of the circle, the closer the player was to being the best in their league at it. The outer circle represents the top 5 percent, while the middle of the circle represents the bottom 5 percent, of players in the same competition. If a player has a stat that touches the edge, they are likely to be considered elite in that category. If they have a stat near the middle, this might be an indicator of their play style, or they may have work to do. The raw values are not comparable across stats: 0.39 throughballs has no relation to 1.2 dispossessions, aside from the two representing the same percentile rank for their respective stats.
From this radar, we can see that Giovinco is an extremely high-volume shooter, which is reflected in his high shots per 90 and low xG per shot. At first glance, his passing % looks weak, but considering that his passes into the box number is well above average, he could be thought of as a creator near the goal. You probably already knew this, but the radar makes a strong case that Sebastian Giovinco is a fantastic soccer player who has dominated MLS. This really highlights the beauty of soccer analytics: it’s a great way to confirm the eye-test.
Accessing these player radars is not an ideal process. First, go to the Twitter search page (it does not require an account). The three people who have been identified as consistently posting these are @Mixedknuts, @Fussballradars, and @thefutebolist. Type any of their names (start with @Mixedknuts, as his database is probably the largest, then move on to the other two) followed by the name of the player you are looking for. It’s sometimes best to then filter by photos, as all the radars will appear there. With luck, you will have found the radar you are looking for. If that didn’t produce any results, it’s not entirely hopeless. Ted Knutson occasionally opens a request line on Twitter, so if you want a radar for a player who does not have one yet, you can request one that way.
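The two calculations behind a radar axis, scaling to per-90 and ranking against the rest of the league, can be sketched in a few lines. The function names and sample numbers are hypothetical, and the percentile definition used here (share of league values strictly below the player’s) is one common convention assumed for illustration, not necessarily the one used for the radar above.

```python
def per_90(total, minutes):
    """Scale a season total to a rate per 90 minutes played."""
    return total * 90 / minutes


def percentile_rank(value, league_values):
    """Percentage of league values that fall strictly below `value`."""
    below = sum(1 for v in league_values if v < value)
    return 100 * below / len(league_values)


if __name__ == "__main__":
    # Made-up numbers: 27 key passes in 2430 minutes is 1.0 per 90.
    rate = per_90(27, 2430)
    # Made-up per-90 rates for the other players in the same position.
    league = [0.2, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.1, 1.3, 1.6]
    print(rate, percentile_rank(rate, league))
```

Converting to percentiles is what lets stats with completely different units and ranges share one chart. For stats where a lower number is better, the rank would presumably be flipped before plotting.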
Score Effects
Score effects are an important concept to consider, especially for casual viewing, as they might help explain certain phenomena that occur every single match. The idea here is that when teams are winning, they tend to sit back and defend more, and while they are losing, they push forward. Seems obvious, right? The thing that is not always obvious to most people is how this will affect the flow of the game, the final stat-line, and the quality of shots that can be expected. Statsbomb did a detailed statistical analysis on score effects which can be found here, and which shows some of the math and stats they used to confirm this effect.
Essentially, what they found was that when teams are leading in a game, they tend to form a ‘defensive shell’, tightening up defensively and dropping deeper. They do this because, to them, preventing a goal is more valuable than scoring another. They tend to allow more shots from further out, and these shots are typically less likely to go in.
On the other hand, when teams are trailing by a goal, they tend to take more shots in a more desperate attempt to score the tying goal. These shots are typically of lesser quality, both because of this desperation and because the team cannot afford to wait for the perfect chance to become available. The conversion rates on these shots tend to be lower, which is another nod to the notion that these shots are of lesser quality.
Add all of this up, and you could see a very lopsided statline at the end of the game if one team happened to be trailing for most of it. It might paint a picture that one team dominated and the other got lucky. This could be true, but hopefully, with knowledge of score effects, you will be able to see through the statline and consider that these shots could have been of lower quality and part of the defending team’s plan all along.
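If you have shot-level data with the score at the time of each shot, checking for score effects yourself comes down to grouping shots by game state. The sketch below assumes a hypothetical input format, a pair of the shooting team’s goal difference when the shot was taken and whether it went in, and the sample shots are invented. It is not the method used in the Statsbomb analysis.

```python
from collections import defaultdict


def shots_by_game_state(shots):
    """Count shots and goals, and compute conversion, per game state.

    `shots` is an iterable of (goal_diff, scored) pairs, where goal_diff
    is the shooting team's goals minus the opponent's at the moment of
    the shot.
    """
    counts = defaultdict(lambda: [0, 0])  # state -> [shots, goals]
    for goal_diff, scored in shots:
        if goal_diff > 0:
            state = "leading"
        elif goal_diff < 0:
            state = "trailing"
        else:
            state = "level"
        counts[state][0] += 1
        counts[state][1] += int(scored)
    return {
        state: {"shots": n, "goals": g, "conversion": g / n}
        for state, (n, g) in counts.items()
    }


if __name__ == "__main__":
    # Invented sample: lots of low-percentage shots while trailing.
    shots = [(-1, False)] * 4 + [(-1, True), (1, True), (1, False), (0, False)]
    for state, row in shots_by_game_state(shots).items():
        print(state, row)
```

Run over a full season, a table like this would show whether trailing teams really do shoot more often and convert less, rather than relying on the raw shot totals from a single game.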