Introducing QSAO's Analytics Mythbusters

By Constantine Maragos

At every level of sports analysis, the stats are key. Whether you’re rambling on with your friends about why Marner isn’t overpaid or engaging in a diplomatic discussion on Twitter about why Fred VanVleet should run the Raptors’ offence, the answers lie within the numbers.

However, there are a lot of smoke and mirrors in analytics. As a sports fan, you do not want to be the person who bases their analysis on a player’s +/- or raw three-point shooting percentage. Many stats and metrics seem to paint a clear picture of an athlete’s play, but on closer inspection they miss the mark.

In this all-new series, QSAO and I aim to provide fans of all sports with the insight to conduct their own player and team evaluations and dominate every sports argument they find themselves in.

In the first issue of QSAO’s Analytics Mythbusters, I took the opportunity to look into stats from sports that I’m not necessarily familiar with and to explore different ways to measure player performance. I hope that the topics I cover provide value to those who are also looking to diversify their analytics portfolio.

Passer ratings in the NFL — Stay away


For some NFL teams, finding a reliable quarterback is a perpetual struggle between overpaying backups and buying into draft prospects. Finding a consistent passer who can hit receivers with precise throws seems even harder. This year, the NFL has seen many teams either lose their stars to injury or struggle to choose between the lesser of two (QB) evils. Beyond this, many fans like to compare their team’s QBs in debate using QB Passer Rating. The NFL Passer Rating considers four categories: completions per attempt, passing yards per attempt, touchdowns per attempt, and interceptions per attempt. The formula is as follows:

NFL Passer Rating = ((a + b + c + d) / 6) × 100

where:

a = ((Completions / Attempts) − 0.3) × 5
b = ((Passing Yards / Attempts) − 3) × 0.25
c = (Touchdowns / Attempts) × 20
d = 2.375 − ((Interceptions / Attempts) × 25)
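To make the arithmetic concrete, here is a minimal Python version of the formula. One added detail: the official NFL calculation also clamps each of the four components to the range [0, 2.375], which the shorthand version above omits.

```python
def nfl_passer_rating(completions, attempts, yards, touchdowns, interceptions):
    """NFL Passer Rating from the four per-attempt components.

    The official NFL calculation clamps each component to [0, 2.375];
    the clamp is included here for completeness.
    """
    def clamp(x):
        return max(0.0, min(x, 2.375))

    a = clamp(((completions / attempts) - 0.3) * 5)      # completion rate
    b = clamp(((yards / attempts) - 3) * 0.25)           # yards per attempt
    c = clamp((touchdowns / attempts) * 20)              # touchdown rate
    d = clamp(2.375 - (interceptions / attempts) * 25)   # interception rate
    return (a + b + c + d) / 6 * 100

# Drew Brees's 2018 regular season: 364-of-489 for 3,992 yards, 32 TD, 5 INT
print(round(nfl_passer_rating(364, 489, 3992, 32, 5), 1))  # 115.7
```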

The categories themselves are pretty self-explanatory; however, as the formula shows, the weighting is highly skewed. To start, counting completion percentage and yards per attempt as separate statistics is counterintuitive: the more passes a player completes, the more yards per attempt they will tend to gain, while completion percentage alone is only a broad categorization of passing effectiveness. A completed pass can mean many different things, and as Ahmed Cheema from The Spax eloquently puts it:

“Do you want a quarterback who completed 35 passes of 35 attempts for 100 yards or a quarterback who completed 17 passes of the same number of attempts but for a total of 300 yards? I’ll take the latter.”

Additionally, the model penalizes interceptions far more heavily than it rewards any other category. While there is good reason to penalize interceptions at a higher rate, this has its drawbacks. For a vivid example, the Patriots defense this season is unstoppable: they are off to one of the best starts in NFL history, leading the league in interceptions, takeaways, yards allowed per game, and points allowed per game. According to the Passer Rating statistic, though, the likes of Sam Darnold and Baker Mayfield would have been better off not throwing the ball at all – literally.

“Opponents are statistically better off spiking the ball every play than passing on this Patriots defense 🤯 @brgridiron” - @bleacherreport on Instagram

While this can be considered a testament to the Patriots’ historic defensive play this year, it is hard to give merit to a statistic that paints such a ludicrous picture. Passer Rating is widely criticized as incomplete for not factoring in rushing plays, sacks, and fumbles, and the absence of these important aspects of a quarterback’s game means it cannot give an accurate representation of performance. Take Vikings quarterback Kirk Cousins, who recorded a 99.7 passer rating last season and ranks above the legendary Joe Montana all-time in passer rating. Last season, Cousins tied Derek Carr for the league lead in fumbles (7), threw 10 interceptions, and the Vikings ranked 22nd in first downs per game. Those are not the kind of numbers an all-time quarterback posts.

With that being said, look to ESPN’s Total Quarterback Rating (QBR) when measuring quarterback success. QBR looks to derive a total score for quarterbacks based on their contributions to a team win.

The first thing QBR measures is the degree of success on a given play, estimated in expected points added: a first down that reaches the red zone, for example, nets a higher gain than a first down at your own 40-yard line. Each play’s credit is then divided among the players involved, looking at every variable that contributed to a successful reception or run. Play details are charted by ESPN Stats & Information analysts to provide an accurate representation of each play, and ESPN uses standard logistic regression to convert the results into a number between 0 and 100, the total QBR.
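ESPN’s model itself is proprietary, so here is only a toy sketch of the expected-points-added idea underneath it. The expected_points function below is a hypothetical placeholder, not ESPN’s model, which is fit on historical play-by-play data.

```python
def expected_points(yards_to_endzone, down, yards_to_go):
    # Hypothetical stand-in: field position dominates, with small
    # penalties for later downs and longer distances.
    return (6.0 * (1 - yards_to_endzone / 100)
            - 0.5 * (down - 1) - 0.05 * yards_to_go)

def expected_points_added(before, after):
    """EPA of one play: the change in expected points across the snap."""
    return expected_points(*after) - expected_points(*before)

# First-and-10 from your own 40 (60 yards to the end zone), completed deep
# to the opponent's 15: a far bigger gain than the same first down near midfield.
print(round(expected_points_added((60, 1, 10), (15, 1, 10)), 2))  # 2.7
```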

Again, while no stat can be crowned perfect, QBR is a significant step up from the traditional NFL Passer Rating. Just remember to give me a shoutout when you’re schooling all of your buddies over beers at The Brass.

Analyzing defensive play in the NBA — What’s the best method?


The NBA has entered a new dawn. As legends of years past like Dwyane Wade and Dirk Nowitzki call it a career, new stars such as 2019 MVP Giannis Antetokounmpo are taking over. While this is amazing for fans of the game, NBA defenses are left in the woods to fend for themselves. Much the same is true in the world of basketball analytics: we see far more offensive performance measures than defensive ones. In a sport that is, in its simplest form, a “shooting” game, how can we quantify defensive play?

While traditionalists may say the eye test and player opinion are the only true determinants of defensive ability, what metrics are available to identify the best defenders on the court every night? It is easy to point to stats such as steals and blocks, but these cover only a fraction of the events on the defensive end.

For example, looking back at the 2019 Playoffs, the Los Angeles Clippers were unexpectedly worthy opponents for the reigning champion Golden State Warriors, in no small part due to the contributions of Patrick Beverley, among others. Beverley’s gritty defensive play is a key component of his game. Looking at the stat line, he finished the series averaging a block and a steal per game, which is impressive but does not make him stand out from the rest of the players in the series. So, in what other ways can we assess how Beverley played throughout this matchup?

The NBA has implemented player-tracking technology that lets us track defensive field goal percentages. By this measure, Beverley had the best defensive field goal percentage (47.1%) of the series (among those who played every game). This is especially impressive given that he was tasked with covering Kevin Durant, one of, if not the, most dangerous scorers in the game. On a grander scale, however, player-tracking data is not available for all games, so where else can we go to measure defensive performance?

Defensive Win Shares are another metric one could look to. They build on Defensive Rating (an estimate of individual points allowed per 100 defensive possessions) to credit players with estimated wins contributed by their defensive play. The formulas to calculate Defensive Win Shares are as follows:

Marginal Defense = (player minutes played / team minutes played) × (team defensive possessions) × (1.08 × (league points per possession) − (Defensive Rating / 100))

Marginal Points per Win = 0.32 × (league points per game) × ((team pace) / (league pace))

Defensive Win Shares = Marginal Defense / Marginal Points per Win
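Strung together in Python, the calculation looks like this; the sample inputs below are illustrative round numbers, not a real stat line.

```python
def defensive_win_shares(player_minutes, team_minutes, team_def_possessions,
                         league_pts_per_possession, defensive_rating,
                         league_pts_per_game, team_pace, league_pace):
    """Defensive Win Shares, following the formulas above."""
    marginal_defense = (player_minutes / team_minutes) * team_def_possessions * (
        1.08 * league_pts_per_possession - defensive_rating / 100)
    marginal_pts_per_win = 0.32 * league_pts_per_game * (team_pace / league_pace)
    return marginal_defense / marginal_pts_per_win

# Illustrative only: a heavy-minutes player with a strong 105 defensive rating
print(round(defensive_win_shares(
    player_minutes=2500, team_minutes=19780, team_def_possessions=8000,
    league_pts_per_possession=1.10, defensive_rating=105,
    league_pts_per_game=111, team_pace=100, league_pace=100), 2))  # ~3.9
```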

With these formulas, we are able to credit defensive contributions toward team success, but we can still go further in measuring actual defensive performance at the individual level.

FiveThirtyEight’s Nate Silver provides a fun new alternative for analyzing defensive ability: Defensive Rating Accounting for Yielding Minimal Openness by Nearest Defender, or DRAYMOND (you can take a guess where the inspiration for that came from). The metric is built around the idea of minimizing openness, rooting the statistic in shooting.

In brief, DRAYMOND starts from the assumption that “open” shots are converted at a rate roughly 8 percentage points higher than shots taken against an average defender. To calculate the raw statistic, we subtract the field goal percentage a player allows from the open-shot percentage. For example, if opponents converted 46% of 100 two-point shots against a player, while open two-pointers go in 56% of the time, the raw DRAYMOND works out to (.56 − .46) × 100 × 2 = 20 points saved by that defense.
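In code, the raw calculation is just that arithmetic; this is a sketch, with the 56% open conversion rate taken from the example above.

```python
def raw_draymond(open_fg_pct, allowed_fg_pct, shots_defended, shot_value):
    """Raw DRAYMOND: points saved relative to leaving shooters open."""
    return (open_fg_pct - allowed_fg_pct) * shots_defended * shot_value

# The worked example above: opponents hit 46% of 100 two-point attempts
# against our defender, while open twos drop at a 56% rate.
print(round(raw_draymond(0.56, 0.46, 100, 2), 1))  # 20.0 points saved
```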

Three-point shots alter the model as well and are weighted favourably (for good reason, especially in today’s NBA). DRAYMOND also uses both playoff and regular-season data, adjusting for the tougher matchups faced on average in the playoffs.

Since DRAYMOND is a rate statistic, we divide raw DRAYMOND by a player’s number of possessions on the floor. To adjust for position, we factor in the average number of shots defended at each position (e.g., point guards and shooting guards defend roughly 15 shots per 100 possessions, small forwards roughly 16, and so on). This equalizes defensive value across positions, though as with other defensive statistics (such as Real Plus-Minus), big men tend to rate higher by nature.

Finally, we subtract the value of league-average shot defense per possession from each score. DRAYMOND therefore boils down to a plus-minus statistic per 100 possessions, with 0 being average.
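Putting those last three steps together, a rough sketch might look like the following. Note that the positional baselines beyond the guard and small-forward figures quoted above, and the exact form of the position adjustment, are my assumptions for illustration rather than Silver’s published mechanics.

```python
# Average shots defended per 100 possessions by position. The PG/SG (~15)
# and SF (~16) figures come from the text above; the PF/C values are
# assumed placeholders.
AVG_SHOTS_DEFENDED_PER_100 = {"PG": 15, "SG": 15, "SF": 16, "PF": 17, "C": 18}

def draymond(raw_points_saved, possessions, position, league_avg_per_100):
    # Step 1: convert the raw total into a per-100-possession rate.
    rate = raw_points_saved / possessions * 100
    # Step 2: scale by positional shot load so bigs, who simply face more
    # shots, are comparable to guards (one assumed way to factor it in).
    rate *= 16 / AVG_SHOTS_DEFENDED_PER_100[position]
    # Step 3: subtract league-average shot defense, so 0 means average.
    return rate - league_avg_per_100

# A center who saved 45 points over 3,000 possessions, in a league where
# the average defender saves 1.2 points per 100 (made-up numbers):
print(round(draymond(45, 3000, "C", 1.2), 2))  # 0.13
```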

With that, we are able to compute DRAYMOND defensive ratings from opponents’ shooting data. Below are the best defenders since 2013-14 (minimum 10,000 possessions played).

Graphic by Nate Silver; data available on GitHub

Funnily enough, based on the metric explained above, Draymond Green is the leader in DRAYMOND since 2013-14. Go figure. Player-tracking statistics will, more likely than not, always be limited. However, being able to quantify defensive ability in any manner adds another layer of excitement to the game of basketball. For more information on DRAYMOND, be sure to check out Silver’s article linked in the sources below.

Quantifying performance in European football: OptaPro’s Possession Value Framework


Yes, you read that right – the esteemed analysts here at QSAO refer to “soccer” as European football. All jokes aside, since soccer is the ultimate team sport, it is interesting to look at different measurements of team performance.

The key to success in soccer is rooted in controlling possession, more so than in any other sport. However, assessing team play has had its shortcomings. Previously, most soccer analysis was based on expected statistics such as xGoals, xAssists, and xGoal differential. Improved analytics have since provided us with many new statistics, such as key passes, big chances, interceptions, and ball recoveries.

While such statistics do articulate team and individual play to a certain degree, soccer analytics powerhouse OptaPro recently introduced its all-new Possession Value (PV) Framework.

This new framework places a value on nearly every action on the pitch. While still in development, the results at its current stage are fascinating. The framework looks at up to five prior events in a possession and compares them to historical data to estimate the likelihood of a goal. When a player makes a play that increases the likelihood of a goal, they earn Possession Value Added (PV+). PV+ marks a positive in-game contribution but does not affect xG or xA. Conversely, negative in-game contributions, such as turnovers in a defensive setting or squandered offensive opportunities, lead to a negative PV+ output. To avoid over-punishing players who find themselves with many offensive opportunities, the PV Framework caps negative offensive contributions at a PV+ of -0.025, the average value of a possession.
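Opta’s model is proprietary, but the bookkeeping it describes is easy to sketch. In the toy version below, the goal probabilities are hypothetical inputs; in the real framework they come from the comparison of the prior events against historical data.

```python
AVG_POSSESSION_VALUE = 0.025  # the cap on negative offensive contributions

def pv_plus(p_goal_before, p_goal_after, offensive_action=True):
    """Possession Value Added: the change in goal likelihood from one
    action, with squandered offensive chances capped at -0.025."""
    delta = p_goal_after - p_goal_before
    if offensive_action and delta < -AVG_POSSESSION_VALUE:
        delta = -AVG_POSSESSION_VALUE
    return delta

# A through ball that lifts the chance of a goal from 2% to 9%:
print(round(pv_plus(0.02, 0.09), 3))   # 0.07
# A striker wasting a 15% chance is capped at the average possession value:
print(pv_plus(0.15, 0.0))              # -0.025
```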

Looking at the first seven games of the Premier League season, here are the top performers in PV+ from each team.

Graphic from OptaPro

For those less familiar with the Premier League, note the variety of positions represented above. If we instead look at the league leaders in xG and xA through the first seven games of the year, we see far less positional variety.

Graphics from Understat

In both charts, the leaderboards are dominated by attacking players. And while we can sort such statistics by position, measuring a holding midfielder by their offensive output would be a misinformed decision.

With this in mind, Opta’s PV Framework also allows us to pinpoint where a player is excelling on the field and, conversely, where they are lacking. We are now able to identify where certain players perform above their Premier League counterparts.


The framework’s one shortcoming is that it does not treat strikers too kindly: they usually are not controlling possession very often, instead looking to receive passes from teammates pushing the ball forward. With that being said, the PV Framework also allows us to detect which players are best at facilitating possession up-field.


The Possession Value Framework is taking the world of soccer analytics to new heights, and I am excited to see where it goes next.

Relative stats in the NHL — Missing half the story


I saw a tweet recently about how easy it is to throw around relative statistics in hockey. Unfortunately, I am unable to dig it back up (I follow more people than I should on Twitter), but it essentially highlighted how easily these statistics can be manipulated. I realized that not only have I been guilty of this in the past, but I now see relative statistics used far too often.

For those who do not know, a “relative” stat compares a team’s performance while a player is on the ice against how the team fares when they are off it. To articulate how these statistics are flawed, let’s take xGoals and Canucks captain Bo Horvat as an example.
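Before getting to Horvat, here is the computation in miniature, using xG and made-up numbers: a relative stat is just the on-ice share minus the off-ice share.

```python
def relative_xg_pct(xgf_on, xga_on, xgf_off, xga_off):
    """On-ice expected-goal share minus off-ice share, in percentage points."""
    on_ice = xgf_on / (xgf_on + xga_on) * 100
    off_ice = xgf_off / (xgf_off + xga_off) * 100
    return on_ice - off_ice

# A player can break even on the ice (50% xG share) and still post an ugly
# relative number if teammates feast on easier minutes without them:
print(round(relative_xg_pct(20.0, 20.0, 27.0, 23.0), 1))  # -4.0
```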

The Canucks are off to a hot start this year, and Horvat has been a big part of that. Not only does Horvat carry a heavy defensive responsibility, but he also drives a lot of the offense when he is on the ice. However, unlike his fellow centre Elias Pettersson, he has yet to find consistent linemates this season. Because of this, his relative xG ranks fourth-worst on the team. Interestingly enough, excluding Horvat, four of the bottom five forwards in this category (Micheal Ferland, Tanner Pearson, Jake Virtanen, Josh Leivo) make up the carousel of linemates Horvat has had this season. Coincidence? I think not.

Furthermore, a player like Horvat is susceptible to heavy situational bias. Horvat bears a heavy defensive load for the Canucks, and that certainly affects his relative statistics on a given night. Of course, it is early in the season, but it goes to show how relative stats can misrepresent a player’s performance.

If you want to assess individual play through the scope of relative statistics, you need to do your due diligence, if not ignore them altogether. Look into a player’s line deployment, matchups, and zone-start percentage to figure out why a player appears, on the surface, to be underperforming relative to their team.

Alternatively, there is a plethora of other options you can use to assess a player’s individual impact on their team. For example, individual Game Score is an excellent way to assess how a player fared on a given night; for an insightful summary, check out Hockey Graphs’ article on their website. In short, the statistics used in the Game Score formula are goals, primary and secondary assists, shots on goal, blocked shots, penalty differential, faceoffs, even-strength Corsi, and even-strength goal differential. Game Score is not necessarily a new phenomenon, but it is useful for personal and research purposes, especially since the statistics involved are easily accessible online.
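As a sketch of how those inputs combine, here is a Game Score function. The weights below are the ones I recall from the Hockey Graphs version; treat them as approximations and verify against the linked article before leaning on them.

```python
def game_score(goals, a1, a2, sog, blocks, pen_drawn, pen_taken,
               fo_won, fo_lost, ev_cf, ev_ca, ev_gf, ev_ga):
    # Weights approximate the Hockey Graphs formulation (assumed; verify).
    return (0.75 * goals + 0.7 * a1 + 0.55 * a2 + 0.075 * sog
            + 0.05 * blocks + 0.15 * pen_drawn - 0.15 * pen_taken
            + 0.01 * fo_won - 0.01 * fo_lost
            + 0.05 * ev_cf - 0.05 * ev_ca + 0.15 * ev_gf - 0.15 * ev_ga)

# A strong night: a goal and a primary assist, 5 shots, and a won
# possession battle at even strength (made-up stat line):
print(round(game_score(1, 1, 0, 5, 1, 1, 0, 6, 4, 18, 12, 2, 1), 2))  # ~2.5
```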

Another tool to assess player performance is Wins Above Replacement (WAR). In essence, WAR attempts to quantify a player’s total value through their on-ice contributions. A relatively new three-part series from online analysts Evolving Wild, also found on the Hockey Graphs platform, details the metric’s history, model formulation, and model testing. For those looking to perform their own research projects, I highly suggest reading these articles.

Special thanks to Hank Williams for his contributions to the NFL portion of this article.

Sources and useful links:

How to Calculate NFL Passer Rating (Medium)

Why Passer Rating is Broken (The Spax)

Kirk Cousins Is Not Better Than Joe Montana. So Let’s Fix Passer Rating (FiveThirtyEight)

How Total QBR is Calculated (ESPN)

A Quarterback Rating That Tries to Measure All the Little Things (NY Times)

Tracking NBA Defensive Impact (NBA)

NBA Defensive Win Shares (Basketball Reference)

How to Measure Individual Defense (Basketball Insiders)

A Better Way to Evaluate NBA Defense (FiveThirtyEight)

DRAYMOND Data (GitHub)

Introducing a Possession Value Framework (OptaPro)

Possession Value — A Deeper Dive (OptaPro)

Understat EPL Data

Hockey Analytics Provided by MoneyPuck