By: TA (@Clevta) / Chase (@Luckym4n_)
Introduction
Bill Parcells used to say that in the NFL, “you are what your record says you are”. Anyone who studies the game with numbers understands that this saying is far from the truth. Not all final records are created equal, and it is important to understand the context behind each team’s results and how well those results predict future outcomes. A team’s final win-loss record doesn’t tell the whole story and, by itself, is not the best indicator of future success. As football fans, we know that a lot of games are won or lost in the final minutes. The proverbial “game of inches” absolutely rings true in a sport with such a small sample of games in one season. Random turnovers, crazy bounces of the ball, a botched call by the referee or even an entire season riding on the leg of a field goal kicker can change a team’s fortunes. If a team catches an inordinate number of breaks, or vice versa, it can determine playoff and potentially Super Bowl fates. Being on the wrong side of many of those close games can also get a coach fired.
As NFL analysts, we try our best to perform autopsies on last season’s results, and on how teams got to those results, in order to predict the next season. For example, team record in one-score games has long been used by bettors to identify teams that could be due for either positive or negative regression. The thought has been that teams that produce extreme outcomes in those games should typically regress to near 50% the following season, making those outcomes unstable from year to year. In fact, since 1990, 81 teams produced a close-game win percentage of at least 80% in a season, and in aggregate those teams followed that up with a win percentage of 45% in close games the following season. The 2020 Buffalo Bills went 5-1 in games decided by one score. Without any other context, one would suggest that through natural regression, the Bills shouldn’t expect to produce such outsized results in those close games again. And that would have been right. The 2021 Bills went 0-5 in one-score games in the regular season and made it 0-6 with the heartbreaking loss to the Chiefs in the playoffs.
Methodology
We decided to look at team win probabilities in the 4th quarter, between the 12- and 6-minute marks remaining in the game. We wanted to leave enough time within the final quarter that each team would likely have received at least one possession in the 4th. In 2021, the average NFL drive lasted 2.5 minutes, with the team that possessed the ball the most, Green Bay, averaging 3.3 minutes per drive. We determined that using a blended win probability between the 12- and 6-minute marks was appropriate.
The hypothesis is that the average win probability in this time period is more representative of the wins a team “should have” achieved, or was likely to have achieved, and is thus more predictive of team wins in the next season. Because these win probabilities are capped at 100%, piling on late scores after a team is essentially guaranteed to win does not inflate the metric, which helps with some of the issues of point differential. We then took the blended 4th Quarter Win Probability for each team in each game of the season, added them all up, and took the difference between that total and the team’s regular season win percentage to determine the 4th Quarter Win % Over Expected (results below).
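A minimal sketch of the calculation described above, in Python. The article does not spell out how the win probabilities are blended, so this assumes a simple unweighted mean of every play-level win probability recorded with between 12 and 6 minutes left in the game. It also assumes the sign convention implied by the metric's name (actual win percentage minus the average blended win probability), so positive values mark overachievers. The function names and the toy numbers are hypothetical and are not nflFastR output.

```python
from statistics import mean

WINDOW_START = 12 * 60  # seconds left in the game at the 12-minute mark
WINDOW_END = 6 * 60     # seconds left in the game at the 6-minute mark


def blended_q4_wp(plays):
    """Average one team's win probability over the 12:00 to 6:00 window.

    `plays` is a list of (seconds_remaining, win_probability) tuples for
    one team in one game. The unweighted mean is an assumption.
    """
    in_window = [wp for secs, wp in plays if WINDOW_END <= secs <= WINDOW_START]
    if not in_window:
        raise ValueError("no plays inside the 12:00 to 6:00 window")
    return mean(in_window)


def win_pct_over_expected(games):
    """Actual win % minus the mean blended 4th-quarter win probability.

    `games` is a list of (plays, won) pairs, where `won` is 1 or 0.
    Positive values mean the team won more often than its 4th-quarter
    position suggested (assumed sign convention).
    """
    expected = mean(blended_q4_wp(plays) for plays, _ in games)
    actual = mean(won for _, won in games)
    return actual - expected


# Toy two-game season with made-up numbers: a comeback win and a blown lead.
toy_season = [
    ([(800, 0.50), (700, 0.40), (500, 0.30), (300, 0.90)], 1),
    ([(720, 0.60), (360, 0.80), (100, 0.10)], 0),
]
toy_metric = win_pct_over_expected(toy_season)
```

In the toy season the team wins half its games while averaging a blended win probability slightly above 50%, so the metric comes out slightly negative.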
Using nflFastR game data from 1999-2021, we compared the predictive power of our 4th Quarter Win % Over Expected with that of Pythagorean Wins. We found that our metric yielded a .02 improvement in R² (.22 vs .20) over Pythag, a modest yet competitive advantage with a large data set. From a betting standpoint, we were curious what edge something like this could bring us. Seeing as the relationship was slightly stronger than with Pythagorean, we wanted to see if the outlier edges of the 4th Quarter Win % Over Expected would yield positive results. Looking at the top and bottom quintiles of our data set, we found that the teams that “overachieved” by our metric, in the 80th percentile or higher, blindly went under their Vegas (DraftKings) win total the following season at a stunning 63% rate (63-37-5). Since a 63% hit rate in the betting space is nothing to gloss over, we looked at how many wins these teams fell short of expectation by.
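For readers who want to reproduce the two numbers in this paragraph, here is a small Python sketch. It assumes the R-squared figures come from a one-variable linear fit, in which case R-squared equals the squared Pearson correlation, and that the 63-37-5 record reads as unders, overs and pushes with pushes left out of the rate. Both function names are hypothetical.

```python
def r_squared(xs, ys):
    """Squared Pearson correlation between two equal-length sequences.

    For a one-variable linear regression this equals the usual R-squared.
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("a constant sequence has no defined correlation")
    return sxy ** 2 / (sxx * syy)


def hit_rate(hits, misses):
    """Share of decided bets that hit; pushes are excluded by the caller."""
    return hits / (hits + misses)


# The 63-37-5 record from the text, read as unders-overs-pushes (assumption).
under_rate = hit_rate(63, 37)
```

To compare two predictors the same way, `r_squared` would be called once per predictor against next-season wins.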
What we found was that the teams that overachieved the most in the prior season didn’t just barely go under their win totals; they did so by a large amount. Teams in the top quintile that went under did so by an average of 2.6 games, or 2.1 using the median. We figured it could be more profitable to use alternative lines instead of the standard win totals offered at sportsbooks. Many books offer these alternative win totals for both the over and under, typically 1 game either side of the standard line. They also do this at very attractive odds (typically ~+180 to go under by 1 more game than the normal win total offered). For example, a book might list a team’s normal win total at 9 with -120 odds to go under, and also offer under 8 at +180. Would it be more profitable to use alternative lines for juiced payouts, taking the risk that, when these teams underperform, they do so by more than 2 games? That got us thinking that we could take it one step further: betting these teams to miss the playoffs could offer even juicier odds to tack on.
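To make the trade-off between the standard and alternative lines concrete, the Python sketch below converts American odds into profit per unit staked and into the break-even win rate. The -120 and +180 prices are the example odds from the paragraph above; the function names are hypothetical.

```python
def profit_per_unit(american_odds):
    """Profit on a winning 1-unit stake at the given American odds."""
    if american_odds == 0:
        raise ValueError("American odds cannot be zero")
    if american_odds > 0:
        return american_odds / 100
    return 100 / abs(american_odds)


def break_even_prob(american_odds):
    """Win rate at which a flat bet at these odds neither wins nor loses."""
    return 1 / (1 + profit_per_unit(american_odds))


standard_under = break_even_prob(-120)     # about 0.545
alternative_under = break_even_prob(180)   # about 0.357
```

At these prices the standard under has to hit roughly 54.5% of the time to break even, while the alternative under only needs to hit roughly 35.7% of the time, which is why a basket that tends to miss by two or more games is a candidate for the alternative line.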
To test this hypothesis in the betting world, it is not enough to look back today at past data to determine the top quintile. We must use a methodology that could have been applied in real time. Ten years ago, we had no way of knowing what would count as the 80th percentile, since we only know it today after the seasons have occurred. We would be cherry picking if we used that knowledge. To do this properly, we must compute the 80th percentile one year at a time, using only the data available at that point. For example, in 2018 we would not have had the data to determine that the 2020 Chiefs or 2019 Eagles would eventually end up in our top quintile, as we do now in the overall dataset. So, our 80th percentile results would have had a slightly different makeup back then. Following the same principle, we also needed to confirm that this process would have made sense back when we started in 2006. One year at a time, we analyzed the data starting in 1999 and ending in 2005, giving us roughly 6 years of data to create our quintiles, which allows us to be “time travelers”. For those who are familiar with the term, this is essentially time series cross validation. The basic idea is to avoid the common trap of validating a “model” on the same data it was trained on, which is flawed and tends to produce misleading assessments of accuracy.
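The walk-forward idea can be sketched in Python as follows. This is one plausible reading rather than the exact procedure used here: it assumes the cutoff for a betting season is the 80th percentile of all team-season values pooled from every earlier season, and that teams from the immediately preceding season at or above that cutoff go into the basket. The function names, the linear-interpolation percentile and the toy data are all hypothetical.

```python
def percentile(values, pct):
    """Percentile with linear interpolation between sorted values."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("no values")
    rank = (len(ordered) - 1) * pct / 100
    low = int(rank)
    high = min(low + 1, len(ordered) - 1)
    return ordered[low] + (ordered[high] - ordered[low]) * (rank - low)


def walk_forward_picks(metric_by_season, first_bet_season, last_bet_season, pct=80):
    """Pick overachievers for each betting season using only past data.

    `metric_by_season` maps season -> {team: metric value}.
    """
    picks = {}
    for bet_season in range(first_bet_season, last_bet_season + 1):
        # Only seasons completed before the bet is placed feed the cutoff.
        history = [
            value
            for season, teams in metric_by_season.items()
            if season < bet_season
            for value in teams.values()
        ]
        prior = metric_by_season.get(bet_season - 1, {})
        if not history or not prior:
            continue
        cutoff = percentile(history, pct)
        picks[bet_season] = sorted(t for t, v in prior.items() if v >= cutoff)
    return picks


# Made-up values for five fictional teams over three seasons.
toy = {
    2004: {"A": 0.10, "B": -0.05, "C": 0.00, "D": 0.02, "E": -0.10},
    2005: {"A": 0.01, "B": 0.20, "C": -0.02, "D": 0.03, "E": 0.00},
    2006: {"A": 0.50, "B": 0.40, "C": 0.30, "D": -0.30, "E": -0.20},
}
toy_picks = walk_forward_picks(toy, 2006, 2007)
```

The extreme 2006 values in the toy data raise the cutoff used for the 2007 bets but cannot affect the 2006 bets, which is the property that keeps the backtest honest.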
The Biggest Overachievers: Historical Returns vs Pythagorean Wins
The table above shows how this methodology would have performed in terms of units per season, starting in 2006. You can interpret this as risking a 1-unit wager on every alternative under at +180 (assumed odds for every alternative line, since historical data is non-existent) and 1 unit on each team to miss the playoffs that season. Doing this produced excellent results. Betting this basket of teams (105-team sample) to underperform the next season would have profited +47.0 total units for a Return on Investment (ROI) of 41.3%. Performing the exact same process for the 80th percentile of overachievers in the Pythagorean basket (186-team sample) would have returned +43.9 units and a 23.6% ROI. Those are very good results as well, but they trail our method in both units won and ROI despite a much larger sample and dollar outlay.
What is impressive about our results is the consistency of the return. Only three seasons out of 16 produced negative results (2011, 2014 and 2020) for our basket of the biggest 4th Quarter Win % Over Expected overachievers. That compares to four negative seasons for the Pythag basket of teams. In addition, the teams that fit our methodology produced outsized returns by going under the alternative lines (+29 units, versus +21.4 units for Pythag). To its credit, the Pythag basket did outperform on the bets to miss the playoffs.
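For reference, units and ROI for a flat-staked basket can be computed as in the Python sketch below. The +180 price is the assumed alternative-line price from the text; the win and loss counts in the example are made up for illustration and are not the results reported here. Pushes are assumed to return the stake and to be left out of the amount risked.

```python
def flat_bet_summary(wins, losses, american_odds, stake=1.0):
    """Return (profit in units, ROI) for flat bets at a single price.

    ROI is profit divided by the total amount risked on decided bets.
    """
    if american_odds == 0:
        raise ValueError("American odds cannot be zero")
    if american_odds > 0:
        payout = american_odds / 100
    else:
        payout = 100 / abs(american_odds)
    profit = wins * stake * payout - losses * stake
    risked = (wins + losses) * stake
    roi = profit / risked if risked else 0.0
    return profit, roi


# Hypothetical basket: 10 alternative unders at +180, 5 of which cash.
example_profit, example_roi = flat_bet_summary(wins=5, losses=5, american_odds=180)
```

In this made-up example, hitting half of the +180 bets returns 4 units on 10 risked, a 40% ROI. Bets to miss the playoffs would be settled the same way, one call per price, since those odds vary by team.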
As with any model, I believe that there should be some level of qualitative overlay to help produce the best results. For example, Peyton Manning-led teams went 4-1 to the over as part of the overachieving basket, while Tom Brady’s Patriots teams went 5-2 to the over. Nine of the win total and miss-the-playoffs losses came at the hands of just these two QBs. In addition, there were cases, albeit limited, of teams that had major roster changes, or advanced data similar to Pythag, that led the market to price in a significant drop in win totals. The “bad news” or regression was priced into the market win totals to the point that these teams actually became favorites to miss the playoffs. Wagering on a team to miss the playoffs in an already depressed market likely isn’t the ideal scenario. Finally, although the payout is high when the best teams miss the playoffs, we have noticed that every team that had odds of -700 or higher to make the playoffs since 2008 did make the playoffs that season. It’s fairly rare for the best of the best to miss the playoffs.
It should be noted that we did not use the Kelly Criterion to size wagers, as we have not seen any strong correlation between the size of the overachievement in win probabilities and results. In other words, the teams at the very top of the 80th percentile basket of overachievers showed no stronger “edge”. Instead, a flat 1-unit wager was used for every team in the basket.
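For context on what was set aside, the Kelly Criterion sizes each wager as a fraction of bankroll from an estimated win probability and the payout. The Python sketch below shows the standard formula, f = (b*p - q) / b, where b is profit per unit staked, p is the win probability and q = 1 - p. The 45% win probability in the example is purely hypothetical and the function name is illustrative.

```python
def kelly_fraction(win_prob, american_odds):
    """Kelly stake as a fraction of bankroll, floored at zero (no bet)."""
    if not 0 <= win_prob <= 1:
        raise ValueError("win_prob must be between 0 and 1")
    if american_odds == 0:
        raise ValueError("American odds cannot be zero")
    if american_odds > 0:
        b = american_odds / 100
    else:
        b = 100 / abs(american_odds)
    fraction = (b * win_prob - (1 - win_prob)) / b
    return max(fraction, 0.0)


# Hypothetical: a 45% chance to cash an alternative under priced at +180.
example_stake = kelly_fraction(0.45, 180)
```

Because Kelly stakes scale with the estimated edge, it only helps when the size of the edge can be ranked from team to team, which is the condition the paragraph above says was not observed.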
The Biggest Underachievers:
Has Identified Multiple Undervalued SB Teams
The underachievers didn’t perform as well overall but have still yielded a very solid 14.1% ROI on +15.7 units since 2006. Up until 2019, this basket of underachievers was on the same torrid pace as the overachievers, but it has succumbed to negative returns in the last three seasons (2019-2021). My guess is that, without any sort of qualitative overlay, some of these bad teams don’t hit their alternative overs as often due to a lack of quality QB play and the likelihood of constant roster turnover. Nonetheless, where this basket becomes most enticing is in the ultimate upside and the long-tail outcomes for potential Super Bowl sleepers. Take a look at the most recent teams that fit this underachiever basket and we can see some interesting outputs.
2021 -> ATL, CAR, HOU, CIN (SB Appearance), LAC, NYJ, SF
2020 -> CIN, DET, LAC, TB (SB Winner)
2019 -> ATL, JAX, NYG, NYJ, SF (SB Appearance)
2018 -> BAL, CLE, HOU, IND, NYG, NYJ, TB
2017 -> LAC, CAR, JAX, LAR, ATL, PHI (SB Winner), CHI, CLE, CIN, ARI
In 4 of the last 5 years, this basket of teams has featured a Super Bowl representative (CIN in 2021, TB in 2020, SF in 2019 and PHI in 2017), including two of the winners. Also in 2018, Houston and Indianapolis each won at least 10 games, following a 4-12 record the prior season, and made playoff appearances. While Tampa (2020), San Francisco (2019) and Philadelphia (2017) also appeared in the Pythagorean basket of underachieving teams, the Bengals (2021) only appeared in our 4th Quarter Win % Over Expected basket. The four teams that reached the Super Bowl came into the season with odds to win their respective conferences of 80-1 (Bengals), 5-1 (Bucs), 20-1 (Niners) and 20-1 (Eagles). Not only that, but out of 103 teams in this sample, 26 made the playoffs that season after missing them the prior season. This dataset seems to have the potential to identify longshot playoff and Super Bowl contenders, but from a consistency standpoint it isn’t as desirable as the overachievers.