With the 2018 NFL season underway, I decided to release some work on my first NFL project: a team rating and prediction system.
Motivating this project was a simple question, one that has always intrigued me: how much should we adjust our thinking about a team based on the result of 1 game?
We go into a new football season with some general expectations of the ability level of each team. Many of our expectations come from past performance, but some new information also arrives with offseason signings, trades, injuries, etc. Of course, there is a lot of uncertainty surrounding each team.
Then we witness week 1! Some teams look like underrated breakout candidates, and others flop. Hot takes abound. But how much information can 1 game really give us?
How The Rating System Works
To answer this question, I looked at games played from 2011 to 2017 (7 seasons), using pro-football-reference schedule data. To estimate the pre-season expected ability of each team, I used season win total over/under betting lines from footballlocks.com. (For 2018 over/unders, I used oddsshark.com.) I then built a very simple ratings and prediction system that relies on 4 parameters, which I will explain shortly:
pts_per_win_dif = 1.8
home_field_ad = 2.6
tuning_par = 0.04
sched_adjust = 0.5
Here is how the system works. At any given point in time, each team has a rating from 0 to 16. You can think of this rating as the number of games a team with this ability level would be expected to win over the course of a 16-game regular season. As an example, say team A has an 8.6 rating and team B has a 7.1 rating.
Now, to calculate the expected score of a game where Team A hosts Team B, you simply have to know the pts_per_win_dif and home_field_ad parameters. In my final model, those parameters are 1.8 points and 2.6 points, respectively. Then you use the following formula:
Expected Margin of Victory of Team A (the home team) = pts_per_win_dif * (Team A rating – Team B rating) + home_field_ad
It’s that simple. So, in this example, Team A would be a 1.8 * (8.6 – 7.1) + 2.6 = 5.3 point favorite when playing at home against Team B.
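As a quick sketch, the spread formula is only a few lines of Python (the 8.6 and 7.1 ratings are the hypothetical Team A and Team B from the example):

```python
# Spread formula, using the article's fitted parameters.
pts_per_win_dif = 1.8  # points of spread per unit of rating difference
home_field_ad = 2.6    # home-field advantage, in points

def expected_margin(home_rating, away_rating):
    """Expected margin of victory for the home team."""
    return pts_per_win_dif * (home_rating - away_rating) + home_field_ad

# Team A (rating 8.6) hosting Team B (rating 7.1):
print(round(expected_margin(8.6, 7.1), 1))  # 5.3
```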
The next thing my rating system does is go back after the game between Team A and Team B is played and adjust each team’s rating according to the result and the value of tuning_par. In my model tuning_par = 0.04. The way it makes the adjustment is simple. Suppose that Team B actually pulls off the upset and beats Team A by 7 points. Then, to get Team A’s new rating, do the following:
Team A New Rating = Team A Old Rating + tuning_par * (Actual Margin of Victory of Team A – Expected Margin of Victory of Team A)
So, in the example where Team B won by 7, Team A’s new rating is 8.6 + 0.04 * ( -7 – 5.3) = 8.108. Team B simply improves its rating by the same amount of points deducted from Team A’s rating, so Team B’s new rating is 7.1 + 0.492 = 7.592.
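The update rule is just as short. A minimal sketch, reusing the example above (margins are stated from the home team's perspective):

```python
tuning_par = 0.04  # how much a single game moves a rating

def update_ratings(home_rating, away_rating, actual_margin, expected_margin):
    """Zero-sum rating adjustment: whatever the home team loses, the away team gains."""
    delta = tuning_par * (actual_margin - expected_margin)
    return home_rating + delta, away_rating - delta

# Team B upsets Team A by 7 on the road (home margin = -7, expected = +5.3):
new_a, new_b = update_ratings(8.6, 7.1, actual_margin=-7, expected_margin=5.3)
print(round(new_a, 3), round(new_b, 3))  # 8.108 7.592
```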
The last feature of my system is to assign each team a rating to begin the season. This is done by starting with each team’s pre-season win total over/under betting line and then adjusting this number based on the strength of their schedule. Basically, teams with tougher schedules see their rating boosted because it is assumed that their betting line is being dragged down by the tougher schedule. The reverse logic holds for teams with weaker schedules. To calculate a team’s rating before week 1 games, I use:
Pre-Week 1 Rating = Season Win Total Betting Line Over/Under + sched_adjust * strength of schedule z-score,
where strength of schedule z-score = (mean win total over/under of all opponents – league-wide average of opponent mean win total over/unders) / standard deviation of opponent mean win total over/unders.
The sched_adjust term in my model equals 0.5.
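Putting the pre-season step together, here is a hypothetical sketch (the team names, win-total lines, and round-robin schedule below are made up for illustration):

```python
import statistics

sched_adjust = 0.5

def preseason_ratings(win_totals, schedules):
    """Adjust each team's win-total line by a z-score of opponent strength.

    win_totals: dict mapping team -> season win total over/under line
    schedules:  dict mapping team -> list of opponent names
    """
    opp_mean = {t: statistics.mean(win_totals[o] for o in schedules[t])
                for t in win_totals}
    league_mean = statistics.mean(opp_mean.values())
    league_sd = statistics.stdev(opp_mean.values())
    return {t: win_totals[t] + sched_adjust * (opp_mean[t] - league_mean) / league_sd
            for t in win_totals}

# Toy 4-team league where everyone plays everyone else:
lines = {"A": 10.5, "B": 9.0, "C": 7.0, "D": 5.5}
sched = {t: [o for o in lines if o != t] for t in lines}
ratings = preseason_ratings(lines, sched)
```

In this toy league, team D faces the three strongest lines and gets a boost above its 5.5 over/under, while team A, with the weakest slate, is nudged below its 10.5.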
How Much Should We Adjust our Beliefs After 1 Game?
Going back to my original question, my model gives a pretty straightforward answer. It says to take the margin we expected, compare it to the margin that actually happened, and multiply the difference by 0.04. So if a 7-rated team does 10 points better than expected in a game, its new rating is 7.4. What’s really the difference between a 7.4 rating and a 7 rating? Well, based on the 1.8 pts_per_win_dif parameter, a 7-rated team would be a 1.8 point underdog against an 8-rated team on a neutral field. A 7.4-rated team is a 1.08 point underdog in the same game.
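The arithmetic in that paragraph, written out (a neutral-field spread simply drops the home_field_ad term):

```python
pts_per_win_dif = 1.8
tuning_par = 0.04

# A 7-rated team outperforms its expected margin by 10 points:
new_rating = 7 + tuning_par * 10           # 7.4

# Neutral-field spread against an 8-rated team:
old_spread = pts_per_win_dif * (8 - 7)           # 1.8-point underdog
new_spread = pts_per_win_dif * (8 - new_rating)  # 1.08-point underdog
```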
Another thing I wanted from my system was to translate the spread into a win probability for each team. To do this, I built 7 different models, each trained with 1 of the years missing. I then tested the optimal parameters from each model on the held-out year and computed the average squared error between predicted and actual margin. Taking the square root of this quantity as an estimate of the standard deviation, and assuming normality, the model roughly treats the final margin of each game as normally distributed with mean = predicted margin and standard deviation = 13.34. Under this assumption, you can convert any spread into a predicted win probability.
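Under the normality assumption, the conversion only needs the normal CDF. A minimal sketch, taking the 13.34 standard deviation estimated above as given:

```python
from math import erf, sqrt

MARGIN_SD = 13.34  # estimated std. dev. of actual margin around predicted margin

def win_probability(predicted_margin):
    """P(team wins) = P(actual margin > 0) when margin ~ Normal(predicted, SD)."""
    z = predicted_margin / MARGIN_SD
    return 0.5 * (1 + erf(z / sqrt(2)))  # standard normal CDF at z

# A pick'em (0-point spread) is a coin flip; a 5.3-point favorite wins ~65% of the time.
```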
I’m planning on doing a bit more work to see how I can make my model better. I am not taking into account injuries right now, which is especially important to do if a star quarterback gets injured. Moreover, it might make some sense to go back and adjust a team’s rating later in the season if the teams they played earlier in the season now look stronger or weaker than believed at the time. Also, perhaps most worryingly, I notice a persistent 1-2 point increase in mean squared error on test data as the season goes on. But this is all a work in progress! The main thing that this simple model does is offer some insight into my original question: how much should we adjust our beliefs about a team based on a single result?