I was recently building a simple model to forecast an NBA playoff series. As I was building the model, I realized that I was not taking into account the possibility that each team’s strength could change over the course of the series. If a team dominates games 1 and 2, we might reasonably expect them to have a higher likelihood of winning game 3 than we did at the beginning of the series.
But perhaps we should not alter our initial belief too much. After all, the 2 games in the example above are not a very large sample. Sheer randomness and recency bias may be causing us to shift our thinking too much. My initial hunch was just that: the general public overreacts to a few performances.
To empirically answer this question, I compiled the scores of all 1090 playoff games from 2005 to 2017 using data from basketball-reference. First, as a baseline, I ran a linear regression (Model 1) to predict the result of each game using the regular season point differentials of the competing teams and the knowledge of which one is at home. To understand this regression (and the one that follows) here are the definitions I used:
Game.Result = home team points scored – away team points scored. (a negative value means the away team won)
Reg.Margin = home team regular season point differential – away team regular season point differential.
Essentially, we take from this model that home court advantage (the intercept term) is worth about 4.2 points and every point of regular season point differential (Reg.Margin) is worth about 1.1 points. As an aside, having home court in a playoff game is worth about a point more than having home court in a regular season game. Matching our intuition, Reg.Margin is highly significant. The R-Squared value of 0.1123 means that about 11% of the variation in playoff game results is explained by who is home and the regular season strengths of the teams playing.
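To make the reading of Model 1 concrete, here is a minimal Python sketch that turns the two rounded coefficients quoted above into a predicted home margin. The function name and the example matchup are illustrative assumptions, and the coefficients are the rounded values from the text rather than the exact fitted values.

```python
# Rounded Model 1 coefficients as quoted in the text (not the exact fitted values).
HOME_COURT = 4.2       # intercept: points of playoff home court advantage
REG_MARGIN_COEF = 1.1  # points of Game.Result per point of Reg.Margin


def predict_game_result(home_reg_diff, away_reg_diff):
    """Predicted Game.Result (home points minus away points) under Model 1."""
    reg_margin = home_reg_diff - away_reg_diff
    return HOME_COURT + REG_MARGIN_COEF * reg_margin


# Hypothetical matchup: the home team was +5.0 per game in the regular season,
# the away team +2.0, so Reg.Margin is 3.0.
print(round(predict_game_result(5.0, 2.0), 1))  # 7.5
```

A negative prediction would mean the model favors the away team despite home court, which happens once the away team's regular season edge is more than about 4 points.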
Next, I devised a metric called Series.Strength to measure how much a playoff series is deviating from expectation.
Series.Strength = how well the current home team has performed in the previous games of the series, relative to our expectations. (A positive number means the current home team has played better than expected in previous series games and a negative number means they have played worse than expected.)
The full explanation of this metric is in the footnotes*, but you should basically think of Series.Strength as a measure of how much the previous games in the series have deviated from what Model 1 would tell us to expect. Importantly, Series.Strength already prices in the regular season strengths of the teams and who was at home. The Pelicans beating a team by 10 is a larger deviation from regular season expectations than if the Warriors had beaten that same team by 10, so Series.Strength will be larger in absolute value in the first case. Also, keep in mind that Series.Strength is measured from the current home team’s perspective because my models are predicting home margin of victory.
For context, Series.Strength values ranged from about -3.5 to +3.5 in the data, with half of all values between -0.6 and +0.6. Series.Strength is distributed almost normally with mean about 0 and standard deviation about 0.9.
To test how well Series.Strength predicts future games, I ran another linear regression. This time I predicted Game.Result using both Series.Strength and Reg.Margin (Model 2). I used a sample of all playoff games from 2005 to 2017 other than Game 1s, a total of 895 games. The results are below:
The coefficient of 0.58 for Series.Strength means that for every unit a team has over-performed in the series so far, the margin of the next game will be, on average, 0.58 points more in their favor. But because 95% of the values for this variable have absolute value less than 1.75, the 0.58 coefficient estimate is actually not very large.
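As a rough sense of scale, the short Python sketch below multiplies the quoted coefficient by the Series.Strength values mentioned earlier. The constant and function names are illustrative; the numbers are the rounded figures from the text.

```python
COEF = 0.58          # Series.Strength coefficient from Model 2, as quoted above
NEAR_EXTREME = 1.75  # 95% of values have absolute value below this
TYPICAL = 0.6        # half of all values fall between -0.6 and +0.6


def margin_shift(series_strength):
    """Points added to the predicted home margin by a given Series.Strength."""
    return COEF * series_strength


print(round(margin_shift(NEAR_EXTREME), 1))  # 1.0
print(round(margin_shift(TYPICAL), 1))       # 0.3
```

So even a team near the edge of the observed range moves the prediction by only about one point, and a more typical series moves it by a fraction of a point.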
In fact, the most important thing to notice in this summary table is that Series.Strength has a p-value of 0.224. This means that its coefficient is not, statistically speaking, significantly different from 0. In other words, we do not have strong evidence that past games in an NBA playoff series are in general particularly useful for predicting the score of the next game, above and beyond what we already know from the regular season strengths of the teams. Moreover, the R-squared is almost exactly the same as that of Model 1.
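One way to see what a p-value of 0.224 implies is to back out an approximate standard error and confidence interval for the 0.58 coefficient. The sketch below does this in Python using a normal approximation to the t distribution, which is an assumption on my part (reasonable with 895 games) and not the exact output of the regression.

```python
from statistics import NormalDist

COEF = 0.58      # Series.Strength coefficient reported for Model 2
P_VALUE = 0.224  # two-sided p-value reported for that coefficient

# Normal approximation to the t distribution (assumed; fine for large samples).
z_observed = NormalDist().inv_cdf(1 - P_VALUE / 2)  # implied test statistic
std_error = COEF / z_observed                       # implied standard error

z_95 = NormalDist().inv_cdf(0.975)
ci_low = COEF - z_95 * std_error
ci_high = COEF + z_95 * std_error

print(f"implied standard error: {std_error:.2f}")
print(f"approximate 95% interval: ({ci_low:.2f}, {ci_high:.2f})")
print("interval contains 0:", ci_low < 0 < ci_high)
```

The implied interval straddles zero, which is the same statement as the coefficient not being significant at the 5% level.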
Of course, I am not arguing that the first few games of a series are useless. They give coaches and fans plenty of information for thinking about the next meeting, and certain matchups are more or less favorable for a particular team. It is simply that a few games in a series are not enough data to overcome the inherent randomness in a basketball game. If we had a 21 game series, it is entirely plausible that we could gather information from games 1-15 which would help us a lot in predicting game 16. But an NBA series provides us at most 7 data points. I am just pointing out that we should be careful not to overreact to this small sample and forget what we have learned about the teams over a much larger regular season.
*Series.Strength is calculated in the following way:
- Identify which team is at home in the current game. (Example: Team B is at home against Team A in game 3 of the series)
- Take the residuals from Model 1 of the previous games in the series. Multiply any residual of a game in which the current home team was on the road by -1. Add these numbers together. This is how many total points our current home team has performed better/worse than expected in previous games. (Example: Team A won games 1 and 2 by 10 and 12 points, but the margin predicted by Model 1 was Team A +6 points. Then the residuals are 4 and 6, but Team B was the road team so we add together -4+(-6) = -10.)
- Divide the sum from the previous step by sqrt( (12.8^2) * number of previous games in series). This number is the Series.Strength value for the next game. I do this because the residual standard error from Model 1 is 12.8, so we are normalizing raw points above/below expectation into a quantity which is approximately distributed Normal(0,1). (Example: -10/ sqrt( (12.8^2) *2) = -0.55. So Team B has performed 0.55 standard deviations worse in the first two games than Model 1 would predict.)
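The steps above can be sketched in a few lines of Python. The function name, argument names, and input layout are my own illustrative choices; the only numbers taken from the text are the 12.8 residual standard error and the worked example.

```python
import math

RESID_SD = 12.8  # residual standard error of Model 1, as stated in the text


def series_strength(residuals, current_home_was_home):
    """Series.Strength for the next game, from the current home team's view.

    residuals: Model 1 residuals of the previous games in the series, each
        measured from that game's home team's perspective.
    current_home_was_home: one bool per previous game, True if the team at
        home in the upcoming game was also at home in that game.
    """
    if len(residuals) != len(current_home_was_home):
        raise ValueError("need one home/road flag per residual")
    if not residuals:
        raise ValueError("Series.Strength needs at least one previous game")
    # Flip the sign of games in which the current home team was on the road.
    total = sum(r if was_home else -r
                for r, was_home in zip(residuals, current_home_was_home))
    return total / math.sqrt(RESID_SD ** 2 * len(residuals))


# Example from the text: Team A beat expectations by 4 and 6 points at home,
# and Team B (the road team in both games) hosts game 3.
print(round(series_strength([4, 6], [False, False]), 2))  # -0.55
```

Dividing by the square root of the number of games, and not by the number of games itself, keeps the scale comparable whether one game or six have been played, under the assumption that the residuals are independent.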