Though the NBA regular season is still 43 days away, it’s never too early to take a look at regular season win total over/unders!

Rather than building a projection system and delving into whether specific teams are more likely to go over or under, I instead decided to determine whether or not five key predictors could in any way help us make predictions in general. I will introduce the specific predictors in the next section.

To conduct my analysis, I looked at the over-under lines for all teams over the past six regular seasons. I was able to find a combination of lines from Bovada and the Westgate SuperBook, using the articles cited in the References section. The dates for these lines ranged from October 5 to October 18, so they were generally recorded a week or two before the start of the regular season. All other information used in this article, mainly the statistics used to calculate my predictors, came from basketball-reference.com.

NBA over/unders (and betting markets in general) have always intrigued me. They tell us how the conventional wisdom rates teams, through a combination of statistical and more qualitative factors. The analysis I did, besides being potentially useful information for betting, also may tell us which types of teams are often overrated or underrated going into a season.

Before I continue, I should point out that this analysis needs to be interpreted through the lens that it is more common for a team to go UNDER than over. About 54.5% of teams went under over the last six seasons. This could be due to fans wanting to bet the over on their favorite teams or simply to optimism among the betting public. All results should be interpreted in this context.

## Predictors

There were five statistics (I call them predictors) whose relationship with season win total over/unders I was interested in. I will first define them and then talk about why I chose each predictor.

- *Wins Above Pythag Expected* = regular season wins in the previous season minus expected wins in the previous regular season based on point differential.
- *Expectation Jump* = over/under betting line minus expected wins in the previous regular season based on point differential.
- *Average Age* = average age of the roster in the previous regular season, weighted by minutes played.
- *3-Point % Allowed* = opponent 3-point percentage in the previous regular season minus the mean team 3-point percentage, divided by the standard deviation of 3-point percentage (a z-score, for those with a stats background).
- *FT % Allowed* = the same computation as for 3-Point % Allowed, except using opponent free throw percentage.
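For readers who want to reproduce these definitions, here is a minimal Python sketch. The function names are hypothetical, and the use of a population standard deviation in the z-score is an assumption on my part, since the definitions above do not say which form was used.

```python
from statistics import mean, pstdev


def wins_above_pythag(wins, pythag_wins):
    """Previous-season wins minus point-differential expected wins."""
    return wins - pythag_wins


def expectation_jump(line, pythag_wins):
    """Preseason over/under line minus previous-season expected wins."""
    return line - pythag_wins


def weighted_average_age(players):
    """Minutes-weighted roster age; `players` holds (age, minutes) pairs."""
    total_minutes = sum(minutes for _, minutes in players)
    return sum(age * minutes for age, minutes in players) / total_minutes


def z_score(value, league_values):
    """Standardize `value` against a list of league-wide values.

    ASSUMPTION: population standard deviation (pstdev); a sample
    standard deviation would give slightly different numbers.
    """
    return (value - mean(league_values)) / pstdev(league_values)
```

The same `z_score` helper would serve for both 3-Point % Allowed and FT % Allowed, with only the input percentages changing.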

We can think of Wins Above Pythag Expected as a measure of how ‘lucky’ or ‘unlucky’ a team was last season. I used the expected wins (sometimes called Pythagorean Record) from basketball-reference. Teams which win many close games will have a positive Wins Above Pythag Expected while the opposite is true for those which lose many close games. My hypothesis was that squads which outperformed their point differential in the previous season (i.e. have a positive Wins Above Pythag Expected) would be a little overrated by bettors and be more likely to go under. I thought the opposite might be true for those teams which underperformed their point differential.

I included Expectation Jump because I was interested to see how teams which were expected to make a leap in the standings, whether because of internal improvement, a trade, or a free agent signing, actually did relative to that expectation. I did not have a strong intuition about whether this would be a meaningful predictor, but I guessed that teams which are expected to make a leap (i.e. have a positive Expectation Jump) might be a little overrated by the public and more likely to go under. Average Age was another factor I included simply out of curiosity about whether it would have any significance.

I was particularly intrigued by the two opponent shooting statistics and thought that they might be useful for betting. I have written before, as have others, about how opponent 3-point percentage is probably largely out of the defense’s control over the long term. Thus I thought a team with a large, positive 3-Point % Allowed might have suffered from some bad luck in the previous season and be an over candidate (and vice-versa for a negative 3-Point % Allowed). I felt that 3-Point % Allowed would probably have a larger effect on over-under results in the next season than FT % Allowed, but that the latter might also have some predictive power.

One final note before I dive into the results: for each predictor, I looked at four different categories of teams. I examined those who were above 0 in the statistic (or the average value, in the case of Average Age), below 0, in the top 25% of the statistic, and in the bottom 25%. Though these are somewhat arbitrary divisions, I chose these groupings to have a consistent way to look at the results and to see if there were any patterns among teams who were most ‘extreme’ in each predictor.
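The four groupings can be sketched in a few lines of Python. This is an illustration only: `categorize` is a hypothetical helper, and both the quartile method (the default of `statistics.quantiles`) and the handling of ties at the cut points are assumptions.

```python
from statistics import quantiles


def categorize(values, center=0.0):
    """Return team indices for the four groupings of one predictor.

    `center` is 0 for most predictors; for Average Age it would be the
    average value instead. ASSUMPTION: teams exactly at a quartile cut
    point are included in the top/bottom group.
    """
    q1, _, q3 = quantiles(values, n=4)  # the three quartile cut points
    return {
        "above": [i for i, v in enumerate(values) if v > center],
        "below": [i for i, v in enumerate(values) if v < center],
        "top_25": [i for i, v in enumerate(values) if v >= q3],
        "bottom_25": [i for i, v in enumerate(values) if v <= q1],
    }
```

Note that the groups overlap: every team in the top 25% of a zero-centered predictor will usually also sit in the "above" group, so the four categories are not independent looks at the data.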

## Results

There were three ways I chose to evaluate each predictor. The latter two are closely related:

- *Correlation With Wins Above Betting Line* = the (Pearson) correlation between the predictor and each team’s wins above betting line, which is simply regular season wins minus the preseason over-under line. Correlation goes from -1 (perfectly anti-correlated) to 1 (perfectly correlated).
- *Percentage Over Betting Line* = the percentage of teams, in a given category, who went OVER their over/under line for the season.
- *P-Value* = the probability that a given category’s Percentage Over Betting Line would be equal to or more lopsided than it actually was, through random chance. (This is a one-sided p-value, for those familiar with the concept.) A smaller p-value (particularly less than 0.05) indicates a more surprising result.
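The first two measures are simple enough to sketch directly. The helper names below are hypothetical; the only convention borrowed from this article is that a team landing exactly on its line counts as half an over.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def percent_over(wins, lines):
    """Percentage of teams over their line; a push counts as half an over."""
    score = 0.0
    for won, line in zip(wins, lines):
        if won > line:
            score += 1.0
        elif won == line:
            score += 0.5
    return 100.0 * score / len(wins)
```

To get the correlation for a predictor, one would pass the predictor values as `xs` and each team’s wins minus its line as `ys`.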

To clarify how the P-Value is calculated, take the example of the Wins Above Pythag Expected, Greater than 0 category. There were 82 teams in this category, and 34.5 (42.1%) of those went over their betting line. (The 0.5 comes from one team that equaled its over/under line.) Overall, 45.5% of teams went over. Suppose that each team in this group is no different than the average team and actually had a 45.5% chance of going over. Then by sheer random chance, 26% of the time we would see 42.1% or fewer of the teams in this category going over. This makes the P-Value 0.26, as shown below. A *lower* P-Value, particularly one less than 0.05, indicates a more surprising result and may give us greater reason to believe that a given predictor is useful for predicting over/unders.
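One way to carry out this kind of calculation is with an exact binomial tail, sketched below. Treat it as an approximation of the idea rather than the exact procedure behind the table: the function names are hypothetical, and the handling of the half-over from a push is an assumption (a whole-number count at or below 34.5 is the same event as a count at or below 34).

```python
from math import ceil, comb, floor


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p), summed term by term."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def one_sided_p_value(overs, n, base_rate):
    """Chance of a result at least as lopsided as `overs` out of `n` teams.

    ASSUMPTION: each team independently goes over with probability
    `base_rate`, and a fractional `overs` (from a push) is compared
    against whole-number counts.
    """
    if overs / n <= base_rate:
        # Lower tail: as few or fewer overs than observed.
        return binom_cdf(floor(overs), n, base_rate)
    # Upper tail: as many or more overs than observed.
    return 1 - binom_cdf(ceil(overs) - 1, n, base_rate)


# The worked example from the text: 34.5 overs out of 82 teams.
p = one_sided_p_value(34.5, 82, 0.455)
print(round(p, 2))
```

Under these assumptions the result lands in the same neighborhood as the 0.26 quoted above; small differences can come from how the push is treated or from using a normal approximation instead of the exact binomial.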

*Remember, all results need to be interpreted through the lens that only about 45.5% of teams went OVER the betting line.* So, a Percentage Over Betting Line of 53% would actually be somewhat more surprising than 45%. This also means that there is more value to be gained by a stat which reliably predicts the under at an even higher frequency than the average rate.

Here, for your viewing pleasure, is the table of results:

## Takeaways

Before interpreting the results, I need to caution that combing through a bunch of p-values looking for significance can be a dangerous game. We are at risk of doing what is often called ‘p-hacking’, which basically means that if you test many hypotheses you will often see a few ‘significant’ (p-value less than 0.05, say) results that are nothing more than random noise. I looked at 20 different classes for significant behavior so, by chance, we on average would expect to see one significant group.
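The arithmetic behind that expectation is short enough to show. The sketch below assumes the 20 tests are independent, which is not strictly true here because the categories overlap, so the numbers are a rough guide only. The function names are hypothetical.

```python
def expected_false_positives(num_tests, alpha):
    """Average number of 'significant' results if every effect is pure noise."""
    return num_tests * alpha


def prob_at_least_one(num_tests, alpha):
    """Chance of at least one false positive, ASSUMING independent tests."""
    return 1 - (1 - alpha) ** num_tests


print(expected_false_positives(20, 0.05))
print(round(prob_at_least_one(20, 0.05), 2))
```

With 20 groups and a 0.05 threshold, the expected count is one, and under the independence assumption there is roughly a 64% chance of seeing at least one spurious ‘significant’ group.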

With this limitation in mind, let’s examine the results.

*The two categories of teams which most often went UNDER were the oldest 25% of teams and the 25% of teams with the largest jump in over-under line from the previous season’s expected wins.* This means that older teams may be slightly overvalued, perhaps because bettors underestimate age-related regression or the risk of injury to older players. Bettors may also be more confident than they should be that certain teams will make big improvements over the previous season’s result. Both these groups had p-values less than 0.05, a standard threshold for statistical significance. For reference, the top 25% of teams in Average Age all had an average age over 27.9 years, and the top 25% of teams in Expectation Jump were each expected to win at least 4.5 more games than their Pythagorean expected wins from the previous season.

FT % Allowed surprisingly had the highest correlation, 0.20, indicating that opponent free throw percentage in the previous season was the predictor most strongly correlated with wins above the betting line. This result is surprising to me, and I would love to see if it holds over a larger sample. That said, no category for this predictor actually had a p-value of less than 0.05, though the bottom 25% was close. 3-Point % Allowed was, against my expectation, not a terribly useful predictor.

The correlations actually all were in the directions I expected, but their relative order (in absolute value) surprised me. FT % Allowed and Average Age had the ‘largest’ correlations, while Wins Above Pythag Expected and 3-Point % Allowed had the smallest correlations. This is actually the opposite of what I thought we would see.

I should point out that all these correlations are in the 0.09-0.20 range, which are not terribly large values. In many applications we would want to see correlations above the 0.50 or 0.60 level to really say any two variables are strongly correlated. However, in the betting market application, smaller correlations make sense. After all, if any correlation were very large, we might think that the market was miscalibrated in some way.

In general, as discussed before, the under is the better play. A few groups of teams did go over more than 50% of the time, particularly those with higher FT % Allowed. But the best plays are those teams which are even more likely to go under than the base 54.5% rate.

## 2018-19 Season Best Bets

Because I could not help myself, I included a table below showing how each of the 2018-19 teams stacks up in each predictor. I would not advise betting solely based on Average Age or Expectation Jump, but it’s still informative to see which teams fit the patterns of typical under and over plays.

The over-under lines were from Bovada, as of August 20.

- Red values indicate predictor levels more closely associated with UNDER and green values are associated with OVER.
- Positive FT % Allowed and 3-Point % Allowed indicate opponents shot a HIGHER percentage. These values are z-scores, as explained in the Predictors section.
- The median average age of rosters over the past 6 seasons was 26.3 years old.

Based on the above table, I would take the Rockets, Warriors, Celtics, Mavericks and maybe Pacers and Wizards as decent UNDER bets for the 2018-19 season. I would probably not bet on any overs at even money odds (because under has been more common, as explained earlier) but, if forced to make a prediction, the Hornets, Hawks, and Jazz would be my best OVER bets.

I would love to gather more data and run this analysis again next season. In the meantime, we have only a month and a half to go until the games really begin!

## References

- Decoding NBA over/under win totals for 2012-13 season (Matt Moore, CBS)
- Give and Go: Evaluating over/under win totals for 2013-14 season (Ben Golliver and Rob Mahoney, Sports Illustrated)
- Evaluating over/under win totals for the 2014-15 NBA season (Ben Golliver, Sports Illustrated)
- Examining the best and worst bets for the 2015-16 NBA over/under lines (Ben Golliver, Sports Illustrated)
- The best and worst over/under bets for the 2016-17 NBA season (Ben Golliver, Sports Illustrated)
- Every NBA team’s 2017-18 over/under (Ben Fawkes, ESPN)
- 2018-19 NBA regular-season win totals (Gilles Galant, Odds Shark)
- Data used to compute the predictors was from basketball-reference