The NBA draft is almost upon us! For the sake of curiosity, I decided to dig into the data to determine which college statistics have been most correlated with NBA success.
I will first give a few words on my methodology. I considered all players drafted from 2000 to the present whose age 25 season could have happened in 2017-18 or earlier. The age 25 season criterion is included because I measured player value in terms of Win Shares (a stat provided by basketball-reference) produced in the age 25 season. I chose to measure performance at this age rather than something like prime years (age 26-29) to allow more recent draft picks into the sample.
Win Shares, like a lot of one-number summaries of player value, is by no means a perfect statistic. I have in the past found it to be a bit biased towards centers. But it will work for this analysis. Also, I will refer often to WS*, which is log(Age 25 Win Shares + 5). I transformed age 25 season Win Shares to get a more nearly linear relationship with draft pick.
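To make the WS* transform concrete, here is a minimal sketch. It assumes the natural log (the base is not essential to the idea), and the function name `ws_star` and the sample Win Shares values are made up for illustration. One side effect of the shift worth noting: Win Shares can dip slightly below zero, and adding 5 before taking the log keeps those seasons in the log's domain.

```python
import math


def ws_star(age25_win_shares, shift=5.0):
    """WS* = log(Age 25 Win Shares + 5); natural log is an assumption."""
    return math.log(age25_win_shares + shift)


# Made-up Win Shares values, not real player seasons.
for ws in (-1.0, 0.0, 5.0, 10.0):
    print(f"WS={ws:5.1f} -> WS*={ws_star(ws):.3f}")
```

The log compresses the gap between good and great seasons, so a handful of star-level years pull less on any line fit through the data.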
I ended up splitting the data into two sets: college guards and forwards, as designated by basketball-reference. The college position is an important point; many power forwards in college have recently moved into roles as small-ball centers in the NBA, but they were considered forwards in this analysis. There were only 80 centers, so I decided to leave them out for this analysis. The two datasets had, respectively, 242 total guards and 238 total forwards.
While it’s a well-known fact, it’s worth restating: average player value decreases as draft position increases (increases in the sense that the 10th pick is a higher pick than the 5th pick). The plot below demonstrates this:
A linear regression of WS* onto draft pick yields an R-squared value of about 0.24 for guards and 0.12 for forwards. These are not terribly high R-squared values, which is what we might expect. After all, there is a lot of variability in player performance at all draft pick ranges. But still, 0.24 is not terribly low. It is interesting to note that forwards are a bit harder to project based on draft position.
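For readers who want to see what that regression involves, here is a small sketch of a one-variable least-squares fit and its R-squared. The helper names and the (draft pick, WS*) pairs are invented for illustration; they are not the data behind the numbers above.

```python
def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def r_squared(x, y):
    """Share of the variance in y explained by the fitted line."""
    slope, intercept = fit_line(x, y)
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


# Made-up (draft pick, WS*) pairs for illustration only.
picks = [1, 4, 9, 15, 22, 30, 41, 52]
ws_star = [2.5, 2.1, 2.3, 1.9, 2.0, 1.7, 1.8, 1.6]

slope, intercept = fit_line(picks, ws_star)
print(f"slope={slope:.4f}, intercept={intercept:.3f}, "
      f"R^2={r_squared(picks, ws_star):.2f}")
```

A negative slope here corresponds to value falling as the pick number rises, and R-squared is the fraction of the spread in WS* that the line accounts for.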
Given the existing relationship between draft position and WS*, I took the residuals of the regression of WS* onto draft pick and correlated these residuals with a bunch of other college statistics. I call the residuals of the first regression “Draft Pick Adjusted Value” because they represent how much a player has over/underperformed the expectation set by their draft position. Basically, the correlations between Draft Pick Adjusted Value and various college box score stats are telling us which college stats do a good job predicting NBA performance above our baseline expectation set by draft position. Here are the correlations for guards and forwards:
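The two-step idea (regress on draft pick, then correlate the leftovers with a college stat) can be sketched as follows. Everything here is illustrative: the helper names are my own, and the picks, WS* values and true shooting percentages are made up.

```python
import math


def ols_residuals(x, y):
    """Residuals of y after a least-squares line on x is removed."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]


def pearson(a, b):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b))
    var_a = sum((ai - mean_a) ** 2 for ai in a)
    var_b = sum((bi - mean_b) ** 2 for bi in b)
    return cov / math.sqrt(var_a * var_b)


# Made-up data for illustration only.
picks = [1, 4, 9, 15, 22, 30, 41, 52]
ws_star = [2.5, 2.1, 2.3, 1.9, 2.0, 1.7, 1.8, 1.6]
ts_pct = [0.60, 0.53, 0.59, 0.51, 0.57, 0.50, 0.56, 0.52]

# Positive residual: the player beat the expectation for his draft slot.
adjusted_value = ols_residuals(picks, ws_star)
print(f"corr(adjusted value, ts.pct) = {pearson(adjusted_value, ts_pct):.2f}")
```

By construction the residuals are uncorrelated with draft pick itself, so any correlation that shows up with a college stat reflects information the draft slot did not already capture.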
While no statistic is highly correlated with Draft Pick Adjusted WS* (the absolute values of all correlations are below 0.2), true shooting percentage (ts.pct) stands out as being positively predictive for both the guards and forwards. In fact, true shooting percentage has the largest correlation (in absolute value) for guards, at about 0.18, and the second largest for forwards, at about 0.14, behind only free throw percentage. Perhaps efficient college scorers are being undervalued slightly.
Steals per 40 minutes is also relatively predictive by this metric, as are height and weight for guards and assists per 40 minutes for forwards. Height and weight are negatively correlated with Draft Pick Adjusted WS* for the guards, indicating that smaller guards have slightly outperformed their draft position. Points per 40 minutes is not highly correlated with Draft Pick Adjusted WS* for either group. Age, though highly correlated with draft position, does not give us too much additional value after accounting for draft position. Free throw percentage is interestingly the most highly correlated stat for forwards, and this is not simply because free throw percentage is highly correlated with true shooting percentage (the correlation between the two is only about 0.28).
A natural next question is exactly how much more predictive power can we gain with these college stats, above and beyond that which we get already from draft position? We can use the R-squared values from WS* regressed onto draft pick as a baseline (0.24 for guards and 0.12 for forwards). Using a stepwise method of selecting variables, the R-squared was increased up to 0.32 for guards and 0.17 for forwards as other variables were added in. Basically, college statistics can give us a bit of insight beyond draft position but not a super large amount. Perhaps more advanced plus-minus based college stats can give us a bit more insight, though this is purely conjecture.
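As a rough illustration of how a stepwise procedure can work, here is a forward-selection sketch: start from draft pick alone, then repeatedly add whichever remaining stat raises R-squared the most, stopping once the gain is small. This is only one flavor of stepwise selection; the stopping rule (`min_gain`), the column name `stl.per40`, and all of the data are assumptions made for the example, and the criterion actually used for the numbers above may differ.

```python
def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def r_squared(columns, y):
    """R-squared of a least-squares fit of y on the columns plus an intercept."""
    rows = [[1.0, *vals] for vals in zip(*columns)]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    beta = solve(xtx, xty)
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - sum(b * v for b, v in zip(beta, r))) ** 2
                 for r, yi in zip(rows, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


def forward_stepwise(base, candidates, y, min_gain=0.01):
    """Greedily add the candidate column with the largest R-squared gain."""
    columns, chosen = [base], []
    best = r_squared(columns, y)
    remaining = dict(candidates)
    while remaining:
        name, score = max(
            ((n, r_squared(columns + [c], y)) for n, c in remaining.items()),
            key=lambda pair: pair[1],
        )
        if score - best < min_gain:  # hypothetical stopping rule
            break
        columns.append(remaining.pop(name))
        chosen.append(name)
        best = score
    return chosen, best


# Made-up data for illustration only.
picks = [1, 3, 7, 12, 18, 25, 33, 40, 48, 55]
stats = {
    "ts.pct": [0.61, 0.52, 0.58, 0.50, 0.63, 0.55, 0.49, 0.60, 0.53, 0.57],
    "stl.per40": [1.9, 1.2, 2.1, 1.0, 1.6, 1.4, 2.0, 1.1, 1.7, 1.3],
}
ws_star = [2.6, 2.3, 2.5, 2.1, 2.4, 2.0, 1.9, 2.1, 1.8, 1.9]

print("baseline R^2:", round(r_squared([picks], ws_star), 3))
print("stepwise result:", forward_stepwise(picks, stats, ws_star))
```

One caution with any greedy procedure like this: in-sample R-squared can only go up as variables are added, so small gains on a few hundred players should be read loosely.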
In the future, possibly before the draft, I plan to release a prediction model to predict NBA performance based on college (and perhaps Euro) statistics. But in the meantime, I am keeping an eye on true shooting percentage and steals per 40 minutes for college guards and forwards, as well as assists per 40 minutes and free throw percentage for forwards.
For fun, I also made a plot showing where some of the top college prospects rank in terms of steals per 40 minutes and true shooting percentage. The percentiles are with respect to previous draft prospects in the same position group (forward or guard). Mikal Bridges stands out, as do Khyri Thomas, Shai Gilgeous-Alexander, Moritz Wagner and Trae Young.
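A percentile of this kind can be computed in a few lines. The sketch below uses one common definition (the share of the reference group at or below the prospect's value); the function name and the steals numbers are made up, and the plot may have used a slightly different convention.

```python
def percentile_rank(value, reference):
    """Percent of reference values less than or equal to value."""
    at_or_below = sum(1 for r in reference if r <= value)
    return 100.0 * at_or_below / len(reference)


# Made-up steals per 40 minutes for past drafted guards.
past_guard_steals = [0.9, 1.2, 1.4, 1.5, 1.8, 2.0, 2.3, 2.6]

# A hypothetical guard prospect averaging 1.9 steals per 40 minutes.
print(percentile_rank(1.9, past_guard_steals))
```

Comparing each prospect only against past players from the same position group keeps guards from being ranked against forwards, whose steal and shooting profiles differ.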