Last year at Basketball Prospectus, I took statistical prognosticating to the next level and introduced a lineup-based prediction model for college basketball. The model predicted the tempo-free stats of every D1 player, projected the lineup for every D1 team, and then added up the player stats to get a projection for every D1 team. A full description of the model, including how I project each team’s defense, is available in a separate post. It is now time to use that model to project the 2013-14 season. These projections will be available on ESPN Insider tomorrow. But before I present this year’s projections, I want to discuss the improvements I made to the model this summer.
The biggest change I made to the model was adding a simulation component. While I can project ORtgs for each D1 player, the historical data tells us there is substantial variance in player performance. The simulation essentially takes a random draw for each D1 player (a good or bad outcome based on the uncertainty in that player’s ORtg), simulates each lineup, and then repeats the process 10,000 times. On Friday at ESPN, I will present the median simulation, the best-case simulations, and the worst-case simulations for each team.
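To make the mechanics concrete, here is a minimal Python sketch of that loop. It is an illustration under stated assumptions, not the model’s actual code: each player’s ORtg draw is assumed to be normal around his projection, the team value is a plain average of five starters (the real model works lineup by lineup and weights by playing time), and the 5th and 95th percentiles stand in for “worst case” and “best case.” The function names and example numbers are hypothetical.

```python
import random
import statistics

def simulate_team_ortg(projections, sds, n_sims=10_000, seed=0):
    """Simulated team offensive ratings, one value per simulation.

    Hypothetical sketch: each player's ORtg is drawn from a normal
    distribution centered on his projection with the given standard
    deviation, and the team value is the plain average of the draws.
    """
    rng = random.Random(seed)  # seeded so runs are reproducible
    results = []
    for _ in range(n_sims):
        draws = [rng.gauss(mu, sd) for mu, sd in zip(projections, sds)]
        results.append(sum(draws) / len(draws))
    return results

def summarize(results):
    """Worst case (5th percentile), median, and best case (95th percentile)."""
    cuts = statistics.quantiles(results, n=20)  # 19 cut points: 5%, 10%, ..., 95%
    return {
        "worst_case": cuts[0],
        "median": statistics.median(results),
        "best_case": cuts[-1],
    }

if __name__ == "__main__":
    # Illustrative projections and spreads for five starters, not real players.
    sims = simulate_team_ortg([115, 108, 104, 101, 99], [6, 5, 5, 4, 4])
    print(summarize(sims))
```

The median lands near the average of the projections; the spread between the worst and best cases is what the single-number projection from last year could not show.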
There are a number of consequences to adding a simulation to the model. First, the simulation approach gives an advantage to teams with positional flexibility. For example, Louisville has two players, Chris Jones and Terry Rozier, who will likely compete to be the team’s starting point guard. Both players project as good, but not elite, college point guards. But when you simulate the lineup and recognize that whichever of the two plays better will start, the expectation at the position rises. The winner of the competition will have a higher expected ORtg than either player individually.
Contrast that with a team like Syracuse, which has only one strong point guard option in Tyler Ennis. For Syracuse, the downside risk is much higher if Ennis struggles in his first season.
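The “better of two” effect can be checked with a short simulation. This sketch assumes, for simplicity, that the coach always starts whoever is actually playing better, and it uses made-up projections (both guards at 105 with a standard deviation of 5) rather than the model’s numbers for any real player. The function name is hypothetical.

```python
import random
import statistics

def expected_starter_ortg(candidates, n_sims=10_000, seed=1):
    """Average ORtg of whichever candidate has the better season.

    candidates: list of (projected_ortg, sd) pairs for players competing
    for one starting spot. Simplifying assumption: the coach always
    starts whoever is actually playing better.
    """
    rng = random.Random(seed)
    winners = [
        max(rng.gauss(mu, sd) for mu, sd in candidates)  # best draw wins the job
        for _ in range(n_sims)
    ]
    return statistics.mean(winners)

if __name__ == "__main__":
    # Two evenly matched guards (the Louisville situation, hypothetical numbers)
    print(expected_starter_ortg([(105, 5), (105, 5)]))  # roughly 107.8
    # One option only (the Syracuse situation, same hypothetical projection)
    print(expected_starter_ortg([(105, 5)]))            # roughly 105
```

For two equally projected players with normal draws, the expected maximum has a closed form, μ + σ/√π, which is about 107.8 here; the simulation agrees with it. The single-option team gets no such bump.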
Superstar players also have a higher value in the simulation model. That is because, with a superstar in place, the winners of the other position battles only need to be role players, and it is much easier for a competition to produce a serviceable role player than a star. A team like Ole Miss may have a lot of roster turnover, but with Marshall Henderson in the fold, finding a passable lineup of complementary starters is less of a challenge. (I don’t want to oversell this impact. Ole Miss moves up only one conference win with the simulation approach. But that one win could easily be the difference between the NCAA tournament and the NIT. And yes, I am aware that Marshall Henderson will likely miss a few games at the beginning of the season.)
Another thing the simulation model exposes is what happens when teams have particularly short benches. Often a head coaching change will lead to some unanticipated transfers and leave a team with fewer than expected scholarship players. When teams have fewer scholarship players than normal, they have fewer options if someone doesn’t play well. The simulation model accounts for the impact of a short bench.
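A sketch of why a short bench hurts: give every player on a roster the same hypothetical projection, start the five best simulated performers, and vary only the number of scholarship players. Ignoring positions and assuming the coach always finds the best five are simplifications for illustration, and the function name is hypothetical.

```python
import heapq
import random
import statistics

def starter_outlook(roster, n_starters=5, n_sims=10_000, seed=2):
    """Return (mean, 5th percentile) of the starting five's average ORtg.

    roster: list of (projected_ortg, sd) pairs, one per scholarship player.
    Simplifying assumption: each simulated season the coach starts the
    n_starters players with the best simulated ORtg, ignoring positions.
    """
    if len(roster) < n_starters:
        raise ValueError("roster is too short to field a lineup")
    rng = random.Random(seed)
    averages = []
    for _ in range(n_sims):
        draws = [rng.gauss(mu, sd) for mu, sd in roster]
        best = heapq.nlargest(n_starters, draws)
        averages.append(sum(best) / n_starters)
    worst_case = statistics.quantiles(averages, n=20)[0]  # 5th percentile
    return sum(averages) / len(averages), worst_case

if __name__ == "__main__":
    # Identical hypothetical players (projection 100, SD 5); only depth varies.
    for size in (5, 6, 8):
        print(size, starter_outlook([(100, 5)] * size))
```

With exactly five players, a bad season from anyone goes straight into the lineup; each extra body gives the coach another option, which lifts both the average outcome and the worst case.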
One of the most important things I learned in implementing the simulation is that all college players are unpredictable. All players have substantial variance in their offensive performance, regardless of what they did in previous years. Every season has a small sample size and every player is at the developmental stage of his career. No matter what someone has done, we cannot make a precise prediction about anyone. Take Doug McDermott as an example. While we know he will be one of the best players in college basketball, it would not be unusual for his ORtg to be 117 this year, and it would not be unusual for his ORtg to be 127 either.
Despite this uncertainty, there are several facts I learned by studying the historical data:
- Two-star recruits have more uncertainty than three-star recruits, who in turn have more uncertainty than elite recruits.
- Freshmen have meaningfully more uncertainty than upperclassmen.
- The more possessions a player plays at the college level, the less variance there is in his projection.
- The variance decreases for seniors even if they have never played. If a player has not cracked the rotation by his junior or senior year, his upside is significantly lower.
- JUCO Top 100 recruits have the highest variance of any group of players. These players are often very good, but they are also often complete busts at the D1 level.
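One way to encode the qualitative findings above is a function that returns the spread (standard deviation) of a player’s ORtg projection. Every number below is a placeholder chosen only to respect the orderings in the list; the model’s actual coefficients are not published here, and the names `Prospect`, `BASE_SD`, and `ortg_sd` are hypothetical.

```python
from dataclasses import dataclass

# Placeholder spreads in ORtg points, chosen only to respect the orderings
# described above (not the model's actual values).
BASE_SD = {"elite": 4.0, "three_star": 5.0, "two_star": 6.0, "juco_top100": 9.0}
# Freshmen are less predictable; juniors and seniors have less upside even
# if they have barely played.
CLASS_MULT = {"FR": 1.25, "SO": 1.0, "JR": 0.85, "SR": 0.85}

@dataclass
class Prospect:
    tier: str                # key into BASE_SD
    class_year: str          # "FR", "SO", "JR", or "SR"
    d1_possessions: int = 0  # possessions played at the D1 level

def ortg_sd(p: Prospect) -> float:
    """Hypothetical spread of a player's ORtg projection."""
    sd = BASE_SD[p.tier] * CLASS_MULT[p.class_year]
    # More D1 possessions -> tighter projection, floored at half the spread.
    sd *= max(0.5, 1.0 / (1.0 + p.d1_possessions / 1000.0))
    return sd

if __name__ == "__main__":
    print(ortg_sd(Prospect("juco_top100", "JR")))
    print(ortg_sd(Prospect("elite", "FR")))
    print(ortg_sd(Prospect("three_star", "SR", d1_possessions=800)))
```

A multiplicative form keeps each finding as a separate, adjustable factor, which makes it easy to refit one effect without disturbing the others.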
I also implemented a simulation on defense and here are a few facts that stand out after studying the historical data:
- The variation in defense is correlated with returning minutes. The fewer minutes a team returns, the more uncertainty there is about the team’s defense the following year.
- But while the number of returning minutes impacts the uncertainty, the effect is much smaller than you might expect. Almost any team can get worse on defense, even if most rotation players are back. Only if a team returns over 85 percent of its minutes from the year before is the team unlikely to get worse on defense. But even for teams that return over 85 percent of their minutes, some get substantially better while others tread water. Defense as a whole is very unpredictable.
- The variation depends a lot on the type of player that is lost. Losing a key shot-blocker results in more variation because the risk of the defense falling apart is much higher.
- While the sample size is too small to put too much faith in it, I have concluded that some long-tenured coaches (like Bill Self, Jim Boeheim, and Mike Brey) are substantially more consistent on defense than the average coach. For Self and Boeheim, that means consistently dominant; for Brey, consistently average. These coaches still have the occasional good or bad year, but relative to the D1 average, their defenses have lower variance. Thus for coaches who have been with their team for more than eight years, I use a slightly different variance formula.
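The defensive findings can be sketched the same way. In this hypothetical function, a large baseline spread reflects how unpredictable defense is, returning minutes shrink it only modestly, losing a key shot-blocker widens it, and coaches with more than eight years at their school get a tighter spread. All constants are placeholders, and the 85 percent finding about whether a defense gets worse concerns direction rather than spread, so it is not modeled here.

```python
def defense_sd(returning_minutes_pct, lost_key_shot_blocker, coach_years_with_team):
    """Hypothetical spread (points per 100 possessions) of next year's defense.

    returning_minutes_pct: share of last season's minutes that return, 0 to 1.
    All numeric values are placeholders chosen to mirror the findings above.
    """
    if not 0.0 <= returning_minutes_pct <= 1.0:
        raise ValueError("returning_minutes_pct must be between 0 and 1")
    # Large baseline: any defense can swing. Returning minutes help, but only a little.
    sd = 3.0 + 1.5 * (1.0 - returning_minutes_pct)
    if lost_key_shot_blocker:
        sd += 1.0  # higher risk of the defense falling apart
    if coach_years_with_team > 8:
        sd *= 0.8  # long-tenured coaches have been more consistent
    return sd

if __name__ == "__main__":
    print(defense_sd(0.5, False, 3))   # half the minutes return
    print(defense_sd(0.9, True, 3))    # veteran team, but lost its rim protector
    print(defense_sd(0.9, False, 10))  # veteran team, long-tenured coach
```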
Previously I only included information on whether a recruit was ranked in the Top 100 out of high school. I now incorporate three additional measures that tell us about player potential.
1. I now incorporate the junior college rankings, the JUCO Top 100, in my player projections.
As I showed this summer, the JUCO rankings have predictive power. There is a lot of noise in those rankings, but there is also a strong signal of player quality. JUCO Top 10 recruits are particularly likely to be impact players. This is good news for Louisville fans. My model now has appropriately high expectations for this year’s top JUCO recruit Chris Jones.
2. I now incorporate the high school star ratings in my player projections.
In addition to their Top 100 lists, ESPN, Scout, and Rivals also provide evaluations for lower ranked recruits. Verbalcommits.com has compiled the star ratings from these three systems into a consensus metric and I now include this metric in my model.
As I showed in an analysis this summer, high school star ratings have significant predictive power. This is even true for veteran college starters because star ratings at least partially measure a player’s potential. (While the star ratings have some predictive power for juniors and seniors, it is worth noting that the explanatory power is lower for players who have played major minutes at the college level. If a player has well-established college stats, the star rating is less important.)
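The idea that star ratings matter less once a player has a real college track record can be sketched as a weighted blend: a prior built from the consensus star rating, pulled toward the player’s college ORtg as his possessions accumulate. The simple average of the three services, the linear star-to-ORtg map, and the weighting constant `k` are all assumptions for illustration; Verbal Commits’ consensus method and the model’s actual weights may differ.

```python
def consensus_stars(espn=None, scout=None, rivals=None):
    """Average of whichever star ratings are available.

    A simple stand-in; the Verbal Commits consensus method may differ.
    """
    ratings = [r for r in (espn, scout, rivals) if r is not None]
    if not ratings:
        raise ValueError("need at least one star rating")
    return sum(ratings) / len(ratings)

def prior_ortg_from_stars(stars, base=88.0, per_star=5.0):
    """Hypothetical linear map from consensus stars to an ORtg prior."""
    return base + per_star * stars

def blended_ortg(prior, college_ortg, college_possessions, k=300.0):
    """Weight the college line by n / (n + k), so the prior fades as possessions pile up."""
    if college_ortg is None or college_possessions <= 0:
        return prior
    w = college_possessions / (college_possessions + k)
    return w * college_ortg + (1.0 - w) * prior

if __name__ == "__main__":
    prior = prior_ortg_from_stars(consensus_stars(3, 4, 3))
    print(blended_ortg(prior, 95.0, 40))    # bench player: mostly the star prior
    print(blended_ortg(prior, 95.0, 2500))  # veteran starter: mostly his college stats
```

The `n / (n + k)` weight means a player with `k` college possessions is weighted halfway between his recruiting profile and his stats, which matches the pattern described above: star ratings still carry signal for upperclassmen, just less of it once the college sample is large.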
Last summer, I had concluded that the star ratings were not a high priority to include in the model. That was because freshmen outside the Top 50 nationally rarely make an impact in their first season. But star ratings are critical in at least two ways. First, while a three-star freshman is unlikely to have an immediate impact in a major conference, if a three-star recruit joins a team in a small conference, he is much more likely to be an important recruit for his program. The star rating information meaningfully improves the projections for many of the small conference teams.
Second, the hardest teams for us to project at the college level are teams with a lot of returning bench players who have not played yet. While the Top 100 rankings tell us which freshmen are most likely to have an impact, and the past college statistics tell us how returning rotation players will perform, for players on the bench there is not a lot of information with which to form an expectation. This year’s Temple team is a great example. Temple players like Quenton Decosey, Daniel Dingle, Devontae Watson, and Jimmy McDonnell sat on the bench on a talented NCAA tournament squad last year. Almost certainly two or three of those players will become key rotation players this year. But determining which ones will break out is very difficult given their lack of college statistics. While star ratings are not perfect, they do provide key information about the potential of bench players.
3. For two-star players, I now incorporate information on the number and quality of the offers a player receives.
Verbal Commits’ Paul Pettengill proposed a hypothesis about these players: that there is a measurable difference in the quality of two-star recruits based on the number and quality of offers they receive. The historical data confirm his hypothesis. On average, a two-star recruit who attends Iona and received offers from Providence and Seton Hall performs slightly better than a two-star recruit whose only offer was from Iona.
This offer information is not particularly meaningful for recruits above the two-star level. If the scouting services have evaluated a player in detail, their evaluations are far more important than the offer information. But for two-star recruits, players about whose potential we have very little information, the offer data is important.
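Here is a hypothetical sketch of an offer-based adjustment that applies only to two-star (or lower) recruits. It credits each additional offer a little and offers from stronger programs more. The program-strength scores, the two weights, and the function name are all illustrative assumptions, not the model’s actual values.

```python
def offer_adjustment(stars, offers, committed_to, strength,
                     per_offer=0.3, per_level=0.8):
    """Hypothetical ORtg bump for a low-rated recruit based on his offer list.

    strength: illustrative program-strength scores (e.g. by conference tier).
    Recruits above two stars get no adjustment, since detailed scouting
    evaluations swamp the offer signal for them.
    """
    if stars > 2:
        return 0.0
    other_offers = [school for school in offers if school != committed_to]
    # Extra credit for offers from programs stronger than the one he chose.
    upgrade = sum(max(0.0, strength[s] - strength[committed_to]) for s in other_offers)
    return per_offer * len(other_offers) + per_level * upgrade

if __name__ == "__main__":
    tiers = {"Iona": 1.0, "Providence": 2.0, "Seton Hall": 2.0}  # illustrative
    print(offer_adjustment(2, ["Iona", "Providence", "Seton Hall"], "Iona", tiers))
    print(offer_adjustment(2, ["Iona"], "Iona", tiers))
```

This mirrors the Iona example: the recruit who also held Providence and Seton Hall offers gets a modest bump, while the recruit with only the Iona offer gets none.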
In addition to those major changes, I also made a number of minor changes to the model.
- I have improved the modeling of a player’s aggressiveness on the court (the percentage of possessions he uses). As Ken Pomeroy has noted, when players transfer from a lower-ranked to a higher-ranked program, they typically shoot less. This is now incorporated in the model.
- I made a special adjustment for Harvard’s defense. With Kyle Casey and Brandyn Curry suspended last year, the Harvard defense fell off a cliff. (Ironically, they still won a game in the NCAA tournament.) With Casey and Curry expected back on the team, I based Harvard’s defensive projection on the change from 2012. Harvard still does not show up in the Top 25 of my model, but they are close, and I won’t argue with anyone who ranks them in the Top 25. On paper, this is clearly the best Harvard team in the modern era.
- Finally, I slightly upgraded the expectations for this year’s elite recruits. Many sources have said that this is the best high school class in recent memory and that Andrew Wiggins is the highest-rated high school player since LeBron James. The projections have been tweaked to account for this.
Bottom Line: With better information on each player’s potential and a new simulation approach, the model’s predictions should be even more informative in 2013-14.