Identify Sleepers in Fantasy Football using Statistics and Wisdom of the Crowd
In this post, I demonstrate how to statistically identify sleepers in fantasy football using the wisdom of the crowd.
The R Scripts
The R Script for the “Wisdom of the Crowd” section is at:
The R Script for the “Experts” section is at:
What is a “Sleeper”?
In order to statistically identify sleepers, we must first define what we mean by a “sleeper”. Using the definition from NFL.com, a sleeper is “a late round pick…who exceeds his statistical expectations and becomes a prominent [fantasy player]”. Thus, we want to identify players who are likely to exceed their statistical expectations and have a breakout season.
You might wonder how much statistics can tell us about the likelihood that a player will have a breakout season. A lot, I would argue. People tend to think of players’ projections in terms of a single value, namely the most likely value (e.g., the point estimate or average). For example, FantasyPros, which averages across sources of projections, provides one value of projected points for each player. Thinking in terms of a single value is misleading because (a) it suggests more accuracy and precision than actually exist, (b) it falsely assumes that all players are equally predictable, and (c) it ignores the fact that a player’s projections form a distribution, not a single value. Consider the following figure:
In the density plot above, there are 3 players: A, B, and C. All three players have the same average projection: 150 points. That is, if you average across all sources, each player is considered most likely to score 150 points. This point estimate, however, ignores the different distributions for the different players. We see that Player A, with the narrowest distribution, is likely to score between 140-160 points, whereas Player B is likely to score between 120-180 points, and Player C with the widest distribution is likely to score between 70-230 points. We call these differences in the width of the distribution the “variability,” which can be quantified with the standard deviation. By thinking in terms of an interval estimate (range) rather than a point estimate (average), we can more accurately assess the likelihood that a player will exceed expectations and have a breakout season. In the example above, Player C would be most likely to be the sleeper because Player C has the highest potential upside (based on the highest standard deviation). Thus, we can quantify sleepers as those players with high variability in their projections across sources as measured by standard deviation. For more info, see here and here.
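To make the point-estimate versus interval-estimate idea concrete, here is a minimal sketch in Python (the scripts for this post are in R; this is only an illustration, not the actual script). The projection values are invented so that each player averages 150 points, and the names `projections` and `summarize` are hypothetical.

```python
from statistics import mean, stdev

# Hypothetical projected points from five sources for each player.
# The values are invented so that every player averages 150 points.
projections = {
    "Player A": [144, 147, 150, 153, 156],
    "Player B": [130, 140, 150, 160, 170],
    "Player C": [90, 120, 150, 180, 210],
}


def summarize(points):
    """Return the point estimate (mean) and the variability (sample SD)."""
    return mean(points), stdev(points)


summary = {player: summarize(points) for player, points in projections.items()}

# Sort from most to least variable; the most variable player is the
# strongest sleeper candidate under the definition used in this post.
by_variability = sorted(summary, key=lambda p: summary[p][1], reverse=True)

for player in by_variability:
    avg, sd = summary[player]
    print(f"{player}: mean = {avg:.1f}, SD = {sd:.1f}")
```

All three players share the same mean, so only the standard deviation separates them, and Player C comes out on top of the sorted list.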
Wisdom of the Crowd
Adapted from work by Drew Conway (see here), the script takes 10,000 mock drafts from Fantasy Football Calculator and computes a robust measure of spread, the median absolute deviation (MAD), for each player’s draft position. We use a robust measure rather than the ordinary standard deviation so that the variability estimate is not driven by outliers from a few crazy drafters. This gives us a sense of which players the crowd considers riskiest. In other words, it gives us the wisdom of the crowd about which players are the most variable in terms of ranking. The riskiest players according to the wisdom of the crowd are labeled in the figure below:
We can similarly calculate a standard deviation across rankings and projections by experts. For info on how these are calculated, see here.
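The difference between the median absolute deviation used for the crowd's draft positions and an ordinary standard deviation can be sketched in a few lines of Python. This is an illustration only, not the R script itself: the draft picks are invented, the function name `mad` is my own, and the scaling constant 1.4826 (the conventional factor that makes the MAD comparable to a standard deviation for normally distributed data) is an assumption about how the estimate is scaled.

```python
from statistics import median, stdev


def mad(values, constant=1.4826):
    """Median absolute deviation, scaled to be comparable to an SD.

    The constant 1.4826 is the usual normal-consistency factor; whether
    the original analysis applies it is an assumption of this sketch.
    """
    center = median(values)
    abs_deviations = [abs(v - center) for v in values]
    return constant * median(abs_deviations)


# Hypothetical draft positions for one player across eight mock drafts.
# Seven drafters take him in rounds 2-3; one "crazy drafter" waits until pick 120.
picks = [24, 25, 26, 27, 28, 29, 30, 120]

robust_spread = mad(picks)
ordinary_sd = stdev(picks)

print(f"MAD (scaled): {robust_spread:.2f}")
print(f"Ordinary SD:  {ordinary_sd:.2f}")
```

The single outlier inflates the ordinary standard deviation, while the MAD stays close to the spread of the other seven picks.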
Combining Variability of Rankings and Projections
After calculating the variability of players’ rankings (crowd and experts) and projections (experts), we can combine them. To weight rankings and projections equally, I combine the two ranking variabilities (crowd and experts) into one before averaging the result with the projection variability. First, I z-score each variability measure to put them all on the same scale (mean = 0, SD = 1). Next, I average the crowd’s and the experts’ standardized ranking variability to get an overall ranking variability. Then I average the standardized ranking variability with the standardized projection variability to get an overall risk value. Finally, I rescale the risk value to have a mean of 5 and a standard deviation of 2. Players with risk values above 7 are thus more than 1 SD above the mean in variability.
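The combination steps described above can be sketched as follows. This is a Python illustration under stated assumptions, not the original R script: the player labels and variability values are invented, the helper names `z_scores` and `rescale` are hypothetical, the sample standard deviation is used throughout, and re-standardizing the averaged ranking variability before the final average is my reading of "standardized ranking variability".

```python
from statistics import mean, stdev


def z_scores(values):
    """Standardize values to mean 0 and (sample) SD 1."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


def rescale(values, new_mean=5.0, new_sd=2.0):
    """Rescale values to a chosen mean and SD."""
    return [new_mean + new_sd * z for z in z_scores(values)]


# Hypothetical variability estimates for five players (invented numbers).
players = ["P1", "P2", "P3", "P4", "P5"]
crowd_rank_sd = [2.0, 5.0, 9.0, 3.0, 14.0]     # crowd draft-position spread
expert_rank_sd = [1.5, 4.0, 7.5, 2.5, 10.0]    # spread of expert rankings
projection_sd = [8.0, 15.0, 30.0, 12.0, 45.0]  # spread of expert projections

# Step 1: average the standardized crowd and expert ranking variability.
ranking_var = [
    mean(pair) for pair in zip(z_scores(crowd_rank_sd), z_scores(expert_rank_sd))
]

# Step 2: average standardized ranking variability with standardized
# projection variability, so rankings and projections get equal weight.
overall = [
    mean(pair) for pair in zip(z_scores(ranking_var), z_scores(projection_sd))
]

# Step 3: rescale to mean 5, SD 2, and flag players more than 1 SD above the mean.
risk = rescale(overall)
flagged = [p for p, r in zip(players, risk) if r > 7]

for p, r in zip(players, risk):
    print(f"{p}: risk = {r:.2f}")
print("More than 1 SD above the mean:", flagged)
```

Averaging the two ranking measures first keeps rankings from counting twice as much as projections in the final risk value.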
Who are the Sleepers?
Here are some notable players who have high upside and are potential sleepers. By definition, they also have considerable downside, so they are best taken late in the draft as low-risk, high-reward picks.
- Sam Bradford, QB, STL
- Doug Baldwin, WR, SEA
- Khiry Robinson, RB, NO
- Andrew Hawkins, WR, CLE
- Jace Amaro, TE, NYJ
- Devonta Freeman, RB, ATL
We can use statistics and wisdom of the crowd to understand which players are most likely to have breakout seasons (based on the variability around their rankings and projections).
I understand the sleeper concept but what does the number tell me? Is a higher number good or bad?
Isaac, I really like what you are doing with fantasy football statistics and I would like to know how I can contribute to your project… I have a long history of web, application, and software development, and can definitely help your research/cause.
Let me know if you are looking to add people to your team!
Thanks Doug. We are absolutely looking for help, see here:
The idea of identifying fantasy sleepers by the standard deviation in rankings is a priori plausible and appealing. However, has anyone tested the data to make sure that the relationship actually holds? I can think of some easy ways to test the relationship and could do it for you if you want, but I don’t want to reinvent the wheel if it’s already been done.
We haven’t had a chance to test it yet, but it’s on our to-do list. Let us know if you’d like to do an analysis and write up a guest post!