Calculate Fantasy Players’ Risk Levels using R
In prior posts, I have demonstrated how to download, calculate, and compare fantasy football projections from ESPN, CBS, and NFL.com. In this post, I will demonstrate how to calculate fantasy football players’ risk levels. Just like when determining the optimal financial portfolio, risk is important to consider when determining the optimal fantasy football team. Each player has a projected value (our best guess as to his projected points). We can think of the player’s mean or central tendency of various projections as his “value.” Nevertheless, the player’s mean projected points across various sources (e.g., ESPN, Yahoo) does not tell the whole story.
The R Script
Understanding Risk
Consider 3 players: Player A, Player B, and Player C. Each of the 3 players has the same number of projected points (150 points). The various sources of projections differ, however, in how consistent their projections are for the players. We will define consistency formally in terms of the dispersion of the projections for a given player in standard deviation units. The standard deviation tells us about the uncertainty of a player’s projections around his central tendency, with larger values representing more uncertainty. Sources were fairly consistent in projecting Player A, with a standard deviation of 5. Sources were less consistent in projecting Player B, with a standard deviation of 15. Sources were even more uncertain about Player C, with a standard deviation of 30.
This example is depicted below:
Each player has the same number of projected points, but each differs in the uncertainty around that estimate. We can consider the density plot (or the 95% confidence interval) as the range of plausible values for a player. So which player is best? The answer depends on whether you intend to draft the player as a starter or a bench player. If your intent is to draft a starter, your goal is to maximize value while minimizing risk. In other words, Player A would be the best (i.e., safest) starter. He may not score as many points as Players B or C, but you can be fairly confident that he will produce near his projected estimate. On the other hand, if you’ve already filled your starting lineup with players that maximize their value while minimizing their risk, the bench players serve a different role—the role of sleeper. Bench players do not score points, so it does no good to get a player with a low value and a low risk as a bench player. With bench players, it makes sense to take on more risk in the hopes that they will outperform your starters. Thus, if drafting a bench player, Player C would be best (i.e., gives you the highest potential ceiling).
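As a rough illustration of the three-player example, the sketch below computes an approximate 95% range of plausible points for each player. It assumes the projections are normally distributed around the mean, which is an assumption made here for illustration and not something the sources guarantee. The data frame and column names are hypothetical.

```r
# Three hypothetical players with the same mean projection but different spread
players <- data.frame(
  player = c("A", "B", "C"),
  mean   = c(150, 150, 150),
  sd     = c(5, 15, 30)
)

# Approximate 95% range of plausible points, assuming normally distributed projections
players$lower95 <- qnorm(0.025, mean = players$mean, sd = players$sd)
players$upper95 <- qnorm(0.975, mean = players$mean, sd = players$sd)
print(players)

# Draw the three density curves (only when run in an interactive session)
if (interactive()) {
  x <- seq(50, 250, length.out = 500)
  dens <- sapply(players$sd, function(s) dnorm(x, mean = 150, sd = s))
  matplot(x, dens, type = "l", lty = 1, col = 1:3,
          xlab = "Projected points", ylab = "Density")
  legend("topright", legend = paste("Player", players$player), col = 1:3, lty = 1)
}
```

Player A's range stays close to 150, while Player C's range is six times as wide, which is what makes him the riskier starter and the more interesting bench pick.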
Calculating Risk
We can calculate two forms of risk: 1) variation in projections and 2) variation in rankings. In the example above, we calculated variation in projections by calculating the standard deviation of a player’s projections across sources. We can also calculate variation in rankings by calculating the standard deviation of a player’s rankings across sources.
To calculate the variation in projections, we calculate the standard deviation of the players’ projections from ESPN, CBS, and NFL.com:
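A minimal sketch of this step is shown here. It is not the original script: the data values are invented, and the column names (projectedPts_espn, projectedPts_cbs, projectedPts_nfl, sdPts) are assumptions chosen for illustration.

```r
# Hypothetical projected points from three sources (values invented for illustration)
projections <- data.frame(
  name = c("Player A", "Player B", "Player C"),
  projectedPts_espn = c(150, 140, 120),
  projectedPts_cbs  = c(155, 165, 150),
  projectedPts_nfl  = c(145, 145, 180),
  stringsAsFactors = FALSE
)

sourceCols <- c("projectedPts_espn", "projectedPts_cbs", "projectedPts_nfl")

# Mean projection across sources (the player's "value")
projections$projectedPts_mean <- rowMeans(projections[, sourceCols], na.rm = TRUE)

# Standard deviation across sources, computed row by row (the player's uncertainty)
projections$sdPts <- apply(projections[, sourceCols], 1, sd, na.rm = TRUE)

print(projections[, c("name", "projectedPts_mean", "sdPts")])
```

All three hypothetical players have the same mean, but their sdPts values differ, mirroring the Player A, B, and C example above.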
To calculate the variation in rankings, we calculate the standard deviation of the players’ rankings from experts and from “wisdom of the crowd.” We can get the consensus rankings from various so-called experts from FantasyPros.com, which averages rankings across various sources:
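library(XML) # assumed source of readHTMLTable(); the function name was lost from this line
experts <- readHTMLTable("http://www.fantasypros.com/nfl/rankings/consensus-cheatsheets.php", stringsAsFactors = FALSE)$data
experts$sdPick_experts <- as.numeric(experts[,"Std Dev"])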
For an example of calculating variations in rankings from wisdom of the crowd, see here. We can get crowd estimates of rankings by using public mock draft information from FantasyFootballCalculator.com:
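# readHTMLTable() (XML package) is assumed here; the function name was lost from this line
drafts <- readHTMLTable("http://fantasyfootballcalculator.com/adp.php?teams=10", stringsAsFactors = FALSE)$`NULL`
drafts$sdPick_crowd <- as.numeric(drafts$Std.Dev)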
Then, we can average together the variation in rankings from the experts and the crowd. After calculating the combined variation in rankings, we can standardize the variation in projections and the combined variation in rankings with a z-score. Standardizing both risk indices puts them on the same metric (mean = 0, standard deviation = 1), so that they can be averaged into a combined risk index. The combined risk index is then rescaled to have a mean of 5 and a standard deviation of 2.
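The steps above can be sketched roughly as follows. The data are invented, the helper zscore() and the column names sdPick, riskRaw, and risk are hypothetical, and the final step re-standardizes the averaged z-scores before rescaling so that the result has a mean of exactly 5 and a standard deviation of exactly 2. That last detail is an assumption of this sketch; the original script may rescale differently.

```r
# Hypothetical inputs: SD of projected points and SD of draft position from two sources
risk <- data.frame(
  name = c("P1", "P2", "P3", "P4", "P5"),
  sdPts = c(5, 15, 30, 10, 20),
  sdPick_experts = c(2, 6, 12, 4, NA),  # NA: player missing from the expert rankings
  sdPick_crowd = c(3, 8, 10, 5, 9),
  stringsAsFactors = FALSE
)

# Helper: convert a vector to z-scores (mean 0, SD 1)
zscore <- function(x) (x - mean(x, na.rm = TRUE)) / sd(x, na.rm = TRUE)

# 1) Average the variation in rankings from the experts and the crowd
risk$sdPick <- rowMeans(risk[, c("sdPick_experts", "sdPick_crowd")], na.rm = TRUE)

# 2) Standardize both risk indices
risk$sdPts_z <- zscore(risk$sdPts)
risk$sdPick_z <- zscore(risk$sdPick)

# 3) Average the two z-scores into a combined raw risk index
risk$riskRaw <- rowMeans(risk[, c("sdPts_z", "sdPick_z")], na.rm = TRUE)

# 4) Rescale to a mean of 5 and an SD of 2
risk$risk <- 5 + 2 * zscore(risk$riskRaw)

print(risk[order(-risk$risk), c("name", "sdPts", "sdPick", "risk")])
```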
The 6 players with the highest risk on our combined risk index are the following (risk in parentheses):
Jake Locker (10.6)
Russell Wilson (12.4)
Kevin Smith (10.9)
John Skelton (9.9)
Matt Flynn (16.0)
Joe McKnight (10.3)
Note that these are not the highest risk players for next year because the projections came from last year (the projections for next year are not currently available). I will update my risk calculations when they become available.
Here is a density plot of the risk indices of all players:
In Conclusion
In conclusion, it is important to consider risk in addition to projected points scored. Risk is not intrinsically good or bad. In general, we should take less risk when drafting starters, but more risk when drafting bench players. In a future post, I will demonstrate how to use players’ values and risk levels to optimize your fantasy lineup.
Thank you for putting this info together. When I run the script below, I come across an error. I wonder if you know why that is?
qb <- projections[projections$pos=="QB",][order(projections[projections$pos=="QB",]$overallRank),]
Error in order(projections[projections$pos == "QB", ]$overallRank) :
argument 1 is not a vector
Thank you again!
Are you sure you loaded the data with the previous command (see below):
load(paste(getwd(),"/Data/Risk-2013.RData", sep=""))
OR:
load("INSERTPATHHERE/Risk-2013.RData")
You can get the data here:
https://github.com/isaactpetersen/FantasyFootballAnalyticsR/blob/master/Data/Risk-2013.RData
Hope that helps!
-Isaac
I still don’t know how to use R. Can you help me?
See this post:
https://fantasyfootballanalytics.net/2014/06/learn-r.html
Hi-
First, thanks for the awesome work. The only thing that would have made it better is if you wrote all this in Python, so that I could understand the scripts… but beggars can’t be choosers.
I understand your view of player risk in this context. I think looking at the standard deviation likely accounts for risk of injury, losing a starting job, etc. However, I have also long considered the risk of boom and bust weeks. It seems obvious that a player who puts up 20 points every week is more valuable than a player who puts up 40 one week and 0 the next, while their total season stats would look similar. Do you have any thoughts or advice on attempting to estimate this?
Thanks Galen. There are many reasons why R is great (https://fantasyfootballanalytics.net/2014/01/why-r-is-better-than-excel.html). Python is solid too, but I don’t know the language. Anyone is welcome to translate these scripts to Python.
The variability we are estimating here is the variability of full-season projections across sources to help us understand each player’s range of plausible projected points. You’re talking about another kind of variability: game-to-game variability. This is easy to calculate after-the-fact (just calculate the standard deviation of weekly points for each player). Predicting week-to-week variability (as opposed to calculating it afterwards), on the other hand, is a more difficult task. One way would be to obtain/generate weekly predictions and calculate the week-to-week variability in the weekly projections. I don’t know of any source that provides weekly predictions for every week at the beginning of the season, however (usually sources provide weekly predictions only for the upcoming week).
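For the after-the-fact calculation, a minimal sketch might look like the following. The weekly scores are invented to match the 20-every-week versus 40-then-0 example from the question, and the data frame and column names are hypothetical.

```r
# Hypothetical weekly points for two players with identical season totals
weekly <- data.frame(
  player = rep(c("Steady", "BoomBust"), each = 4),
  week = rep(1:4, times = 2),
  points = c(20, 20, 20, 20, 40, 0, 40, 0),
  stringsAsFactors = FALSE
)

# Season total and week-to-week standard deviation for each player
totals <- tapply(weekly$points, weekly$player, sum)
weeklySd <- tapply(weekly$points, weekly$player, sd)

print(data.frame(total = totals, weeklySd = weeklySd))
```

Both players finish with the same total, but only the boom-or-bust player has a nonzero weekly standard deviation.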
Another option might be to identify the statistics that lend themselves to being less reliable (TDs) and those that are more reliable (yards) from week to week. For players with a higher percentage of their projected points scored by TDs, their weekly points are likely to be more variable. Another example might be possession receivers vs. deep threats. Possession receivers are likely more reliable from week to week than deep threats, who are more boom-or-bust. You could quantify this by the number of receptions or yards per reception.
Hey Galen, I wrote a new post on this topic, and estimated players’ weekly variability using a simulation:
https://fantasyfootballanalytics.net/2014/07/weekly-variability-simulation.html
Awesome! I can’t wait to take a look.
Hi Isaac,
Love the site and your work! Thanks for writing such clean, readable code. I am working on teaching myself R (coming from a Stata background), and while it has been a learning curve, being able to read through code like this has really taught me quite a bit.
Out of curiosity, why scale the risk to a mean of 5 and standard deviation of 2?
One other question as I am new to github as well: If I download a zip of your repo, it looks like all I need to do is run the Prepare Data.R script to update everything locally on my machine?
Hey Tom,
Thanks for the kind words. To answer your questions:
1) The raw risk metric is the average of 2 z-scores, so it has a mean near 0 and a SD near 1. This is an easy metric for statisticians to understand, but most people outside statistics don’t encounter this metric often. I wanted to create a metric that closely resembles a 0-10 metric, which most people find more intuitive. I could set the scale to have a range from 0 to 10, but the risk values are skewed, so a value of 3 could actually be above average risk. I chose a mean of 5 and a SD of 2 to more closely match the 0-10 metric, where values below 5 are below average risk and values above 5 are above average risk (with every SD away from 5 representing extremes — i.e., > 7 = high risk, > 9 = very high risk). Does that make sense? If you can think of a more intuitive metric, I’d be happy to consider it.
2) Theoretically, yes, you could run Prepare Data.R and it would do everything. Two cautions: 1) it won’t run successfully without modifying some of the scripts for your system (e.g., entering username/password for FootballGuys, installing packages first, making sure working directories are correct). 2) Some of the scripts take a long time to run (like the “wisdom of the crowd” script, which downloads 10,000 drafts from fantasyfootballcalculator.com). You can modify the script to be faster by taking fewer drafts, but by default it takes a long time.
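To illustrate the scaling point in answer 1), here is a small sketch with invented, right-skewed raw risk values. It compares forcing the range to 0-10 with rescaling to a mean of 5 and an SD of 2. The variable names are hypothetical.

```r
# Hypothetical raw risk values (average of two z-scores); one extreme value creates right skew
rawRisk <- c(-0.9, -0.7, -0.5, -0.4, -0.2, 0.0, 0.3, 0.6, 4.0)

# Option 1: force the range to 0-10 (min-max scaling)
risk0to10 <- 10 * (rawRisk - min(rawRisk)) / (max(rawRisk) - min(rawRisk))

# Option 2: rescale to a mean of 5 and an SD of 2
riskMean5 <- 5 + 2 * (rawRisk - mean(rawRisk)) / sd(rawRisk)

mean(risk0to10)  # well below 5 for these skewed values
mean(riskMean5)  # 5 by construction
```

With the 0-10 range, the average player in this example lands well under 5, so a value of 3 would already be above-average risk. With the mean fixed at 5, values above 5 always indicate above-average risk.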
Hope that helps!
-Isaac
Thanks for the response, Isaac.
That reasoning makes sense, I was just curious why you went with 5 and 2 rather than the traditional 0 and 1 (maybe I missed that in an explanation somewhere else though).
Ok, cool. I’ll look into the code a little more. As a follow up, if I fork your repo on github, it’ll copy it to my own account.. and then I can make changes to it for my uses, but it won’t “push” any updates you edit/add, correct? I’m interested in developing tiers using points projections/VOR via k means clustering. Also, the only league I am in so far this year is a dynasty league, so I am hoping to adapt some of your code to focus a bit more on rookies.
Two other comments after using the snake draft optimizer shiny app in a mock draft this morning (I thought it’d just include them here rather than make another comment, hope that’s alright). Is there a way to add the actual ADP as a column in results? The ADP diff statistic is fantastic, but when I am actually drafting, it would be helpful to see what the actual ADP of each player is. That way I can still look for the high value players, but not reach too far from them if I think they will fall until the next round.
It might also be helpful to have an “upside difference” metric in the results. The projected points and upside are awesome, but when you start drafting bench players and targeting the high-upside players, it might be helpful to have a column that tells you the point difference between projected and ceiling/upside. Does that make sense? I realize it is just simple math and you can do it on the fly relatively easily, but when you are comparing players and have 30 seconds to make a pick, it would be helpful to already have that column there. But obviously, there is a tradeoff between adding information and making the results output too cluttered.
Tom
1) Yes, a forked repo won’t sync automatically. See here for more info on how to sync a forked repo:
https://help.github.com/articles/syncing-a-fork
http://stackoverflow.com/questions/7244321/how-to-update-github-forked-repository
If you add functionality (e.g., tiers analysis using k-means clustering or rookie analysis) or improve upon my code, please create a pull request in GitHub so I can update the repo with your code. That way, the community can take what we do and build on it. That’s one of the reasons I make my scripts freely available.
2) Will look into adding ADP to the snake draft app. The difference between projected points and upside is the SD of projected points, and is used in the calculation of risk. Risk includes the SD of points and rank. Upside is projected points + 1 SD of points. So would risk be sufficient? Risk is actually better than just the SD of points because it includes the SD of rank, as well. The SD of points is going to be larger, in general, for players with more projected points. The SD of rank evens that out.
Just added ADP to the draft tools.
Cheers!
-Isaac
Does this calculation include the injury risk calculated from the Sports Injury Predictor? It was asking me for a fee when I tried to look it up separately.
Hi John,
Yes, our risk variable includes injury risk from Sports Injury Predictor, as described here:
https://fantasyfootballanalytics.net/2013/09/win-your-fantasy-football-snake-draft.html
It is a subscription service, but we have a partnership with them.
-Isaac
I love this site. This will be my second year playing fantasy football and my second year using this website to get an advantage in my league.
Could you give me some advice on how to interpret risk levels when comparing players? How should I try to calculate the tradeoffs in points vs risk? I get that for top picks less risk is better, but there ultimately has to be some sort of ratio of how many points you should give up for a given decrease in risk.
Also, how should I evaluate risk compared to the ceiling and floor for a player? For example, take these two players’ data from the projection app:
Jamaal Charles, VOR: 106.49, Points: 234.47, Ceiling: 252.93, Floor: 213.03 Risk: 6.36
Antonio Brown, VOR: 102.5, Points: 221.42, Ceiling: 237.96, Floor: 206.34, Risk: 5.27
I drafted Brown over Charles because they have similar VOR, and fairly similar point totals, but Brown has less risk. However, Charles not only has a higher ceiling, but also a higher floor. Is the “risk” here just that Charles has more upside? How big of a factor is injury risk in calculating the risk statistic?
Also, do you have any thoughts on how to incorporate projection accuracy into risk calculations? Since WR projections tend to be more accurate than RB projections for top players, shouldn’t that somehow be included when calculating the risk of drafting a player?
Hi Ian,
This is an interesting idea. For various reasons, we standardize risk *within* position. It’s certainly mathematically possible to include positional accuracy (e.g., MASE) in risk estimates, but how to do so in ways that are intuitive is a challenge. Let me know if you have ideas.
Thanks,
Isaac
Hi Isaac,
Great article. I’m trying to do some research on fantasy football analytics and I was wondering if you have done or know of anyone who’s done any research on fantasy football from a portfolio theory perspective?
I’m trying to write my senior thesis on a comparison of fantasy football and portfolio theory but having trouble finding any scholarly work on the topic.
Thanks!
Brad
Hi Brad,
Not exactly portfolio theory, but here’s a possibly relevant article:
https://fantasyfootballanalytics.net/2015/03/fantasy-football-is-like-stock-picking.html
Hope that helps,
Isaac
Hi Isaac,
I really like the concept of a risk calculation in determining a player’s value. I still struggle with how fantasy football analytics is performing this calculation in the Snake Draft Analyzer. I understand 3 concepts, but believe I am missing something or not connecting the dots. Let’s use David Johnson and LeVeon Bell as an example. The main concepts in the calc (as I understand them):
1. Risk is calculated by position
2. Higher Risk can be a result of injury history
3. Higher Risk can be a result of deviations in projections (wide ranges between sites)
As I look at David Johnson vs. LeVeon Bell (based on my league settings), I see a 0.6 increase in risk for David Johnson vs. LeVeon Bell… but qualitative data would tell me that LeVeon is a higher risk than DJ. Can you comment on the drivers of the increased risk in DJ vs. LeVeon?
-JP
PS – data points I’m considering
1. LeVeon suspended in 2015 and 2016
2. LeVeon season-ending injury in 2015
Hi Joel, there is no qualitative data in the risk calculation due to its inherent unreliability but here is more information on how the injury risk is factored into the calculation: https://fantasyfootballanalytics.net/2013/09/win-your-fantasy-football-snake-draft.html.
Right. Because they’re at the same position, they would only be able to differ in injury risk (#2) or variability of projections across sources (#3). We can’t publish injury risk data (#2) by themselves because they’re proprietary data from Sports Injury Predictor. But you can get estimates of the variability of projections across sources based on variables in our apps, including “sdPts” (the standard deviation of points across sources) and “Point Spread” (the difference between Floor and Ceiling for a player). That should give you a sense whether the difference more reflects variability of projections across sources, or (if not), injury risk.
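As a rough sketch of that comparison, the snippet below computes the point spread (ceiling minus floor) using the Jamaal Charles and Antonio Brown figures quoted earlier in this comment thread. The data frame and the pointSpread column name are hypothetical and do not come from the apps.

```r
# Projection figures quoted earlier in this comment thread
players <- data.frame(
  name = c("Jamaal Charles", "Antonio Brown"),
  points = c(234.47, 221.42),
  ceiling = c(252.93, 237.96),
  floor = c(213.03, 206.34),
  risk = c(6.36, 5.27),
  stringsAsFactors = FALSE
)

# Point spread: the difference between ceiling and floor
players$pointSpread <- players$ceiling - players$floor

# Compare the spread of projections with the overall risk value
print(players[order(-players$pointSpread), c("name", "pointSpread", "risk")])
```

In this pair, the player with the wider point spread also has the higher risk value, which suggests that variability of projections across sources accounts for at least part of the difference. Note that these two players are at different positions, and risk is standardized within position, so this is only a loose illustration of the arithmetic.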
Hi Isaac,
This seems to quantify uncertainty in the mean projection, more than risk volatility in fantasy points. I could see a scenario where all sites agree we expect a player to get 10 points this week. But a player may have a huge positive skewness that leads to a very volatile resulting fantasy performance. While uncertainty and volatility in performance are most likely positively correlated, do you know a more direct measure of volatility in performance that isn’t just capturing uncertainty in mean projections?