Drafting the Best Starting Lineup in Fantasy Football by Taking into Account Uncertainty in the Projections: An Optimization Simulation
In a previous post, I showed how to determine the best starting lineup to draft using an optimizer tool. The optimizer identifies the players that maximize your projected points within your risk tolerance. The optimizer does not take into account the uncertainty of players’ projections (except by excluding players above your risk tolerance). This post demonstrates how to run an optimization simulation that identifies the best possible starting lineup by taking into account players’ uncertainty or risk.
Why relying on a single projection estimate is bad
One limitation of relying on a single projection estimate (a point estimate) is that it assumes the projections are equally accurate for all players. This is a false assumption. Some players are more difficult to predict than others. Some players have a “high upside,” whereas others are fairly reliable. These differences in predictability are not captured by a point estimate. They can be captured, however, by an interval estimate (e.g., a range of values or confidence interval). Nate Silver has written on the many advantages of using interval estimates rather than point estimates (see his book here).
How can we find the optimal starting lineup using interval estimates of players’ projections?
The first thing we have to do is calculate an interval estimate for each player. This is the range of likely values for each player. We can construct an interval estimate from two parameters: 1) mean and 2) standard deviation. The mean is the average of all projections for a player. In other words, the projection from FantasyPros, which averages numerous sources of projections, can be our mean. For the standard deviation, we have to calculate the variability of the projections from the various sources for each player. For how we calculate the standard deviation of players’ projections, see here. Now that we have the mean and standard deviation for each player, we can construct each player’s distribution of possible points by drawing random values from a normal distribution with the same mean and standard deviation using the rnorm() function.
Here are the distributions of 3 “made up” players (I can’t unambiguously say “fantasy players”) with the same mean and different standard deviations:
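As a minimal R sketch, distributions like these can be drawn with rnorm(); the player names, means, and standard deviations below are made up for illustration:

```r
# Three hypothetical players with the same mean projection (150 points)
# but different standard deviations (risk). Names/values are made up.
set.seed(1)
n <- 10000
playerA <- rnorm(n, mean = 150, sd = 5)   # reliable
playerB <- rnorm(n, mean = 150, sd = 15)  # moderate risk
playerC <- rnorm(n, mean = 150, sd = 30)  # high risk / high upside

# All three average about 150 points...
round(c(A = mean(playerA), B = mean(playerB), C = mean(playerC)))

# ...but differ sharply in the chance of an elite (180+ point) season
c(A = mean(playerA > 180), C = mean(playerC > 180))
```

Note that the point estimate (the mean) is identical for all three players; only the interval estimate distinguishes the reliable player from the boom-or-bust one.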
How the Optimization Simulation Works
The optimization simulation works by selecting a random value for each player from his distribution, finding the optimal team, and repeating this many, many times. We can then see how many times each player makes the best lineup. In the following example, the simulation iterates 100,000 times. This gives us a fairly reliable estimate of each player’s projection distribution and, as a result, the likelihood that the player is on the best starting lineup.
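The loop can be sketched as follows. Here, `proj` is a toy player pool and optimizeTeam() is a made-up stand-in that simply picks the two highest draws; the real script uses the linear-programming optimizer from the previous post:

```r
# Sketch of the optimization simulation with toy data (not the real script).
proj <- data.frame(
  player = c("A", "B", "C", "D"),
  mean   = c(150, 150, 120, 110),
  sd     = c(5, 30, 10, 40),
  stringsAsFactors = FALSE
)
# Stand-in optimizer: pick the two highest-scoring players this iteration
optimizeTeam <- function(points) {
  list(players = proj$player[order(points, decreasing = TRUE)][1:2])
}

nSims <- 10000  # the post uses 100,000; fewer here for speed
selected <- setNames(integer(nrow(proj)), proj$player)
set.seed(1)
for (i in seq_len(nSims)) {
  simPoints <- rnorm(nrow(proj), mean = proj$mean, sd = proj$sd)  # 1. draw
  best <- optimizeTeam(simPoints)$players                         # 2. optimize
  selected[best] <- selected[best] + 1                            # 3. tally
}

# Proportion of simulations in which each player made the best lineup
round(selected / nSims, 2)
```

With these toy numbers, the low-risk player A makes the lineup most often, while the high-variance players make it in some fraction of iterations proportional to their upside.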
The R Script
The R script for simulating the optimization to take into account uncertainty in players’ projections is located here:
Here’s the simulation syntax:
In summary, relying on a single projection value is bad because it assumes that all players have equal uncertainty/risk in their projections. We can get a better estimate of each player’s likelihood that they are on the best possible starting lineup by taking into account the uncertainty around their projections. To achieve this, we simulate the optimization many times by drawing random values within each player’s distribution of likely point values. By doing this, we can get a better idea of who are the best players to draft.
I still don’t understand R. Can you help me?
Can you recommend a video for learning R?
See this post:
Is this the same as bootstrapping?
Yes, this is essentially bootstrapping — identifying the most accurate parameter estimate (likelihood that a player is selected on the best possible team) by sampling from a distribution.
Is there currently a Shiny App to build a league’s scoring into this model, similar to what is available with the Optimizer tool?
We don’t have a Shiny app that runs this simulation because it takes a long time to run and would be too resource intensive for a Shiny app. If you want to use this strategy, I’d recommend using the R script:
Thanks for your work. I’m someone who is just recently starting to understand the power of programs like yours, but am very inexperienced in using it. I’m an amateur in applying this sort of information, so please bear with me. If I’m using the snake draft optimizer app, does the “risk” number on that page account for the things you mentioned in this post (e.g., player uncertainty)? I guess my real question is: If I use the draft optimizer tool alone, and account for the VOR/risk columns etc., will that be enough to give me an edge? Or am I missing something extra that I need to start applying?
I have been a rankings/single projections bum in the past, so I’m excited to have discovered your site.
BW: The draft optimizer tool should provide you the information you need to make decisions on draft day in terms of what players to pick. What the tool doesn’t take into account are positional needs, coverage for bye weeks, need for backups etc. So the tool may have a RB ranked as the top pick at a point in the draft when you may actually be better off drafting a QB, WR or TE because you have already met your needs at RB.
Hope that helps,
For more info on the tool and how risk is calculated, see here:
Hope that helps,
Have you looked at how the top 5 RB (for example) fared the next year (over a sample size of, say, ten years)? And the 6-10 range, 11-20 range, etc.? Also look at where the top 5 fared on average the previous year and in projections. Wisdom of the crowd is the best projection, but it still overestimates how many of the top 5, 6-10 (mostly), and 11-20 will at least repeat in that ranking group in the current year. For example, only 1 of the 6-10 ranked running backs in 2012 ended inside the top 10 again in 2013, yet even fantasy football analytics projected that 4 of them would (and the fifth only dropped to 13th). Take a look at a larger sample size and you will see that projections consistently overestimate the top players. Is there a way to incorporate that into projections to get more accuracy?
P.S. I did a project for my AP Statistics class on standard deviation to compare how certain players compared to their position to form an overall position projection. I really appreciate all of the work you have put into this, I love it!
You raise an interesting possibility that could reflect regression to the mean. We have noted over-estimation of past years’ projections by about 5–6 points, on average (http://fantasyfootballanalytics.net/2015/07/accuracy-of-fantasy-football-projections-interactive-scatterplot-in-r.html). You can use our tools to examine historical accuracy and under/over-estimation (based on mean error). We will be tracking this closely to see whether it continues and for whom (e.g., top-ranked players, specific positions, etc.). If so, as you noted, we could account for this systematic over-estimation by subtracting it from those players’ projected points.
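As a sketch, such a mean-error correction could be as simple as subtracting the average historical over-projection; the player names, values, and the flat 5.5-point adjustment below are made up for illustration:

```r
# Toy sketch of a mean-error correction: subtract the average historical
# over-projection (~5-6 points, per the accuracy analysis) from projected
# totals. Player names and values are made up.
projected <- c(QB1 = 310, RB1 = 245, WR1 = 210)
meanError <- 5.5  # assumed average over-projection

adjusted <- projected - meanError
adjusted
```

In practice the correction could be estimated separately by position or rank tier, since over-estimation may not be uniform across players.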
I’m just curious as to how you guys use this in an actual draft? Things are happening so quickly that it is difficult to find the players that are being drafted and make sure you are not overpaying. I’m sure I’m missing something, but it does not seem totally practical. But it is no doubt a great product.
I’m not sure I understand what you’re asking. You aim to draft the players projected to score the most points/floor/ceiling within your risk tolerance. Set your risk tolerance, and then it will tell you who to draft and how much you should be willing to pay for them. If those players are drafted, the “optimal lineups” will update automatically.
Hope that helps,
Ding! That would be the sound of the light bulb coming on. I figured out what I was doing wrong. The only question I have now is how do I change the size of the teams in my league? I have used two different browsers and the 10 is greyed out in both. Thanks.
Just enter a new value in the box.
In the optimal lineup tool you have the ‘cost’ output, which puts a premium above AAV on players depending on the number of teams in the league; however, in the downloadable rankings tool this does not exist. I am looking to create several iterations of optimal mock teams but would like the ability to utilize ‘cost’. Can you please add this feature to the downloadable projections? Thanks!
We’ll add this to our to-do list.
I really appreciate your comments here. A couple of years ago I tried to incorporate some of what you are addressing here – both mean projections as well as risk – into my evaluation of players, but could not figure out how to combine the two metrics, so thank you.
One thing I’ve noticed, though, is that some of the prognosticators’ player projections incorporate the risk of injury into a player’s projection, especially if there is a history – i.e., they don’t believe a player will stay healthy the whole season, so they lower his projected totals, which increases the standard deviation of the projections. But what the projection or standard deviation doesn’t incorporate in these cases is that if a player does in fact get injured, there are alternatives like playing a bench player or picking up a replacement on the waiver wire to increase the actual points scored. Doesn’t this make a player with this kind of injury uncertainty even more valuable than his projections suggest, because you can sub in a replacement when he is injured? I don’t know if there is a way to quantify this statistically (?), but it seems like this would under-estimate the value (vs. the projected totals) of more injury-prone players (like a Rob Gronkowski after his injuries in 2012 and 2013)…
Yes, we’re working on an option to impute replacement-level points for players projected to play fewer than 16 games because of injury, suspensions, etc. Stay tuned! I still wouldn’t want one of my key starters to have a high injury risk though…
Thanks so much for this tool! I just found it this week and it’s already been helping me prepare. I have a question regarding the projected costs. I think I understand that you begin with average draft value and then apply a 10% premium (or discount) for top/bottom tier players. My question is with regard to how you select the starting $ value. My league is a 10 team, $250, PPR on ESPN, but ESPN (as far as I know) doesn’t publish ADV’s for custom leagues. So where do the average values come from? In other words, if I select ESPN on the custom settings in the analyzer, does it grab the default standard $200 draft values and then apply some sort of formula to adjust for the PPR and salary cap? Thanks! Eric
Good question. In your case, it would pull ESPN’s AAV based on a $200 league cap and adjust it according to your $250 league cap. It also would adjust by the number of teams in your league (http://fantasyfootballanalytics.net/2015/08/how-do-auction-values-differ-by-the-number-of-teams-in-your-league.html), but you have a 10-team league (the same as ESPN AAV is based on), so it doesn’t need to. It would then apply the premium/discount based on player ranks. We have plans to adjust by scoring settings in the future.
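As a rough sketch, scaling AAV by the ratio of the league caps would look like the following; the simple proportional multiplication and the player values are assumptions for illustration, not necessarily the tool’s exact formula:

```r
# Toy sketch of scaling ESPN AAV from the default $200 cap to a $250 cap.
# Proportional scaling is an assumption about the adjustment, and the
# player names/values are made up.
espnAAV   <- c(RB1 = 60, WR1 = 48, QB1 = 25)  # based on a $200 cap
leagueCap <- 250
espnCap   <- 200

adjustedAAV <- espnAAV * (leagueCap / espnCap)
adjustedAAV  # each value scaled up by 25%
```

The premium/discount by player rank would then be applied on top of these scaled values.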
Hope that clarifies!
Great site and fantastic information. Can you tell me how often the data is updated? I ask because it still lists Kelvin Benjamin as a top-60 pick and he’s on IR.
We update the data regularly. The issue with K. Benjamin is that not all sources are updated. CBS and FOX still have him projected at 1000+ yards so he will remain up there in the rankings until those are updated.
Hope that helps,
Can R be used on a Mac, or on PC only?
1) Is there a step by step tutorial on how set it up?
2) I noticed that the lineup optimizer was not working properly yesterday. When selecting “Week 2” and “Draft Kings,” I kept getting an error message and noticed that it kept showing the $60,000 balance. Hopefully, you can duplicate this. If not, I’ll just assume it was from heavy usage. Thank you so much, as I’m a huge believer in analytics!
1) You can download R on Windows, Mac, and Linux: https://cran.r-project.org/
2) Thanks, we’re looking into this.
Good morning, Isaac: I’d like to know when to expect the next update of your player projections. Thank you in advance for your reply, as well as for your labor of love.
Is there a way to donate to you via paypal? Your software is helping me a lot and I want to reciprocate in a modest way. (also, thanks for updating the stats today. Can you make it so that week 3 is available instead of just week 1 and 2, thank you!)
See the “Donate” tab at the top. Just added Week 3 data. Thanks very much for your support!
It sounds like people are talking about a lineup optimizing tool for weekly fantasy games (Draft Kings, Fan Duel, etc.), but I can’t seem to find one in the tools section. Can someone point me in the right direction? Thanks
See the lineup optimizer:
I actually implemented a very similar approach to optimize my lineups on a weekly basis. So for each week, iterate 10,000 times and (1) generate random scores for all players using a gamma distribution based on player historic performance and then (2) optimize the lineup for each iteration. My goal was then to analyze which lineups (not players) came up most often. The challenge I am facing is that as I get over 20,000 iterations, it takes about an hour to run my routine, and even then there are only about 30 lineups (out of 20,000) that are repeated. I am assuming this is due to the many player options and high player variability. I considered first reducing the number of players in the raw list, but by doing that I feel I am adding bias to the process (where do I draw the line?). Any thoughts on additional steps I could take?
Do you think that with this strategy I should iterate at least 100,000 times as your script indicates?
Any tips on making the script more efficient/faster? By the way, I am using Python for my script.
I know these are kind of open ended questions…but any thoughts would be appreciated.
The greater variance around your estimates, the more iterations will be necessary to generate a stable solution. How many iterations are necessary is somewhat of an empirical question that you can examine by testing (and re-testing) different numbers of iterations to see when the solution begins to stabilize. To speed up the analysis, you might look into methods for parallel processing.
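In R, one way to parallelize is with the base `parallel` package, splitting the iterations across workers and combining the tallies afterward. Here simulateChunk() is a toy stand-in for a batch of draw-and-optimize iterations (it just counts how often a low-risk draw beats a high-risk one):

```r
# Sketch of parallelizing a simulation with the base `parallel` package.
# simulateChunk() is a toy stand-in for one batch of draw-and-optimize
# iterations; all names and values are made up.
library(parallel)

simulateChunk <- function(nIter) {
  sum(rnorm(nIter, mean = 150, sd = 5) > rnorm(nIter, mean = 150, sd = 30))
}

cl <- makeCluster(2)            # in practice, one worker per available core
clusterSetRNGStream(cl, 1)      # reproducible parallel random numbers
results <- parLapply(cl, rep(5000, 2), simulateChunk)
stopCluster(cl)

total <- Reduce(`+`, results)   # combine per-worker tallies
total / 10000                   # roughly 0.5 (the two means are equal)
```

Because each iteration is independent, the simulation is embarrassingly parallel, so the speed-up should be close to linear in the number of cores.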
Hope that helps,
Still learning more stats… it seems like you’re assuming that player points follow a normal distribution defined by the mean and s.d. of their projections. Do you think that assumption is valid, or would it be worth it to try something else? I’m mostly curious if there are directions to improve this work further.
Yes, this particular post assumes a normal distribution for the sake of simulating (i.e., drawing points from a distribution–for which we’d have to assume some form of distribution). Without more sources of projections (i.e., points in the distribution), it’d be hard to assume a more complex distribution. Note that our webapps don’t assume a particular distribution for the ceiling and floor—they’re calculated based on percentiles. Hope that clarifies!
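As a sketch of the percentile approach, a floor and ceiling can be computed directly with quantile(); the source names, projection values, and the 10th/90th percentile cutoffs below are made-up assumptions, not necessarily the webapps’ exact settings:

```r
# Sketch of a distribution-free floor/ceiling: take percentiles of the
# individual sources' projections instead of assuming normality. The
# sources, values, and 10th/90th cutoffs are made-up assumptions.
sourceProjections <- c(CBS = 162, ESPN = 145, FOX = 170,
                       NFL = 151, Yahoo = 158)

floorPts   <- quantile(sourceProjections, probs = 0.10, names = FALSE)
ceilingPts <- quantile(sourceProjections, probs = 0.90, names = FALSE)
c(floor = floorPts, ceiling = ceilingPts)
```

With only a handful of sources per player, extreme percentiles are noisy, which is one reason a simple parametric assumption like the normal can still be useful for simulation.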
If you’re playing 50/50s, would you prefer playing with the lineup with the highest floor or the one with the highest points? Ceiling sounds like the one you’d play for a winner-take-all.
I am wondering if someone can help me understand this line:
sum(optimizeData[optimizeData$player %in% optimizeTeam(points=optimizeData$solutionSum, maxRisk=100)$players, “projections”])
I’ve managed to step through this and understand what it is doing, but I am stuck on “projections”. Is this a placeholder for me to update to another value? Or is this referring to the data frame loaded earlier (BidUpTo.RData)? When I get to this line it fails, and I am not sure why. If I remove sum(), it returns “projections”. If I remove the quotes, it returns the data table. When I add the sum() back, it still fails.
Is this because I need to load another package, other than Rglpk and data.table?
Just looking for some guidance on what I am doing wrong. Thanks in advance.
“projections” is a variable (column) in the “optimizeData” object. Hope that helps!
Thanks for the reply. I am not seeing where the column projections comes from… when i run the script and check the optimizedata object i am not seeing projections. What am i missing?
We’re not maintaining those old scripts anymore, so I’m not really sure. The projections must have come from another object loaded at the top of the file (generated by another script on which it depends, e.g.: https://github.com/isaactpetersen/FantasyFootballAnalyticsR/blob/master/R%20Scripts/Calculations/Calculate%20League%20Projections.R). If you can scrape and calculate projections, you should be able to substitute that vector in for the missing column (as long as the projections are merged with the right players).
Hope that helps,
First of all, I’m a big fan of your statistical and open-source approach to fantasy football.
A few questions:
1) I just forked your github repository. Do you make the latest projections and costs available there? Or should we use one of your scrapers (I think the getProjections.R script contains some of that functionality)?
2) Also, have you included player-to-player cross-correlations in your simulations anywhere? It might make the simulations a bit more accurate generally. And for DFS tournaments, it might provide a way to pick rosters with the potential to be extreme outliers.
3) Lastly, I’m still running the simulation (as you noted, it takes a bit of time), but is that the only way to create a ‘portfolio’ of the top N rosters? I’d like to systematically reduce some of the idiosyncrasies inherent in relying on only a few rosters.
Thanks in advance for any additional insight you can provide.
1) Those scripts are out-of-date and not maintained. You might be able to use them, but we recommend using the ffanalytics R package instead: http://fantasyfootballanalytics.net/2016/06/ffanalytics-r-package-fantasy-football-data-analysis.html.
2) See these topics discussed here: http://fantasyfootballanalytics.net/2015/03/fantasy-football-is-like-stock-picking.html. There is a link in the article to correlation matrices.
3) You can reduce the number of simulations if you want to reduce time.
How do you place players into the “Other players drafted” section of the optimizer?
My snake draft is today, and as I logged into your projections tool I noticed that there were only week 1 projections available. Is it still possible for me to see the projections for the entire year?
Thanks so much for your help,