Gold Mining Explained
We wanted to provide a more useful way of understanding how NFL players are projected to perform week in and week out in fantasy football. Wisdom of the crowd is a principle that many in the statistics community use to bridge the gap between intuitive pundits and statistical forecasts. There is value in examining both perspectives. Examining expert projections can reveal biases, uncertainty, and ultimately opportunity.
R Script
The R script (an R Markdown file) for WR gold mining is located here:
Gold Mining: What are we doing?
At its core, we’re aggregating “experts’” actual projections of players’ weekly performance. “Experts” range from well-known pundits to companies’ statistical forecasting algorithms. In the end, the difference between the two is negligible, as the information we seek comes from aggregating these sources.
What’s different about this?
Many sites and people that aggregate fantasy football statistics aggregate rankings. While aggregating rankings can be helpful, the reader loses the size of the difference between one rank and the next.
Simple Example:
- Calvin Johnson is unanimously ranked #1
- AJ Green is unanimously ranked #2
- Jordy Nelson is unanimously ranked #3
If we simply left our analysis at this point, we would clearly understand the pecking order established: Johnson > Green > Nelson. However, imagine the alternative:
- Calvin Johnson is projected 25 points
- AJ Green is projected 16 points
- Jordy Nelson is projected 15 points
Knowing that Nelson is the #3 best WR is hardly relevant when you consider he’s only 1 point behind Green. As a manager, understanding these differences is vital to get a sense of upside and ultimately value to your team.
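The point of the example above can be sketched in a few lines. This is an illustrative Python sketch, not the site's R script; the numbers come from the example, and the helper name `rank_gaps` is made up for illustration. It prints each player's rank alongside the point gap to the player ranked just above, which is the information a pure ranking throws away.

```python
# Hypothetical projections (points) taken from the example above.
projections = {"Calvin Johnson": 25, "AJ Green": 16, "Jordy Nelson": 15}

# Sort players from highest to lowest projected points.
ranked = sorted(projections.items(), key=lambda item: item[1], reverse=True)


def rank_gaps(ranked_players):
    """Return (rank, name, points, gap to the player ranked just above)."""
    rows = []
    for i, (name, points) in enumerate(ranked_players):
        gap = 0 if i == 0 else ranked_players[i - 1][1] - points
        rows.append((i + 1, name, points, gap))
    return rows


for rank, name, points, gap in rank_gaps(ranked):
    print(f"#{rank} {name}: {points} pts (gap to player above: {gap})")
```

The gap between #1 and #2 is nine points, while the gap between #2 and #3 is a single point, even though both are "one rank apart".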
One step deeper
We’re not satisfied with just knowing the aggregated projection of a single point estimate. The human mind tends to prefer to think about the world as specific numbers, instead of confidence intervals. It’s also important to think of these projections in terms of the variation (uncertainty) each aggregation possesses.
Consider our previous example now with confidence intervals.
Simple Example 2.0:
- Calvin Johnson is projected between 20-30 points
- AJ Green is projected between 14-18 points
- Jordy Nelson is projected between 5-30 points
Had we left our analysis at aggregation of projections, we may not have discovered Jordy Nelson’s extremely high ceiling. For a manager projected to lose but in desperate need of a win, starting Nelson over Green would be recommended because Nelson has a higher ceiling than Green. Something about his team, matchup, etc. is giving him a high variation which could result in big points in a big game. It is also possible, however, that Nelson could bust because he also has a low floor. Thus, you should pick the right players to start based on your team’s needs.
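The start/sit reasoning above can be written as a small decision rule. This is a minimal Python sketch under stated assumptions: the ranges are the made-up ones from Simple Example 2.0, and `pick_starter` is a hypothetical helper, not part of any tool described here. It picks the highest ceiling when you are projected to lose and the highest floor when you are projected to win.

```python
# Hypothetical (floor, ceiling) ranges from Simple Example 2.0.
ranges = {
    "Calvin Johnson": (20, 30),
    "AJ Green": (14, 18),
    "Jordy Nelson": (5, 30),
}


def pick_starter(candidates, projected_to_win):
    """Pick the highest floor when favored, the highest ceiling when an underdog."""
    index = 0 if projected_to_win else 1  # 0 = floor, 1 = ceiling
    return max(candidates, key=lambda name: ranges[name][index])


options = ["AJ Green", "Jordy Nelson"]
print(pick_starter(options, projected_to_win=False))  # underdog: chase the ceiling
print(pick_starter(options, projected_to_win=True))   # favorite: protect the floor
```

A real decision would weigh more than the two endpoints, but the rule captures the idea that the "right" player depends on what your team needs that week.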
How can you pick the right players to start? See the graph below:
What’s our graph showing?
We’ve discussed the value of understanding point projections over rank projections, but getting a sense for ranks can also give us insight into players that may be statistically incorrectly ranked. You’ll notice that some players have higher point projections than their peers within their tier. We show different tiers in the graph using different colors. Players with the same color are considered to be in the same tier. Tiers were calculated statistically with a cluster analysis in the mclust package in R. The number inside the tier is the player’s robust average across sources of projected points (using the Hodges-Lehmann estimator also known as the pseudo-median—calculated from the wilcox.test function in R). This value can be considered the “most likely” number of points the player is projected to score. The line reflects the 10th (floor) and 90th (ceiling) percentiles of a player’s projected points across analysts.
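To make the summary statistics above concrete, here is a rough Python sketch (the article's own script is in R, so this is a stand-in, not the actual implementation). It computes the one-sample Hodges-Lehmann estimator as the median of all pairwise averages, plus the 10th and 90th percentiles with linear interpolation. The five projection values are invented, and the percentile interpolation method is an assumption that may differ from what the R script uses. The tier clustering step is not shown.

```python
from itertools import combinations_with_replacement
from statistics import median, quantiles


def pseudo_median(values):
    """Hodges-Lehmann estimator: median of all pairwise (Walsh) averages."""
    walsh = [(a + b) / 2 for a, b in combinations_with_replacement(values, 2)]
    return median(walsh)


def floor_and_ceiling(values):
    """10th and 90th percentiles, using linear interpolation between points."""
    deciles = quantiles(values, n=10, method="inclusive")
    return deciles[0], deciles[-1]


# Made-up projections for one player from five hypothetical sources.
points = [12.0, 14.0, 15.0, 16.0, 30.0]
print("pseudo-median:", pseudo_median(points))
print("floor, ceiling:", floor_and_ceiling(points))
```

With these numbers the plain mean is 17.4, pulled upward by the single 30-point projection, while the pseudo-median stays at 15.0. That resistance to one outlying source is the reason to prefer a robust average.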
What can I do with all of this information?
At the end of the day, the best way to use this information depends on your situation (e.g., picking players with high ceilings when you are projected to lose and picking players with high floors when projected to win). Typically it works well to identify waiver wire pickups that have a high variance (tons of upside) or perhaps you can identify players that are sure bets week in and week out. The key is to look for players that tend to “break the mold” of the data. These players tend to have the most interesting stories.
Aggressive managers tend to try to find players that have high variation with low averages (which means other managers/experts might undervalue them).
Conservative managers tend to find solid week in and week out contributors.
I don’t get it
Ask your questions below in the comment section. We’re happy to help with more specific questions as they arise! Good luck everyone!
Do you have these for the rest of the positions? I need help with a DST this week. Houston or San Diego?
Not yet, but we are working on it.
Great post. I totally understand the point about point projections being more valuable than rank projections, but what if I just need to have a rank order for the week and didn’t care about the point differential between players? What methodology would you suggest using to aggregate rankings across multiple experts to determine one common ranking? Thanks.
Hey Evan,
I would just calculate an average of the projections for each player across sources. Ignoring risk level, weekly rankings would simply be based on the number of projected points for each player (#1 = player projected to score the most points, #2 = player projected to score the 2nd most points, etc.). You could adjust this if you value higher (or lower) risk players. For calculating rankings from season-long projections (as opposed to weekly projections), you would want to use the Value-Over-Replacement (VOR) for comparing across positions. For more info on the benefits of using projections instead of rankings, see here:
https://fantasyfootballanalytics.net/2014/08/use-projections-not-rankings.html
Hope that helps!
-Isaac
Awesome Job,
I know this question was asked, but here we are in July of 2015 and I was looking for an updated answer. Will you be gold mining other positions? If so, which ones this year?
Hi Reid,
We hope to eventually provide gold mining for all positions, but our current focus is on developing draft tools such as the Auction tool so they are ready for draft season. We will focus more on gold-mining when the season is in progress.
Thanks,
Isaac
Hi Reid,
I saw you have done analysis on the accuracy of projections across a season, but have you done any analysis on how accurate weekly projections are?
Hi John,
We will be tracking the accuracy of weekly projections this season. Stay tuned!
-Isaac
Hello,
I am very interested in seeing these gold minings for the upcoming season. Do you anticipate us seeing anything in week 1?
Josh
So I was reading over the “gold mining”…. Actually I have enjoyed reading all of your work and give thanks for the insight you have provided. I am a rookie using R but learning every day. Getting back to gold mining, you mention how this is calculated and list the cluster pkg, an estimator and test function. If I wanted to create these same tiers can I capture script already written? I am a bit in the dark how I can make this happen. Also, are the graphics provided or is there script for that too? Wondering if you might point me in the right direction. Again, thank you for all you do!
Hi Kiko,
We link to the R script in the article above.
Hope that helps,
Isaac
Hi, I have been fooling with your scripts for most of the week. I was finally able to get some charts when I realized I’m pulling data from Week 11 of last season. Any idea how I can get more recent weekly projections?
Hi Don,
You can access our weekly projections here:
http://apps.fantasyfootballanalytics.net/projections
-Isaac
I am new to R as well and like Don I used your sample data to replicate your results from week 11 from last year. When you say use the link to the current weekly projections, do you mean I need to select the download button, choose raw data and save? Would this replace the ffa data using the script example? Sorry new to this 🙂
You can download the projections from the Projections tool (yes, click Download). Not sure they’d work with our scripts because we’re working on updating them for the repo.
I have a few suggestions that may or may not be beneficial?
One way to determine whether players are on track is to use the weekly projected points as a way to track the average performance.
For instance, Jamaal Charles is projected to have 234.7 points this season. He made 15 points the 1st week. If points are calculated with the regular season in mind (16 or 17 if you count the bye) that would be equivalent to 240 points. Then basically take an EMA of both values (238 points) and use that to see if players are on track or not.
The other thing is determining which QBs to play. Basically, take the stats of the QB and the opposing defense and determine which QB would be the best fit to play. It’s more involved than that, but it’d be interesting to see if matching defenses to QBs could account for some predictability in determining whether a QB will make his weekly projected points.
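The on-pace arithmetic in the Jamaal Charles example above can be sketched as follows. This is an illustrative Python sketch: the function names are hypothetical, and the equal weighting of the two values is an assumption. The comment reports 238 points, which would correspond to putting somewhat more weight on the on-pace figure than the 50/50 blend shown here.

```python
def on_pace_total(points_so_far, games_played, season_games=16):
    """Extrapolate points scored so far to a full-season total."""
    return points_so_far / games_played * season_games


def blended_estimate(preseason, pace, weight_on_pace=0.5):
    """Weighted average of the preseason projection and the on-pace total.

    The default weight of 0.5 is an assumption, not a value from the comment.
    """
    return (1 - weight_on_pace) * preseason + weight_on_pace * pace


pace = on_pace_total(points_so_far=15, games_played=1)
blend = blended_estimate(preseason=234.7, pace=pace)
print(pace)
print(round(blend, 2))
```

A player whose blended estimate drifts well below the preseason projection over several weeks would be flagged as off track under this scheme.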
Isaac,
The Lineup Optimizer is giving me three TE for draftkings setup for the lowest floor lineup.
Also, only 2 WR. It looks like it may be a bug. The setup is correct.
After playing with it, I found that it is completely ignoring my setup in the change settings modal. Regardless of what positions I select, it does what it pleases.
Users can’t modify the positions for weekly leagues because, to my knowledge, the weekly leagues for the same site use the same position settings.
This should be fixed now.
Hi,
I have looked through various pages of your site and picked up what I could, but I was wondering about researching sports that don’t have detailed stats available.
As a hypothetical example, pretend there is something called ‘Super Volleyball’ and it has a virtual league going, but there aren’t player rankings such as those found for the NFL, NBA, etc. Rather, there are just the general stats relevant to the game.
At this point, what would you suggest in terms of player selection methods? Start to try to create regression models and see which variables are important?
The stats and rankings available for the NBA, NFL, and the like seem to be so far ahead of those for other sports.
thanks
I love this site. I am an Excel nerd who is just starting to learn R.
Just found the site this morning, so if I am asking questions that have been answered ad nauseam, please forgive me.
I have two questions/comments:
1.) Is there a way to get the raw data on the distribution of points? It seems like this would be an application that would benefit from running monte carlo simulations to compare and optimize your lineup vs your opponents lineup on a weekly basis.
2.) It seems like the data is strictly based on projections from experts. Is there any correlation analysis of players on the same team? It seems like there should be some positive correlation in historical points between QB/WR and negative correlation between WR/RB (i.e., not enough yards and touchdowns to go around, or a focus on the running game vs. the passing game, etc.). Do you start a QB and WR on the same team to get big upside in a good game? Do you avoid starting a WR and RB from the same team (e.g., Amari Cooper and Latavius Murray in week 3) to avoid cannibalization of points between players?
1) Yes, you can download the data here:
http://apps.fantasyfootballanalytics.net/projections
2) We discuss this here (and provide a link to correlations):
https://fantasyfootballanalytics.net/2015/03/fantasy-football-is-like-stock-picking.html
Do you have any insight into estimating the confidence interval for the actual outcome (as opposed to a confidence interval for the mean of the projections)? I would presume the former is wider than the latter, so the question would be: how much wider?
For the ceiling and floor of seasonal projections, we use the 10th and 90th percentiles of projections, which results in a much wider interval. I’d like to do this for gold mining, as well. Another approach would be, for each player, to consider their historical variability in forecasting their interval.
Awesome website! I’ll be donating to the site when I get paid tomorrow.
Hi Steve,
Thank you very much for your support!
-Isaac
Wow! I love your work! I am a bit of a modeler myself and looking to build some of my own tools. Can I ask though what algorithm you use (or the process by which) you come up with your lineup optimizers for points on the app page? Thanks!
We use the Rglpk package in R to find your optimal starting lineup by selecting the players that maximize the lineup’s sum of projected points within the cost constraints. For more info, see:
https://fantasyfootballanalytics.net/2013/06/win-your-fantasy-football-auction-draft.html
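The optimization described in this reply (maximize the lineup's total projected points subject to a cost limit) can be illustrated with a toy example. This is a Python sketch that swaps in exhaustive search for the integer linear program that Rglpk solves, so it only suits tiny player pools. The player pool, costs, budget, and the `best_lineup` helper are all made up, and position constraints are left out.

```python
from itertools import combinations

# Made-up player pool: (name, cost, projected points). All values are hypothetical.
pool = [
    ("Player A", 30, 20.0),
    ("Player B", 25, 17.0),
    ("Player C", 20, 15.0),
    ("Player D", 10, 8.0),
    ("Player E", 5, 5.0),
]


def best_lineup(players, slots, budget):
    """Exhaustively search for the lineup with the most projected points within budget."""
    best, best_points = None, float("-inf")
    for lineup in combinations(players, slots):
        cost = sum(p[1] for p in lineup)
        points = sum(p[2] for p in lineup)
        if cost <= budget and points > best_points:
            best, best_points = lineup, points
    return best, best_points


lineup, total = best_lineup(pool, slots=3, budget=60)
print([p[0] for p in lineup], total)
```

A linear programming solver expresses the same problem with one binary variable per player and reaches the answer without enumerating every combination, which matters once the pool has hundreds of players.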
A second question: can you tell me a bit more, or point me to where it is explained, about where your raw data comes from? Is it just the averages projected from the multiple websites (with the weights applied) for each stat (i.e., pass yds, pass TDs, rush yds, rush TDs, etc.) per player? Then are the projections just the summation of the points per stat times the stat? Thanks!
The sources of our projections are listed in the apps (http://apps.fantasyfootballanalytics.net). See here for more info on how we aggregate and calculate custom projections for your league:
https://fantasyfootballanalytics.net/2014/06/custom-rankings-and-projections-for-your-league.html
Hello, I have a normal 14-team league and have already drafted using this website’s projections with the custom scoring settings of our league. My question is, do you have a tool I can use (a lineup optimizer) that selects the best combination of the players I drafted that I should be starting in week 1 and moving forward? I looked at the lineup optimizer, and it seems like it’s for websites like FanDuel and for auction leagues.
We recommend using weekly projections to make decisions on who to start in season.
Where do I begin? I am having a difficult time understanding the basic directions. In this short amount of time, I am not trying to fully understand R. My main assignment was to create a confidence interval graph ranked on tiers for each main DFS position (QB, RB, WR, TE) based on scripts downloaded through here. I have run into some issues with cloning via git, and I was wondering if the instructions are perhaps out of date, or if this is just something that takes weeks to learn if one is not familiar with Git Bash, GitHub, Command Prompt, R, or statistics.
Please leave some feedback so I can get an idea of whether or not I should backburn this project. I love this idea either way. Thank you 🙂
It seems that most projections don’t take 10 expert inputs, and even for the ones that do, the 10th and 90th percentiles are essentially the min and max. Are these actual values or mathematically calculated somehow?
How is sleeper ranking on draft projections calculated?