Calculating Custom Fantasy Football Projections for Your League using R

by Isaac Petersen
in Projections · R
— 10 Mar, 2013

In prior posts, I have shown how to download fantasy football projections from ESPN, CBS, and NFL.com. In this post, I will demonstrate how to take the projected points from these sources and calculate the projected points for your custom league given your league settings. Calculating players’ projected points in your league will be important for picking the ideal team for your league.

The R Script

The R script for calculating custom fantasy football projections for your league is located at:
https://github.com/FantasyFootballAnalytics/FantasyFootballAnalyticsR/blob/master/R%20Scripts/Calculations/Calculate%20League%20Projections.R

League settings

In the first portion of the league settings script, we define the league settings, which you can modify to match your own league. Here are the settings for my fantasy league:

passYdsMultiplier <- (1/25) #1 pt per 25 pass yds
passTdsMultiplier <- 4 #4 pts per pass td
passIntMultiplier <- -3 #-3 pts per INT
rushYdsMultiplier <- (1/10) #1 pt per 10 rush yds
rushTdsMultiplier <- 6 #6 pts per rush td
recMultiplier <- 0 #0 pts per rec
recYdsMultiplier <- (1/8) #1 pt per 8 rec yds
recTdsMultiplier <- 6 #6 pts per rec td
twoPtsMultiplier <- 2 #2 pts per 2-point conversion
fumlMultiplier <- -3 #-3 pts per fumble lost

Calculating projected points for each source

We then take the projected stats for each of the categories above (e.g., passing yards, rushing yards) from each source and multiply them by the multiplier defined for your league above. Here are the calculations:

projections$passYdsPts <- projections$passYds * passYdsMultiplier
projections$passTdsPts <- projections$passTds * passTdsMultiplier
projections$passIntPts <- projections$passInt * passIntMultiplier
projections$rushYdsPts <- projections$rushYds * rushYdsMultiplier
projections$rushTdsPts <- projections$rushTds * rushTdsMultiplier
projections$recPts <- projections$rec * recMultiplier
projections$recYdsPts <- projections$recYds * recYdsMultiplier
projections$recTdsPts <- projections$recTds * recTdsMultiplier
projections$twoPtsPts <- projections$twoPts * twoPtsMultiplier
projections$fumblesPts <- projections$fumbles * fumlMultiplier

The projected points for a given source are the linear (additive) combination of these point categories. That is, we sum the projected points across the categories above:

projections$projectedPts <- rowSums(projections[,c("passYdsPts","passTdsPts","passIntPts","rushYdsPts","rushTdsPts","recPts","recYdsPts","recTdsPts","twoPtsPts","fumblesPts")], na.rm=TRUE)
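
As a quick check of the arithmetic, here is a minimal sketch that scores one stat line under the league settings above. The stat line is made up for illustration; only the multipliers come from this post.

```r
# League multipliers from the settings above
multipliers <- c(passYds = 1/25, passTds = 4, passInt = -3,
                 rushYds = 1/10, rushTds = 6, rec = 0,
                 recYds = 1/8, recTds = 6, twoPts = 2, fumbles = -3)

# Hypothetical projected stat line for one quarterback (illustrative values only)
statLine <- c(passYds = 4000, passTds = 30, passInt = 10,
              rushYds = 100, rushTds = 1, rec = 0,
              recYds = 0, recTds = 0, twoPts = 1, fumbles = 2)

# Points per category, then the additive total across categories
categoryPts <- statLine * multipliers[names(statLine)]
projectedPts <- sum(categoryPts, na.rm = TRUE)

print(categoryPts)
print(projectedPts)
```

With these made-up numbers the categories contribute 160, 120, -30, 10, 6, 0, 0, 0, 2, and -6 points, for a total of 262.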

We complete each of these steps with each source (ESPN, CBS, NFL.com, etc.). Once we have the projected points for each source, we have several options:

1) We can compute a simple average across the sites’ projections.
2) We can compute a robust average that is not as affected by outliers.
3) We can compute a weighted average where we weight the sources we trust more heavily in our average.
4) We can compute a latent variable that represents the common variance among the sources.

I tend not to trust individual sources of projections because they tend not to reliably outperform the average, so I will compute an average, robust average, and latent variable (but see here if you want an example of a weighted average of fantasy projections).
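
Although this post does not compute a weighted average, a rough sketch of what option 3 could look like is shown here. The players, projected points, and weights are all hypothetical, and this is not the method used in the script.

```r
# Hypothetical projected points for three players from three sources
projections <- data.frame(
  name = c("Player A", "Player B", "Player C"),
  espn = c(300, 200, NA),
  cbs  = c(280, 220, 150),
  nfl  = c(320, 180, 170)
)

# Hypothetical weights: a larger weight means we trust that source more
sourceWeights <- c(espn = 0.5, cbs = 0.3, nfl = 0.2)

# Weighted mean per player. With na.rm=TRUE, weighted.mean() drops missing
# projections along with their weights, so the remaining weights are renormalized.
projections$weightedAverage <- apply(
  projections[, names(sourceWeights)], 1,
  function(x) weighted.mean(x, w = sourceWeights, na.rm = TRUE)
)

print(projections)
```

Player C has no ESPN projection, so that row is averaged over CBS and NFL.com only, with their weights rescaled to sum to 1.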

Average across sources

To calculate an average of projections across sources, we first calculate an average of projected statistics for each of the categories across sources:

projections$passYds <- rowMeans(projections[,paste("passYds", sourcesOfProjectionsAbbreviation, sep="_")], na.rm=TRUE)
projections$passTds <- rowMeans(projections[,paste("passTds", sourcesOfProjectionsAbbreviation, sep="_")], na.rm=TRUE)
projections$passInt <- rowMeans(projections[,paste("passInt", sourcesOfProjectionsAbbreviation, sep="_")], na.rm=TRUE)
projections$rushYds <- rowMeans(projections[,paste("rushYds", sourcesOfProjectionsAbbreviation, sep="_")], na.rm=TRUE)
projections$rushTds <- rowMeans(projections[,paste("rushTds", sourcesOfProjectionsAbbreviation, sep="_")], na.rm=TRUE)
projections$rec <- rowMeans(projections[,paste("rec", sourcesOfProjectionsAbbreviation, sep="_")], na.rm=TRUE)
projections$recYds <- rowMeans(projections[,paste("recYds", sourcesOfProjectionsAbbreviation, sep="_")], na.rm=TRUE)
projections$recTds <- rowMeans(projections[,paste("recTds", sourcesOfProjectionsAbbreviation, sep="_")], na.rm=TRUE)
projections$twoPts <- rowMeans(projections[,paste("twoPts", sourcesOfProjectionsAbbreviation, sep="_")], na.rm=TRUE)
projections$fumbles <- rowMeans(projections[,paste("fumbles", sourcesOfProjectionsAbbreviation, sep="_")], na.rm=TRUE)
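
Each of these lines follows one pattern: build the source-specific column names with paste() and average across them. The sketch below shows that pattern on toy data using a loop. The source abbreviations and numbers are hypothetical, and the values the script assigns to sourcesOfProjectionsAbbreviation may differ.

```r
# Hypothetical source abbreviations (the script defines its own)
sourcesOfProjectionsAbbreviation <- c("espn", "cbs", "nfl")

# Toy data: two stat categories, each projected by three sources for two players
projections <- data.frame(
  passYds_espn = c(4000, 3500), passYds_cbs = c(4200, NA), passYds_nfl = c(3800, 3700),
  passTds_espn = c(30, 22),     passTds_cbs = c(28, 24),   passTds_nfl = c(32, 20)
)

# Average each category across sources; na.rm=TRUE skips sources missing a player
for (category in c("passYds", "passTds")) {
  sourceColumns <- paste(category, sourcesOfProjectionsAbbreviation, sep = "_")
  projections[[category]] <- rowMeans(projections[, sourceColumns], na.rm = TRUE)
}

print(projections[, c("passYds", "passTds")])
```

The second player has no CBS passing-yards projection, so that average is taken over the two remaining sources.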

Then we multiply each averaged stat category by the multiplier defined in our league settings, using the same calculations shown above for the individual sources.

The average projected points across sources is the sum of the points across categories (that have been averaged across sources):

rowSums(projections[,c("passYdsPts","passTdsPts","passIntPts","rushYdsPts","rushTdsPts","recPts","recYdsPts","recTdsPts","twoPtsPts","fumblesPts")], na.rm=TRUE)

Robust average (Hodges-Lehmann)

Means of projections are not ideal because they can be affected by extreme values called outliers. In other words, if most sources have Tom Brady scoring 300 points, and a crazy source has Tom Brady scoring 50 points, the average would be greatly affected by the source predicting 50 points. In order to calculate a more meaningful average, we calculate a robust average that is not as affected by outliers. We use the Hodges-Lehmann estimator, which is the median of all pairwise means, and is a continuous version of the median. Here’s how we calculate the robust average of all statistical categories across sources:

projections$passYdsMedian <- apply(projections[,paste("passYds", sourcesOfProjectionsAbbreviation, sep="_")], 1,
  function(x) tryCatch(wilcox.test(x, conf.int=TRUE, na.action="na.exclude")$estimate, error=function(e) median(x, na.rm=TRUE)))
projections$passTdsMedian <- apply(projections[,paste("passTds", sourcesOfProjectionsAbbreviation, sep="_")], 1,
  function(x) tryCatch(wilcox.test(x, conf.int=TRUE, na.action="na.exclude")$estimate, error=function(e) median(x, na.rm=TRUE)))
projections$passIntMedian <- apply(projections[,paste("passInt", sourcesOfProjectionsAbbreviation, sep="_")], 1,
  function(x) tryCatch(wilcox.test(x, conf.int=TRUE, na.action="na.exclude")$estimate, error=function(e) median(x, na.rm=TRUE)))
projections$rushYdsMedian <- apply(projections[,paste("rushYds", sourcesOfProjectionsAbbreviation, sep="_")], 1,
  function(x) tryCatch(wilcox.test(x, conf.int=TRUE, na.action="na.exclude")$estimate, error=function(e) median(x, na.rm=TRUE)))
projections$rushTdsMedian <- apply(projections[,paste("rushTds", sourcesOfProjectionsAbbreviation, sep="_")], 1,
  function(x) tryCatch(wilcox.test(x, conf.int=TRUE, na.action="na.exclude")$estimate, error=function(e) median(x, na.rm=TRUE)))
projections$recMedian <- apply(projections[,paste("rec", sourcesOfProjectionsAbbreviation, sep="_")], 1,
  function(x) tryCatch(wilcox.test(x, conf.int=TRUE, na.action="na.exclude")$estimate, error=function(e) median(x, na.rm=TRUE)))
projections$recYdsMedian <- apply(projections[,paste("recYds", sourcesOfProjectionsAbbreviation, sep="_")], 1,
  function(x) tryCatch(wilcox.test(x, conf.int=TRUE, na.action="na.exclude")$estimate, error=function(e) median(x, na.rm=TRUE)))
projections$recTdsMedian <- apply(projections[,paste("recTds", sourcesOfProjectionsAbbreviation, sep="_")], 1,
  function(x) tryCatch(wilcox.test(x, conf.int=TRUE, na.action="na.exclude")$estimate, error=function(e) median(x, na.rm=TRUE)))
projections$twoPtsMedian <- apply(projections[,paste("twoPts", sourcesOfProjectionsAbbreviation, sep="_")], 1,
  function(x) tryCatch(wilcox.test(x, conf.int=TRUE, na.action="na.exclude")$estimate, error=function(e) median(x, na.rm=TRUE)))
projections$fumblesMedian <- apply(projections[,paste("fumbles", sourcesOfProjectionsAbbreviation, sep="_")], 1,
  function(x) tryCatch(wilcox.test(x, conf.int=TRUE, na.action="na.exclude")$estimate, error=function(e) median(x, na.rm=TRUE)))

Then we can calculate projected points like we did above: first multiplying each category by its multiplier and then taking the sum of points across categories.
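
To see what the robust average does in the Tom Brady scenario described above, here is a small sketch with made-up numbers. It computes the Hodges-Lehmann estimate by hand (the median of all pairwise means) and compares it with the estimate returned by wilcox.test(). The five projections are hypothetical.

```r
# Hypothetical projections for one player: four sources agree, one is an outlier
sourcePts <- c(300, 310, 290, 305, 50)

# Hodges-Lehmann estimate by hand: the median of all pairwise means,
# where each value is also paired with itself
pairSums <- outer(sourcePts, sourcePts, "+")
walshAverages <- pairSums[upper.tri(pairSums, diag = TRUE)] / 2
hodgesLehmannByHand <- median(walshAverages)

# The same estimate from wilcox.test(), as in the script above.
# suppressWarnings() is used because, with so few values, R may warn that the
# requested confidence level is not achievable.
hodgesLehmann <- suppressWarnings(
  unname(wilcox.test(sourcePts, conf.int = TRUE)$estimate)
)

print(c(mean = mean(sourcePts),
        median = median(sourcePts),
        hodgesLehmannByHand = hodgesLehmannByHand,
        hodgesLehmann = hodgesLehmann))
```

With these values the mean is 251, while the by-hand Hodges-Lehmann estimate is 297.5, which is much closer to the consensus of about 300 points.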

Latent variables

Latent variables are helpful for calculating an unobserved variable based on the common variance among various indicator variables. Latent variables tend to have stronger psychometric properties than simple average variables because they retain the common variance (thought to be “true” variance) and discard the unique variance (i.e., measurement error). Examination of the correlations among the various sources suggests that they are highly correlated (rs ≥ .89), suggesting that they are measuring the same thing, and that they can be combined in a latent variable.

> round(cor(projections[,c("espn","cbs","nfl","average")], use="pairwise.complete.obs"),2)
        espn  cbs  nfl average
espn    1.00 0.94 0.90    0.98
cbs     0.94 1.00 0.89    0.97
nfl     0.90 0.89 1.00    0.93
average 0.98 0.97 0.93    1.00

To compute the latent variable representing the common variance among the projections from ESPN, CBS, and NFL.com, we use the factanal function to compute a factor analysis. We want to keep 1 factor, and the factor scores represent each player’s standardized value on the latent factor of projected points:

factor.analysis <- factanal(~ espn + cbs + nfl, factors = 1, scores = "Bartlett", data=projections)
factor.analysis$scores
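
For readers who want to try the factor analysis without the real data, here is a minimal sketch on simulated projections. The data are generated from one underlying "true" value plus source-specific noise, which is an assumption made purely for illustration; the column names mirror those used above.

```r
set.seed(1)

# Simulated data: one underlying "true" projection plus source-specific noise
nPlayers <- 200
truePts <- rnorm(nPlayers, mean = 100, sd = 50)
simulated <- data.frame(
  espn = truePts + rnorm(nPlayers, sd = 15),
  cbs  = truePts + rnorm(nPlayers, sd = 15),
  nfl  = truePts + rnorm(nPlayers, sd = 20)
)

# One-factor model with Bartlett factor scores, mirroring the call above
fit <- factanal(~ espn + cbs + nfl, factors = 1, scores = "Bartlett", data = simulated)

# One factor score per player, centered at 0
latentScores <- as.numeric(fit$scores)

print(fit$loadings)
print(cor(latentScores, truePts))
```

Because the simulated sources share a single underlying value, the factor scores should track that value closely, which is the property the latent variable approach relies on.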

We want to know each player’s value on the latent metric, but the factor scores are standardized with a mean of 0 and a standard deviation of 1. As a result, we have to rescale the values so that they are a meaningful representation of the players’ projected points. To do that, we rescale the latent factor scores to the same distribution (Weibull) as the average projections. Then we rescale the distribution to have the same range as the average projections:

library(MASS) #provides fitdistr()

#Calculate shape and scale parameters of average projections for Weibull distribution
weibullShape <- fitdistr(projections$projectedPts, 'weibull')$estimate[[1]]
weibullScale <- fitdistr(projections$projectedPts, 'weibull')$estimate[[2]]

#projectedPtsLatent is assumed to hold the factor scores from the factor analysis above
projectedPtsLatentWeibull <- qweibull(pnorm(projectedPtsLatent), shape=weibullShape, scale=weibullScale)

#Rescale distribution to have same range as average projections
rescaleRange <- function(variable, minOutput, maxOutput){
  minObserved <- min(variable)
  maxObserved <- max(variable)
  values <- (maxOutput-minOutput)/(maxObserved-minObserved)*(variable-maxObserved)+maxOutput
  return(values)
}

projections$projectedPtsLatent <- rescaleRange(variable=projectedPtsLatentWeibull, minOutput=0, maxOutput=max(projections$projectedPts))
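
The range rescaling is a simple linear map, which a toy example makes easier to verify. The sketch below applies the same formula as rescaleRange() to four hypothetical scores and a hypothetical output range of 0 to 200 points.

```r
# Linear rescaling, following the rescaleRange() formula used above
rescaleRange <- function(variable, minOutput, maxOutput) {
  minObserved <- min(variable)
  maxObserved <- max(variable)
  (maxOutput - minOutput) / (maxObserved - minObserved) * (variable - maxObserved) + maxOutput
}

# Hypothetical scores rescaled to a 0-200 point range
scores <- c(-2, 0, 1, 3)
rescaled <- rescaleRange(scores, minOutput = 0, maxOutput = 200)

print(rescaled)
```

The smallest score maps to 0, the largest to 200, and the values in between keep their relative spacing (here 0, 80, 120, 200).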

That’s it! We now have projections for our league from ESPN, CBS, and NFL.com, in addition to the average and robust average among their projections, and the latent combination of the three. Here’s a density plot showing the similarities among the distributions of projected points from the three different sources:

library(ggplot2)

ggplot(densityData, aes(x=pointDensity, fill=sourceDensity)) +
  geom_density(alpha=.3) +
  xlab("Player's Projected Points") +
  ggtitle("Density Plot of Projected Points") +
  theme(legend.title=element_blank())

Tags: R

— Isaac Petersen

My name is Isaac and I'm an assistant professor with a Ph.D. in Clinical Psychology. Why am I writing about fantasy football and data analysis? Because fantasy football involves the intersection of two things I love: sports and statistics. With this site, I hope to demonstrate the relevance of statistics for choosing the best team in fantasy football.

11 Comments

Anonymous says:

July 18, 2013 at 2:28 am

Isaac, I love the work you’ve posted on this blog. I used to run a lot of this data by hand in Excel before I found this site. I taught myself R just so I could replicate and understand your work. I was wondering if you had any thoughts on how to recognize value in players as the season progresses. I’m thinking about the free agent market as well as trading with other people in the league. (Trading my overvalued player for his undervalued workhorse, etc.)

Isaac Petersen says:

July 18, 2013 at 10:53 pm

Thanks for your interest! Excel is common, but there are many good reasons to prefer R to Excel, see e.g.:

http://www.michaelmilton.net/2010/01/26/when-to-use-excel-when-to-use-r/
http://blog.revolutionanalytics.com/2013/04/more-reasons-not-to-use-excel-for-modeling.html
http://www.burns-stat.com/living-it-up-with-computational-errors/?utm_source=rss&utm_medium=rss&utm_campaign=living-it-up-with-computational-errors

It’s an interesting problem trying to determine players’ value during the season. I would define value in this context as a player’s projected points for the remaining games of the season minus the projected points for a typical replacement player (http://www.footballguys.com/05vbdrevisited.htm). I think FantasyPros has weekly projections that you could use to calculate projections for the remaining games of the season. If you want to calculate remaining season projections on your own, these minitab articles might be helpful:
http://blog.minitab.com/blog/the-statistics-game/projecting-the-rest-of-the-2012-fantasy-football-season
http://www.minitab.com/en-US/training/articles/Minitab-fantasy-football-week-6.aspx
http://www.minitab.com/en-US/training/articles/Minitab-fantasy-football-data.aspx

I’d be happy to discuss more if you have questions/ideas. Good luck!

joikd says:

June 3, 2014 at 8:51 am

Any chance of getting the number of receptions added for those in PPR leagues? I did not see it in the league settings script. If not, my league doesn’t use fumbles at all except for defense–maybe you could just let me know which initial value to change from fumble to receptions. Then, “fumbles” will mean “receptions” on the output.
Thanks so much for your work!

- Isaac Petersen says:
  
  June 3, 2014 at 10:10 pm
  
  Just updated the scripts to include receptions for PPR leagues. The snake and auction draft apps already have receptions included. Hope that helps!
  
  - joikd says:
    
    June 16, 2014 at 6:12 pm
    
    Thanks for the addition and for all of your great work here!
    
ATANU says:

June 5, 2014 at 6:45 am

I am a research scholar in statistics. Can you please elaborate on the statistical methods used and their justifications? I cannot find the post where you have described how to download the data.

- Isaac Petersen says:
  
  June 5, 2014 at 8:21 am
  
  What statistics do you want me to elaborate on? Here’s info on how to download projections from ESPN, CBS, NFL.com, and FantasyPros:
  https://fantasyfootballanalytics.net/2013/03/download-fantasy-football-projections.html
  https://fantasyfootballanalytics.net/2013/03/downloading-cbs-fantasy-football.html
  https://fantasyfootballanalytics.net/2013/03/downloading-nflcom-fantasy-football.html
  https://fantasyfootballanalytics.net/2013/05/downloading-fantasypros-fantasy.html
  
  Here are the R scripts for downloading projections:
  https://github.com/isaactpetersen/FantasyFootballAnalyticsR/blob/master/R%20Scripts/Accuscore%20Projections.R
  https://github.com/isaactpetersen/FantasyFootballAnalyticsR/blob/master/R%20Scripts/CBS%20Projections.R
  https://github.com/isaactpetersen/FantasyFootballAnalyticsR/blob/master/R%20Scripts/ESPN%20Projections.R
  https://github.com/isaactpetersen/FantasyFootballAnalyticsR/blob/master/R%20Scripts/FantasyPros%20Projections.R
  https://github.com/isaactpetersen/FantasyFootballAnalyticsR/blob/master/R%20Scripts/FantasySharks%20Projections.R
  https://github.com/isaactpetersen/FantasyFootballAnalyticsR/blob/master/R%20Scripts/Yahoo%20Projections.R
  
erick says:

August 12, 2014 at 1:33 pm

How do you calculate new auction values based on number of starters? Is there a multiplier to each player based on how many other players start at that position?

- Isaac Petersen says:
  
  August 12, 2014 at 10:00 pm
  
  Hey Erick,
  
  I normalize the auction values relative to your league cap and the number of teams. I also assign a premium to the top players and a discount to the worst players. For more info, see here:
  https://fantasyfootballanalytics.net/2013/06/win-your-fantasy-football-auction-draft.html
  
  Hope that helps!
  -Isaac
  
Tlexium says:

November 5, 2017 at 11:02 am

Hi Isaac,

Can you elaborate on how to utilize the wilcox test to calculate robust average/pseudo-medians in R? Let’s say I have a list of player names in column A and their mean projections (across multiple sources) in column B. How can I use data in column B to calculate robust average? Or can I only calculate the robust average if I have each individual source and not just their average?

- Isaac Petersen says:
  
  November 5, 2017 at 5:51 pm
  
  You’d need each individual source’s projection to be able to calculate a robust average for a given player. You can then input a given player’s vector of projections into the wilcox.test() function to calculate the robust average using the Hodges-Lehmann estimator: https://stat.ethz.ch/R-manual/R-devel/library/stats/html/wilcox.test.html
  