Weekly Variability Simulation of Fantasy Football Projections

by Isaac Petersen
in R · Risk
— 20 Jul, 2014

In this post, I show how to estimate players’ week-to-week variability in fantasy football points. In a prior post, I demonstrated how to calculate a player’s risk level, as defined by the variability of their projected points across sources. As a reader pointed out, another form of meaningful variability is week-to-week variability (in addition to variability across sources). Some fantasy statistics are more variable (TDs) than others (yards) from week to week. For players with a higher percentage of their projected points scored by TDs, their weekly points are likely to be more variable. Another example might be possession receivers vs deep threats. Possession receivers are likely more reliable from week-to-week than deep threats who are more boom-or-bust. Here, I use a simulation to estimate each player’s week-to-week variability in fantasy points.

The R Scripts

The R Script for the “Historical Weekly Variability” section is below:

https://github.com/FantasyFootballAnalytics/FantasyFootballAnalyticsR/blob/master/R%20Scripts/Historical/Historical%20Weekly.R

The R Script for the “Weekly Variability Simulation” section is below:

https://github.com/FantasyFootballAnalytics/FantasyFootballAnalyticsR/blob/master/R%20Scripts/Posts/Weekly%20Simulation.R

Historical Weekly Variability

In order to simulate players’ weekly fantasy points, we first must determine the distribution from which to sample for each player and statistical category (passing yards, rushing TDs, etc.). I chose to sample from a normal distribution, with each player’s weekly mean of the statistical category as the mean of his distribution. In other words, if Peyton Manning is projected to have 4800 passing yards this season, that equals an average of 300 yards per game (4800/16). Thus, for sampling Peyton Manning’s weekly passing yards, we can sample from a distribution with a mean of 300.

For a normal distribution, we have to specify a mean and standard deviation. What standard deviation should we use for Peyton Manning’s weekly passing yards? We could theoretically use Peyton Manning’s weekly variability from last season, but some players do not have statistics from last year. As a result, I chose to calculate the historical weekly variability for passing yards (and all other statistical categories) averaged across all players from the past three seasons. Then, we use the historical week-to-week standard deviation of players’ passing yards as the standard deviation of players’ sampling distributions for passing yards.

To do this, I scrape data from every week of the season (weeks 1-17) from Pro-Football-Reference for the past three seasons:

#Libraries library("XML") #Specify info to scrape years <- 2011:2013 weeks <- 17 #Scrape data qb <- list() rb1 <- list() rb2 <- list() rb3 <- list() wr1 <- list() wr2 <- list() wr3 <- list() wr4 <- list() wr5 <- list() wr6 <- list() pb <- txtProgressBar(min = 1, max = weeks, style = 3) for(i in 1:weeks){ setTxtProgressBar(pb, i) qb[[i]] <- readHTMLTable(paste("http://www.pro-football-reference.com/play-index/pgl_finder.cgi?request=1&match=game&year_min=", head(years, 1), "&year_max=", tail(years, 1), "&season_start=1&season_end=-1&age_min=0&age_max=99&game_type=R&league_id=&team_id=&opp_id=&game_num_min=0&game_num_max=99&week_num_min=", i, "&week_num_max=", i, "&game_day_of_week=&game_location=&game_result=&handedness=&is_active=&is_hof=&c1stat=pass_att&c1comp=gt&c1val=1&c2stat=&c2comp=gt&c2val=&c3stat=&c3comp=gt&c3val=&c4stat=&c4comp=gt&c4val=&order_by=pass_att", sep=""), stringsAsFactors = FALSE)$stats rb1[[i]] <- readHTMLTable(paste("http://www.pro-football-reference.com/play-index/pgl_finder.cgi?request=1&match=game&year_min=", head(years, 1), "&year_max=", tail(years, 1), "&season_start=1&season_end=-1&age_min=0&age_max=99&game_type=R&league_id=&team_id=&opp_id=&game_num_min=0&game_num_max=99&week_num_min=", i, "&week_num_max=", i, "&game_day_of_week=&game_location=&game_result=&handedness=&is_active=&is_hof=&c1stat=rush_att&c1comp=gt&c1val=1&c2stat=&c2comp=gt&c2val=&c3stat=&c3comp=gt&c3val=&c4stat=&c4comp=gt&c4val=&order_by=rush_yds", sep=""), stringsAsFactors = FALSE)$stats rb2[[i]] <- readHTMLTable(paste("http://www.pro-football-reference.com/play-index/pgl_finder.cgi?request=1&match=game&year_min=", head(years, 1), "&year_max=", tail(years, 1), "&season_start=1&season_end=-1&age_min=0&age_max=99&league_id=&team_id=&opp_id=&game_type=R&game_num_min=0&game_num_max=99&week_num_min=", i, "&week_num_max=", i, "&game_day_of_week=&game_month=&game_location=&game_result=&is_active=&handedness=&is_hof=&c1stat=rush_att&c1comp=gt&c1val=1&c2stat=&c2comp=gt&c2val=&c3stat=&c3comp=gt&c3val=&c4stat=&c4comp=gt&c4val=&order_by=rush_yds&order_by_asc=&offset=100", sep=""), stringsAsFactors = FALSE)$stats rb3[[i]] <- readHTMLTable(paste("http://www.pro-football-reference.com/play-index/pgl_finder.cgi?request=1&match=game&year_min=", head(years, 1), "&year_max=", tail(years, 1), "&season_start=1&season_end=-1&age_min=0&age_max=99&league_id=&team_id=&opp_id=&game_type=R&game_num_min=0&game_num_max=99&week_num_min=", i, "&week_num_max=", i, "&game_day_of_week=&game_month=&game_location=&game_result=&is_active=&handedness=&is_hof=&c1stat=rush_att&c1comp=gt&c1val=1&c2stat=&c2comp=gt&c2val=&c3stat=&c3comp=gt&c3val=&c4stat=&c4comp=gt&c4val=&order_by=rush_yds&order_by_asc=&offset=200", sep=""), stringsAsFactors = FALSE)$stats wr1[[i]] <- readHTMLTable(paste("http://www.pro-football-reference.com/play-index/pgl_finder.cgi?request=1&match=game&year_min=", head(years, 1), "&year_max=", tail(years, 1), "&season_start=1&season_end=-1&age_min=0&age_max=99&game_type=R&league_id=&team_id=&opp_id=&game_num_min=0&game_num_max=99&week_num_min=", i, "&week_num_max=", i, "&game_day_of_week=&game_location=&game_result=&handedness=&is_active=&is_hof=&c1stat=rec&c1comp=gt&c1val=1&c2stat=&c2comp=gt&c2val=&c3stat=&c3comp=gt&c3val=&c4stat=&c4comp=gt&c4val=&order_by=rec_yds", sep=""), stringsAsFactors = FALSE)$stats wr2[[i]] <- readHTMLTable(paste("http://www.pro-football-reference.com/play-index/pgl_finder.cgi?request=1&match=game&year_min=", head(years, 1), "&year_max=", tail(years, 1), "&season_start=1&season_end=-1&age_min=0&age_max=99&league_id=&team_id=&opp_id=&game_type=R&game_num_min=0&game_num_max=99&week_num_min=", i, "&week_num_max=", i, "&game_day_of_week=&game_month=&game_location=&game_result=&is_active=&handedness=&is_hof=&c1stat=rec&c1comp=gt&c1val=1&c2stat=&c2comp=gt&c2val=&c3stat=&c3comp=gt&c3val=&c4stat=&c4comp=gt&c4val=&order_by=rec_yds&order_by_asc=&offset=100", sep=""), stringsAsFactors = FALSE)$stats wr3[[i]] <- readHTMLTable(paste("http://www.pro-football-reference.com/play-index/pgl_finder.cgi?request=1&match=game&year_min=", head(years, 1), "&year_max=", tail(years, 1), "&season_start=1&season_end=-1&age_min=0&age_max=99&league_id=&team_id=&opp_id=&game_type=R&game_num_min=0&game_num_max=99&week_num_min=", i, "&week_num_max=", i, "&game_day_of_week=&game_month=&game_location=&game_result=&is_active=&handedness=&is_hof=&c1stat=rec&c1comp=gt&c1val=1&c2stat=&c2comp=gt&c2val=&c3stat=&c3comp=gt&c3val=&c4stat=&c4comp=gt&c4val=&order_by=rec_yds&order_by_asc=&offset=200", sep=""), stringsAsFactors = FALSE)$stats wr4[[i]] <- readHTMLTable(paste("http://www.pro-football-reference.com/play-index/pgl_finder.cgi?request=1&match=game&year_min=", head(years, 1), "&year_max=", tail(years, 1), "&season_start=1&season_end=-1&age_min=0&age_max=99&league_id=&team_id=&opp_id=&game_type=R&game_num_min=0&game_num_max=99&week_num_min=", i, "&week_num_max=", i, "&game_day_of_week=&game_month=&game_location=&game_result=&is_active=&handedness=&is_hof=&c1stat=rec&c1comp=gt&c1val=1&c2stat=&c2comp=gt&c2val=&c3stat=&c3comp=gt&c3val=&c4stat=&c4comp=gt&c4val=&order_by=rec_yds&order_by_asc=&offset=300", sep=""), stringsAsFactors = FALSE)$stats wr5[[i]] <- readHTMLTable(paste("http://www.pro-football-reference.com/play-index/pgl_finder.cgi?request=1&match=game&year_min=", head(years, 1), "&year_max=", tail(years, 1), "&season_start=1&season_end=-1&age_min=0&age_max=99&league_id=&team_id=&opp_id=&game_type=R&game_num_min=0&game_num_max=99&week_num_min=", i, "&week_num_max=", i, "&game_day_of_week=&game_month=&game_location=&game_result=&is_active=&handedness=&is_hof=&c1stat=rec&c1comp=gt&c1val=1&c2stat=&c2comp=gt&c2val=&c3stat=&c3comp=gt&c3val=&c4stat=&c4comp=gt&c4val=&order_by=rec_yds&order_by_asc=&offset=400", sep=""), stringsAsFactors = FALSE)$stats wr6[[i]] <- readHTMLTable(paste("http://www.pro-football-reference.com/play-index/pgl_finder.cgi?request=1&match=game&year_min=", head(years, 1), "&year_max=", tail(years, 1), "&season_start=1&season_end=-1&age_min=0&age_max=99&league_id=&team_id=&opp_id=&game_type=R&game_num_min=0&game_num_max=99&week_num_min=", i, "&week_num_max=", i, "&game_day_of_week=&game_month=&game_location=&game_result=&is_active=&handedness=&is_hof=&c1stat=rec&c1comp=gt&c1val=1&c2stat=&c2comp=gt&c2val=&c3stat=&c3comp=gt&c3val=&c4stat=&c4comp=gt&c4val=&order_by=rec_yds&order_by_asc=&offset=500", sep=""), stringsAsFactors = FALSE)$stats }

After cleaning and merging the data, I then put the data in the necessary form for calculating players’ weekly variability. This involves transforming the data from long form to wide form so that each week has a separate column:

#Long to Wide weeklyDataWide <- reshape(weeklyDataLong, timevar = c("week"), idvar = c("name","year"), direction = "wide", sep="")

Then I calculate the week-to-week standard deviation of each statistical category for every player in each season. Finally, I calculate a robust average across all players and seasons to get a general week-to-week standard deviation for each statistical category:

#Calculate week-to-week SD for each statistical category across same player/year combinations sdPassYds <- apply(weeklyDataWide[,grep("passYds", names(weeklyDataWide))], 1, sd) sdPassTds <- apply(weeklyDataWide[,grep("passTds", names(weeklyDataWide))], 1, sd) sdPassInt <- apply(weeklyDataWide[,grep("passInt", names(weeklyDataWide))], 1, sd) sdRushYds <- apply(weeklyDataWide[,grep("rushYds", names(weeklyDataWide))], 1, sd) sdRushTds <- apply(weeklyDataWide[,grep("rushTds", names(weeklyDataWide))], 1, sd) sdRec <- apply(weeklyDataWide[,grep("rec", names(weeklyDataWide))], 1, sd) sdRecYds <- apply(weeklyDataWide[,grep("recYds", names(weeklyDataWide))], 1, sd) sdRecTds <- apply(weeklyDataWide[,grep("recTds", names(weeklyDataWide))], 1, sd) sdVars <- data.frame(sdPassYds, sdPassTds, sdPassInt, sdRushYds, sdRushTds, sdRec, sdRecYds, sdRecTds) #Robust average of week-to-week SD for each statistical category sdAverage <- data.frame(t(apply(sdVars, 2, function(x) wilcox.test(x, conf.int=TRUE, na.action="na.exclude")$estimate)))

Here are the average week-to-week standard deviations for each statistical category:

Passing yards : 82.1 yards
Passing TDs: 0.8 TDs
Passing INTs: 0.7 INTs
Rushing yards: 11.3 yards
Rushing TDs: 0.4 TDs
Receptions: 11.1 receptions
Receiving yards: 15.9 yards
Receiving TDs: 0.4 TDs

Here are the density plots of the week-to-week standard deviations of each statistical category for the different players and seasons:

Weekly Variability Simulation

Now that we’ve calculated the historical week-to-week standard deviation for each statistical category, we can simulate players’ weekly performances for each statistical category using the sampling distribution of a) that player’s weekly mean and b) the historical week-to-week standard deviation for the relevant statistical category. In other words, for simulating Peyton Manning’s passing yards in the example earlier, we will sample from the distribution with a mean of 300 passing yards and a standard deviation of 82.1 yards. But the sampling has some constraints. First, the samples (weekly performances in each game) must sum to equal Manning’s projected passing yards for the season. Second, for some statistical categories (e.g., TDs), values can only be positive integers (e.g., you can’t score half a touchdown or negative touchdowns in a game). Here’s the function for taking ‘n’ samples from a distribution whose mean is ‘sum’/’n’ and whose standard deviation is ‘sd’, and that only include positive integers that sum to equal ‘sum’:

simulateIntegers <- function(n, sum, sd, pos.only = TRUE){ if(sum == 0 & pos.only == TRUE){ vec <- rep(0, n) } else{ vec <- rnorm(n, sum/n, sd) if (abs(sum(vec)) < 0.01) vec <- vec + 1 vec <- round(vec / sum(vec) * sum) deviation <- sum - sum(vec) for (. in seq_len(abs(deviation))){ vec[i] <- vec[i <- sample(n, 1)] + sign(deviation) } if (pos.only) while (any(vec < 0)){ negs <- vec < 0 pos <- vec > 0 vec[negs][i] <- vec[negs][i <- sample(sum(negs), 1)] + 1 vec[pos][i] <- vec[pos ][i <- sample(sum(pos ), 1)] - 1 } } vec }

Using this function, and plugging in the player’s season projection and the historical week-to-week standard deviation for each statistical category, we simulate all 16 games for 100 different seasons for each player and statistical category:

#Simulation simulations <- 100 games <- 16 passYds <- list() passTds <- list() passInt <- list() rushYds <- list() rushTds <- list() rec <- list() recYds <- list() recTds <- list() twoPts <- list() fumbles <- list() for(i in 1:simulations){ passYds[[i]] <- t(sapply(projections$passYdsMedian, function(x) tryCatch(simulateIntegers(n=games, sum=x, sd=sdAverage$sdPassYds), error=function(e) rep(NA, games)))) passTds[[i]] <- t(sapply(projections$passTdsMedian, function(x) tryCatch(simulateIntegers(n=games, sum=x, sd=sdAverage$sdPassTds), error=function(e) rep(NA, games)))) passInt[[i]] <- t(sapply(projections$passIntMedian, function(x) tryCatch(simulateIntegers(n=games, sum=x, sd=sdAverage$sdPassInt), error=function(e) rep(NA, games)))) rushYds[[i]] <- t(sapply(projections$rushYdsMedian, function(x) tryCatch(simulateIntegers(n=games, sum=x, sd=sdAverage$sdRushYds), error=function(e) rep(NA, games)))) rushTds[[i]] <- t(sapply(projections$rushTdsMedian, function(x) tryCatch(simulateIntegers(n=games, sum=x, sd=sdAverage$sdRushTds), error=function(e) rep(NA, games)))) rec[[i]] <- t(sapply(projections$recMedian, function(x) tryCatch(simulateIntegers(n=games, sum=x, sd=sdAverage$sdRec), error=function(e) rep(NA, games)))) recYds[[i]] <- t(sapply(projections$recYdsMedian, function(x) tryCatch(simulateIntegers(n=games, sum=x, sd=sdAverage$sdRecYds), error=function(e) rep(NA, games)))) recTds[[i]] <- t(sapply(projections$recTdsMedian, function(x) tryCatch(simulateIntegers(n=games, sum=x, sd=sdAverage$sdRecTds), error=function(e) rep(NA, games)))) twoPts[[i]] <- t(sapply(projections$twoPtsMedian, function(x) tryCatch(simulateIntegers(n=games, sum=x, sd=sd(c(rep(0, games - 1), 1))), error=function(e) rep(NA, games)))) fumbles[[i]] <- t(sapply(projections$fumblesMedian, function(x) tryCatch(simulateIntegers(n=games, sum=x, sd=sd(c(rep(0, games - 1), 1))), error=function(e) rep(NA, games)))) }

Then we calculate the weekly fantasy points for every player in all 100 seasons based on their simulated performances in each statistical category:

#Calculate fantasy points per week passYdsPts <- list() passTdsPts <- list() passIntPts <- list() rushYdsPts <- list() rushTdsPts <- list() recPts <- list() recYdsPts <- list() recTdsPts <- list() twoPtsPts <- list() fumblesPts <- list() fantasyPts <- list() for(i in 1:simulations){ passYdsPts[[i]] <- passYds[[i]] * passYdsMultiplier passTdsPts[[i]] <- passTds[[i]] * passTdsMultiplier passIntPts[[i]] <- passInt[[i]] * passIntMultiplier rushYdsPts[[i]] <- rushYds[[i]] * rushYdsMultiplier rushTdsPts[[i]] <- rushTds[[i]] * rushTdsMultiplier recPts[[i]] <- rec[[i]] * recMultiplier recYdsPts[[i]] <- recYds[[i]] * recYdsMultiplier recTdsPts[[i]] <- recTds[[i]] * recTdsMultiplier twoPtsPts[[i]] <- twoPts[[i]] * twoPtsMultiplier fumblesPts[[i]] <- fumbles[[i]] * fumlMultiplier fantasyPts[[i]] <- passYdsPts[[i]] + passTdsPts[[i]] + passIntPts[[i]] + rushYdsPts[[i]] + rushTdsPts[[i]] + recPts[[i]] + recYdsPts[[i]] + recTdsPts[[i]] + twoPtsPts[[i]] + fumblesPts[[i]] }

Finally, we calculate the week-to-week standard deviation of fantasy points for each player in all 100 seasons, and the average week-to-week standard deviation across seasons:

#Calculate SD of fantasy points per week sdWeeklyPts <- matrix(nrow=NROW(fantasyPts[[1]]), ncol=simulations) for(i in 1:simulations){ sdWeeklyPts[,i] <- apply(fantasyPts[[i]], 1, function(x) sd(x, na.rm=TRUE)) } #Calculate robust average of weekly SD projections$weeklySD <- apply(sdWeeklyPts, 1, function(x) tryCatch(wilcox.test(x, conf.int=TRUE, na.action="na.exclude")$estimate, error=function(e) median(x, na.rm=TRUE)))

Here are some players with high weekly variability in fantasy points according to our simulation:

Cam Newton
Robert Griffin III
Colin Kaepernick
Jamaal Charles
Matt Forte
Montee Ball
Cordarrelle Patterson
Percy Harvin
Alshon Jeffery
Tavon Austin

Conclusion

Simulating weekly variability can help you identify players who are more or less reliable from week to week. For example, possession receivers (who rely less on long gains and touchdowns) are more reliable than deep threats, who tend to boom or bust.

Tags: R risk/reward simulation

— Isaac Petersen

My name is Isaac and I'm an assistant professor with a Ph.D. in Clinical Psychology. Why am I writing about fantasy football and data analysis? Because fantasy football involves the intersection of two things I love: sports and statistics. With this site, I hope to demonstrate the relevance of statistics for choosing the best team in fantasy football.

4 Comments

Shrinidhi says:

August 1, 2014 at 10:18 am

New to the site, this is really awesome work! I’m skeptical though, about applying an average of the standard deviations of all players to the top (fantasy relevant) players. The variance in passing TDs, passing yards, etc. is almost certainly higher (or at least significantly different) for players like Peyton Manning than players like Chad Henne. Wouldn’t introducing some cutoffs (e.g. only include player-games where the player got 5+ rushing attempts or 10+ passing attempts) give us an average that’s more applicable to the fantasy players we’re interested in?

Also, if I’m not mistaken, the variance in rushing yards and TDs includes both QBs and RBs, correct? I think this also could be an interesting distinction to draw, as the current method may be overestimating the variance in fantasy points of rushing quarterbacks, which may be why there are so many of them in your “high variability” group.

Reply
- Isaac Petersen says:
  
  August 1, 2014 at 4:24 pm
  
  Hey Shrindhi,
  
  Great points. I think there are definitely ways of improving this analysis (e.g., taking into account players’ past variability and separating QB/RB rushing yards). This was just my first pass on the topic. We have another contributor to the site doing a more thorough analysis to address some of the very concerns you mention. Let me know if you want me to put you in touch with him to see if you can collaborate on it together. Thanks for reaching out and for the interesting thoughts!
  
  -Isaac
  
  Reply
isaac says:

August 15, 2014 at 10:51 pm

hey this is really cool.
i’m new to R so may have missed some points. So you ran a simulation using players averages in each category and the average SD of each of these categories. If my reasoning is correct, this implies that players with a high average in a category that has high variance (e.g. pass TDs, pass YDs) will have the most variance.

Also, how do you calculate mean for rookies?

I agree with Shrinidhi’s advice of separating the calculations by players (or further, ‘running’ qb’s or not). I wonder what’s the best way to average across, say if a player’s experienced we use his personal SD vs a rookie we would use previous rookie SDs.

Reply
- Isaac Petersen says:
  
  August 16, 2014 at 11:04 pm
  
  Hey man, nice name. Good points, it’s likely that players with higher averages will have higher week-to-week SDs. It’s also likely that players who rely on some stats categories more than others (e.g., TDs) will also have higher SDs. This was just a first pass at the analysis, so I calculated an average week-to-week SD for all of the different stats categories and applied this to all players. We have another writer who is creating a more refined model to address many of these good suggestions (separating by position, considering a player’s historical SD, etc.). He’s hoping to post in within the next week or so.
  
  Thanks for the good ideas!
  -Isaac
  
  Reply

Weekly Variability Simulation of Fantasy Football Projections

The R Scripts

Historical Weekly Variability

Weekly Variability Simulation

Conclusion

Like this:

Related

4 Comments

Leave a Reply Cancel reply

Tabs

FFA Insider

Categories

Facebook

Twitter

Our Partners

The R Scripts

Historical Weekly Variability

Weekly Variability Simulation

Conclusion

Share this:

Like this:

Related

4 Comments

Leave a Reply Cancel reply

Tabs

FFA Insider

Categories

Facebook

Twitter

Our Partners