Site icon R-bloggers

Weekly Variability Simulation of Fantasy Football Projections

[This article was first published on Fantasy Football Analytics » R | Fantasy Football Analytics, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

In this post, I show how to estimate players’ week-to-week variability in fantasy football points.  In a prior post, I demonstrated how to calculate a player’s risk level, as defined by the variability of their projected points across sources.  As a reader pointed out, another form of meaningful variability is week-to-week variability (in addition to variability across sources).  Some fantasy statistics are more variable (TDs) than others (yards) from week to week.  For players with a higher percentage of their projected points scored by TDs, their weekly points are likely to be more variable. Another example might be possession receivers vs deep threats.  Possession receivers are likely more reliable from week-to-week than deep threats who are more boom-or-bust.  Here, I use a simulation to estimate each player’s week-to-week variability in fantasy points.

The R Scripts

The R Script for the “Historical Weekly Variability” section is below:

https://github.com/dadrivr/FantasyFootballAnalyticsR/blob/master/R%20Scripts/Historical/Historical%20Weekly.R

The R Script for the “Weekly Variability Simulation” section is below:

https://github.com/dadrivr/FantasyFootballAnalyticsR/blob/master/R%20Scripts/Posts/Weekly%20Simulation.R

Historical Weekly Variability

In order to simulate players’ weekly fantasy points, we first must determine the distribution from which to sample for each player and statistical category (passing yards, rushing TDs, etc.).  I chose to sample from a normal distribution, with each player’s weekly mean of the statistical category as the mean of his distribution.  In other words, if Peyton Manning is projected to have 4800 passing yards this season, that equals an average of 300 yards per game (4800/16).  Thus, for sampling Peyton Manning’s weekly passing yards, we can sample from a distribution with a mean of 300.

For a normal distribution, we have to specify a mean and standard deviation.  What standard deviation should we use for Peyton Manning’s weekly passing yards?  We could theoretically use Peyton Manning’s weekly variability from last season, but some players do not have statistics from last year.  As a result, I chose to calculate the historical weekly variability for passing yards (and all other statistical categories) averaged across all players from the past three seasons.  Then, we use the historical week-to-week standard deviation of players’ passing yards as the standard deviation of players’ sampling distributions for passing yards.

To do this, I scrape data from every week of the season (weeks 1-17) from Pro-Football-Reference for the past three seasons:

#Libraries
library("XML")

#Specify info to scrape
years <- 2011:2013
weeks <- 17

#Scrape data
qb <- list()
rb <- list()
rb1 <- list()
rb2 <- list()
rb3 <- list()
wr <- list()
wr1 <- list()
wr2 <- list()
wr3 <- list()
wr4 <- list()
wr5 <- list()
wr6 <- list()

pb <- txtProgressBar(min = 1, max = weeks, style = 3)
for(i in 1:weeks){
  setTxtProgressBar(pb, i)
  qb[[i]] <- readHTMLTable(paste("http://www.pro-football-reference.com/play-index/pgl_finder.cgi?request=1&match=game&year_min=", head(years, 1), "&year_max=", tail(years, 1), "&season_start=1&season_end=-1&age_min=0&age_max=99&game_type=R&league_id=&team_id=&opp_id=&game_num_min=0&game_num_max=99&week_num_min=", i, "&week_num_max=", i, "&game_day_of_week=&game_location=&game_result=&handedness=&is_active=&is_hof=&c1stat=pass_att&c1comp=gt&c1val=1&c2stat=&c2comp=gt&c2val=&c3stat=&c3comp=gt&c3val=&c4stat=&c4comp=gt&c4val=&order_by=pass_att", sep=""), stringsAsFactors = FALSE)$stats
  rb1[[i]] <- readHTMLTable(paste("http://www.pro-football-reference.com/play-index/pgl_finder.cgi?request=1&match=game&year_min=", head(years, 1), "&year_max=", tail(years, 1), "&season_start=1&season_end=-1&age_min=0&age_max=99&game_type=R&league_id=&team_id=&opp_id=&game_num_min=0&game_num_max=99&week_num_min=", i, "&week_num_max=", i, "&game_day_of_week=&game_location=&game_result=&handedness=&is_active=&is_hof=&c1stat=rush_att&c1comp=gt&c1val=1&c2stat=&c2comp=gt&c2val=&c3stat=&c3comp=gt&c3val=&c4stat=&c4comp=gt&c4val=&order_by=rush_yds", sep=""), stringsAsFactors = FALSE)$stats
  rb2[[i]] <- readHTMLTable(paste("http://www.pro-football-reference.com/play-index/pgl_finder.cgi?request=1&match=game&year_min=", head(years, 1), "&year_max=", tail(years, 1), "&season_start=1&season_end=-1&age_min=0&age_max=99&league_id=&team_id=&opp_id=&game_type=R&game_num_min=0&game_num_max=99&week_num_min=", i, "&week_num_max=", i, "&game_day_of_week=&game_month=&game_location=&game_result=&is_active=&handedness=&is_hof=&c1stat=rush_att&c1comp=gt&c1val=1&c2stat=&c2comp=gt&c2val=&c3stat=&c3comp=gt&c3val=&c4stat=&c4comp=gt&c4val=&order_by=rush_yds&order_by_asc=&offset=100", sep=""), stringsAsFactors = FALSE)$stats
  rb3[[i]] <- readHTMLTable(paste("http://www.pro-football-reference.com/play-index/pgl_finder.cgi?request=1&match=game&year_min=", head(years, 1), "&year_max=", tail(years, 1), "&season_start=1&season_end=-1&age_min=0&age_max=99&league_id=&team_id=&opp_id=&game_type=R&game_num_min=0&game_num_max=99&week_num_min=", i, "&week_num_max=", i, "&game_day_of_week=&game_month=&game_location=&game_result=&is_active=&handedness=&is_hof=&c1stat=rush_att&c1comp=gt&c1val=1&c2stat=&c2comp=gt&c2val=&c3stat=&c3comp=gt&c3val=&c4stat=&c4comp=gt&c4val=&order_by=rush_yds&order_by_asc=&offset=200", sep=""), stringsAsFactors = FALSE)$stats
  wr1[[i]] <- readHTMLTable(paste("http://www.pro-football-reference.com/play-index/pgl_finder.cgi?request=1&match=game&year_min=", head(years, 1), "&year_max=", tail(years, 1), "&season_start=1&season_end=-1&age_min=0&age_max=99&game_type=R&league_id=&team_id=&opp_id=&game_num_min=0&game_num_max=99&week_num_min=", i, "&week_num_max=", i, "&game_day_of_week=&game_location=&game_result=&handedness=&is_active=&is_hof=&c1stat=rec&c1comp=gt&c1val=1&c2stat=&c2comp=gt&c2val=&c3stat=&c3comp=gt&c3val=&c4stat=&c4comp=gt&c4val=&order_by=rec_yds", sep=""), stringsAsFactors = FALSE)$stats
  wr2[[i]] <- readHTMLTable(paste("http://www.pro-football-reference.com/play-index/pgl_finder.cgi?request=1&match=game&year_min=", head(years, 1), "&year_max=", tail(years, 1), "&season_start=1&season_end=-1&age_min=0&age_max=99&league_id=&team_id=&opp_id=&game_type=R&game_num_min=0&game_num_max=99&week_num_min=", i, "&week_num_max=", i, "&game_day_of_week=&game_month=&game_location=&game_result=&is_active=&handedness=&is_hof=&c1stat=rec&c1comp=gt&c1val=1&c2stat=&c2comp=gt&c2val=&c3stat=&c3comp=gt&c3val=&c4stat=&c4comp=gt&c4val=&order_by=rec_yds&order_by_asc=&offset=100", sep=""), stringsAsFactors = FALSE)$stats
  wr3[[i]] <- readHTMLTable(paste("http://www.pro-football-reference.com/play-index/pgl_finder.cgi?request=1&match=game&year_min=", head(years, 1), "&year_max=", tail(years, 1), "&season_start=1&season_end=-1&age_min=0&age_max=99&league_id=&team_id=&opp_id=&game_type=R&game_num_min=0&game_num_max=99&week_num_min=", i, "&week_num_max=", i, "&game_day_of_week=&game_month=&game_location=&game_result=&is_active=&handedness=&is_hof=&c1stat=rec&c1comp=gt&c1val=1&c2stat=&c2comp=gt&c2val=&c3stat=&c3comp=gt&c3val=&c4stat=&c4comp=gt&c4val=&order_by=rec_yds&order_by_asc=&offset=200", sep=""), stringsAsFactors = FALSE)$stats
  wr4[[i]] <- readHTMLTable(paste("http://www.pro-football-reference.com/play-index/pgl_finder.cgi?request=1&match=game&year_min=", head(years, 1), "&year_max=", tail(years, 1), "&season_start=1&season_end=-1&age_min=0&age_max=99&league_id=&team_id=&opp_id=&game_type=R&game_num_min=0&game_num_max=99&week_num_min=", i, "&week_num_max=", i, "&game_day_of_week=&game_month=&game_location=&game_result=&is_active=&handedness=&is_hof=&c1stat=rec&c1comp=gt&c1val=1&c2stat=&c2comp=gt&c2val=&c3stat=&c3comp=gt&c3val=&c4stat=&c4comp=gt&c4val=&order_by=rec_yds&order_by_asc=&offset=300", sep=""), stringsAsFactors = FALSE)$stats
  wr5[[i]] <- readHTMLTable(paste("http://www.pro-football-reference.com/play-index/pgl_finder.cgi?request=1&match=game&year_min=", head(years, 1), "&year_max=", tail(years, 1), "&season_start=1&season_end=-1&age_min=0&age_max=99&league_id=&team_id=&opp_id=&game_type=R&game_num_min=0&game_num_max=99&week_num_min=", i, "&week_num_max=", i, "&game_day_of_week=&game_month=&game_location=&game_result=&is_active=&handedness=&is_hof=&c1stat=rec&c1comp=gt&c1val=1&c2stat=&c2comp=gt&c2val=&c3stat=&c3comp=gt&c3val=&c4stat=&c4comp=gt&c4val=&order_by=rec_yds&order_by_asc=&offset=400", sep=""), stringsAsFactors = FALSE)$stats
  wr6[[i]] <- readHTMLTable(paste("http://www.pro-football-reference.com/play-index/pgl_finder.cgi?request=1&match=game&year_min=", head(years, 1), "&year_max=", tail(years, 1), "&season_start=1&season_end=-1&age_min=0&age_max=99&league_id=&team_id=&opp_id=&game_type=R&game_num_min=0&game_num_max=99&week_num_min=", i, "&week_num_max=", i, "&game_day_of_week=&game_month=&game_location=&game_result=&is_active=&handedness=&is_hof=&c1stat=rec&c1comp=gt&c1val=1&c2stat=&c2comp=gt&c2val=&c3stat=&c3comp=gt&c3val=&c4stat=&c4comp=gt&c4val=&order_by=rec_yds&order_by_asc=&offset=500", sep=""), stringsAsFactors = FALSE)$stats
}

After cleaning and merging the data, I then put the data in the necessary form for calculating players’ weekly variability.  This involves transforming the data from long form to wide form so that each week has a separate column:

#Long to Wide
weeklyDataWide <- reshape(weeklyDataLong,
                          timevar = c("week"),
                          idvar = c("name","year"),
                          direction = "wide",
                          sep="")

Then I calculate the week-to-week standard deviation of each statistical category for every player in each season.  Finally, I calculate a robust average across all players and seasons to get a general week-to-week standard deviation for each statistical category:

#Calculate week-to-week SD for each statistical category across same player/year combinations
sdPassYds <- apply(weeklyDataWide[,grep("passYds", names(weeklyDataWide))], 1, sd)
sdPassTds <- apply(weeklyDataWide[,grep("passTds", names(weeklyDataWide))], 1, sd)
sdPassInt <- apply(weeklyDataWide[,grep("passInt", names(weeklyDataWide))], 1, sd)
sdRushYds <- apply(weeklyDataWide[,grep("rushYds", names(weeklyDataWide))], 1, sd)
sdRushTds <- apply(weeklyDataWide[,grep("rushTds", names(weeklyDataWide))], 1, sd)
sdRec <- apply(weeklyDataWide[,grep("rec", names(weeklyDataWide))], 1, sd)
sdRecYds <- apply(weeklyDataWide[,grep("recYds", names(weeklyDataWide))], 1, sd)
sdRecTds <- apply(weeklyDataWide[,grep("recTds", names(weeklyDataWide))], 1, sd)

sdVars <- data.frame(sdPassYds, sdPassTds, sdPassInt, sdRushYds, sdRushTds, sdRec, sdRecYds, sdRecTds)

#Robust average of week-to-week SD for each statistical category
sdAverage <- data.frame(t(apply(sdVars, 2, function(x) wilcox.test(x, conf.int=TRUE, na.action="na.exclude")$estimate)))

Here are the average week-to-week standard deviations for each statistical category:

Here are the density plots of the week-to-week standard deviations of each statistical category for the different players and seasons:

 

Weekly Variability Simulation

Now that we’ve calculated the historical week-to-week standard deviation for each statistical category, we can simulate players’ weekly performances for each statistical category using the sampling distribution of a) that player’s weekly mean and b) the historical week-to-week standard deviation for the relevant statistical category.  In other words, for simulating Peyton Manning’s passing yards in the example earlier, we will sample from the distribution with a mean of 300 passing yards and a standard deviation of 82.1 yards.  But the sampling has some constraints.  First, the samples (weekly performances in each game) must sum to equal Manning’s projected passing yards for the season.  Second, for some statistical categories (e.g., TDs), values can only be positive integers (e.g., you can’t score half a touchdown or negative touchdowns in a game).  Here’s the function for taking ‘n’ samples from a distribution whose mean is ‘sum’/’n’ and whose standard deviation is ‘sd’, and that only include positive integers that sum to equal ‘sum’:

simulateIntegers <- function(n, sum, sd, pos.only = TRUE){
  if(sum == 0 & pos.only == TRUE){
    vec <- rep(0, n)
  } else{
    vec <- rnorm(n, sum/n, sd)
    if (abs(sum(vec)) < 0.01) vec <- vec + 1
    vec <- round(vec / sum(vec) * sum)
    deviation <- sum - sum(vec)
    for (. in seq_len(abs(deviation))){
      vec[i] <- vec[i <- sample(n, 1)] + sign(deviation)
    }
    if (pos.only) while (any(vec < 0)){
      negs <- vec < 0
      pos  <- vec > 0
      vec[negs][i] <- vec[negs][i <- sample(sum(negs), 1)] + 1
      vec[pos][i]  <- vec[pos ][i <- sample(sum(pos ), 1)] - 1
    }
  }
  vec
}

Using this function, and plugging in the player’s season projection and the historical week-to-week standard deviation for each statistical category, we simulate all 16 games for 100 different seasons for each player and statistical category:

#Simulation
simulations <- 100
games <- 16

passYds <- list()
passTds <- list()
passInt <- list()
rushYds <- list()
rushTds <- list()
rec <- list()
recYds <- list()
recTds <- list()
twoPts <- list()
fumbles <- list()

for(i in 1:simulations){
  passYds[[i]] <- t(sapply(projections$passYdsMedian, function(x) tryCatch(simulateIntegers(n=games, sum=x, sd=sdAverage$sdPassYds), error=function(e) rep(NA, games))))
  passTds[[i]] <- t(sapply(projections$passTdsMedian, function(x) tryCatch(simulateIntegers(n=games, sum=x, sd=sdAverage$sdPassTds), error=function(e) rep(NA, games))))
  passInt[[i]] <- t(sapply(projections$passIntMedian, function(x) tryCatch(simulateIntegers(n=games, sum=x, sd=sdAverage$sdPassInt), error=function(e) rep(NA, games))))
  rushYds[[i]] <- t(sapply(projections$rushYdsMedian, function(x) tryCatch(simulateIntegers(n=games, sum=x, sd=sdAverage$sdRushYds), error=function(e) rep(NA, games))))
  rushTds[[i]] <- t(sapply(projections$rushTdsMedian, function(x) tryCatch(simulateIntegers(n=games, sum=x, sd=sdAverage$sdRushTds), error=function(e) rep(NA, games))))
  rec[[i]] <- t(sapply(projections$recMedian, function(x) tryCatch(simulateIntegers(n=games, sum=x, sd=sdAverage$sdRec), error=function(e) rep(NA, games))))
  recYds[[i]] <- t(sapply(projections$recYdsMedian, function(x) tryCatch(simulateIntegers(n=games, sum=x, sd=sdAverage$sdRecYds), error=function(e) rep(NA, games))))
  recTds[[i]] <- t(sapply(projections$recTdsMedian, function(x) tryCatch(simulateIntegers(n=games, sum=x, sd=sdAverage$sdRecTds), error=function(e) rep(NA, games))))
  twoPts[[i]] <- t(sapply(projections$twoPtsMedian, function(x) tryCatch(simulateIntegers(n=games, sum=x, sd=sd(c(rep(0, games - 1), 1))), error=function(e) rep(NA, games))))
  fumbles[[i]] <- t(sapply(projections$fumblesMedian, function(x) tryCatch(simulateIntegers(n=games, sum=x, sd=sd(c(rep(0, games - 1), 1))), error=function(e) rep(NA, games))))
}

Then we calculate the weekly fantasy points for every player in all 100 seasons based on their simulated performances in each statistical category:

#Calculate fantasy points per week
passYdsPts <- list()
passTdsPts <- list()
passIntPts <- list()
rushYdsPts <- list()
rushTdsPts <- list()
recPts <- list()
recYdsPts <- list()
recTdsPts <- list()
twoPtsPts <- list()
fumblesPts <- list()
fantasyPts <- list()

for(i in 1:simulations){  
  passYdsPts[[i]] <- passYds[[i]] * passYdsMultiplier
  passTdsPts[[i]] <- passTds[[i]] * passTdsMultiplier
  passIntPts[[i]] <- passInt[[i]] * passIntMultiplier
  rushYdsPts[[i]] <- rushYds[[i]] * rushYdsMultiplier
  rushTdsPts[[i]] <- rushTds[[i]] * rushTdsMultiplier
  recPts[[i]] <- rec[[i]] * recMultiplier
  recYdsPts[[i]] <- recYds[[i]] * recYdsMultiplier
  recTdsPts[[i]] <- recTds[[i]] * recTdsMultiplier
  twoPtsPts[[i]] <- twoPts[[i]] * twoPtsMultiplier
  fumblesPts[[i]] <- fumbles[[i]] * fumlMultiplier
  
  fantasyPts[[i]] <- passYdsPts[[i]] + passTdsPts[[i]] + passIntPts[[i]] + rushYdsPts[[i]] + rushTdsPts[[i]] + recPts[[i]] + recYdsPts[[i]] + recTdsPts[[i]] + twoPtsPts[[i]] + fumblesPts[[i]]
}

Finally, we calculate the week-to-week standard deviation of fantasy points for each player in all 100 seasons, and the average week-to-week standard deviation across seasons:

#Calculate SD of fantasy points per week
sdWeeklyPts <- matrix(nrow=NROW(fantasyPts[[1]]), ncol=simulations)

for(i in 1:simulations){  
  sdWeeklyPts[,i] <- apply(fantasyPts[[i]], 1, function(x) sd(x, na.rm=TRUE))
}

#Calculate robust average of weekly SD
projections$weeklySD <- apply(sdWeeklyPts, 1, function(x) tryCatch(wilcox.test(x, conf.int=TRUE, na.action="na.exclude")$estimate, error=function(e) median(x, na.rm=TRUE)))

Here are some players with high weekly variability in fantasy points according to our simulation:

Conclusion

Simulating weekly variability can help you identify players who are more or less reliable from week to week.  For example, possession receivers (who rely less on long gains and touchdowns) are more reliable than deep threats, who tend to boom or bust.

The post Weekly Variability Simulation of Fantasy Football Projections appeared first on Fantasy Football Analytics.

To leave a comment for the author, please follow the link and comment on their blog: Fantasy Football Analytics » R | Fantasy Football Analytics.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.