All the (NBA) box scores you ever wanted
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
In this previous post, I showed how one can scrape top-level NBA game data from BasketballReference.com. In the post after that, I demonstrated how to scrape play-by-play data for one game. After writing those posts, I thought to myself: why not do both? And that is what I did: scrape all the box scores for the 2017-18 NBA season and save them to an RDS object.
The code for web scraping can be found here. I don’t think it’s particularly insightful to go through the details line-by-line… What I’ll do instead is explain what is in the saved RDS object which will allow you to make use of the data, and I’ll end off with some visualizations to demonstrate what one can do with this data, specifically:
- Do Kevin Durant and Steph Curry get in each other’s way?
- Are the Golden State Warriors really better in the 3rd quarter?
(Code that I wrote for these questions can be found here.) Let’s load the packages and data (data available as links from this page):
library(tidyverse) game_df <- readRDS("NBA-2018_game_data.rds") master_list <- readRDS("NBA-2018_box_score.rds")
First, let’s take a quick look at game_df
which contains the top-level game data:
head(game_df) #> game_id date_game game_start_time visitor_team_name visitor_pts home_team_name home_pts #> 1 201710170CLE 2017-10-17 8:01p Boston Celtics 99 Cleveland Cavaliers 102 #> 2 201710170GSW 2017-10-17 10:30p Houston Rockets 122 Golden State Warriors 121 #> 3 201710180BOS 2017-10-18 7:30p Milwaukee Bucks 108 Boston Celtics 100 #> 4 201710180DAL 2017-10-18 8:30p Atlanta Hawks 117 Dallas Mavericks 111 #> 5 201710180DET 2017-10-18 7:00p Charlotte Hornets 90 Detroit Pistons 102 #> 6 201710180IND 2017-10-18 7:00p Brooklyn Nets 131 Indiana Pacers 140 #> overtimes attendance game_remarks game_type #> 1 20562 Regular #> 2 19596 Regular #> 3 18624 Regular #> 4 19709 Regular #> 5 20491 Regular #> 6 15008 Regular
The first column, game_id
, is BasketBallReference.com’s way of uniquely identifying a game. For example, if we wanted to see the results of the game between the Celtics and the Cavs on 17 Oct 2017, we would look up the game_id
(201710170CLE) and go to the URL https://www.basketball-reference.com/boxscores/201710170CLE.html. The game_id
is how I was able to scrape the box scores for all the games.
Next, let’s look at the master_list
object which contains all the box score information:
typeof(master_list) #> [1] "list" length(master_list) #> [1] 1312 head(names(master_list)) #> [1] "201710170CLE" "201710170GSW" "201710180BOS" "201710180DAL" "201710180DET" "201710180IND"
It is a list of length 1312, which corresponds to the total number of games (Regular Season and Playoffs) in the 2017-18 season. The keys of the list are the game_id
s, and (as we will see shortly) each element of the list is itself a list of length 5. Let’s take a closer look at the first element of the list:
str(master_list[[1]]) #> List of 5 #> $ visitor_basic_boxscore:'data.frame': 13 obs. of 22 variables: #> ..$ Player: chr [1:13] "Jaylen Brown" "Kyrie Irving" "Jayson Tatum" "Al Horford" ... #> ..$ MP : chr [1:13] "39:36" "39:21" "36:32" "32:07" ... #> ..$ FG : num [1:13] 11 8 5 2 1 5 2 2 0 0 ... #> ..$ FGA : num [1:13] 23 17 12 7 2 16 6 2 2 1 ... #> ..$ FG% : num [1:13] 0.478 0.471 0.417 0.286 0.5 0.313 0.333 1 0 0 ... #> ..$ 3P : num [1:13] 2 4 1 0 0 0 1 0 0 0 ... #> ..$ 3PA : num [1:13] 9 9 2 2 1 4 3 0 1 1 ... #> ..$ 3P% : num [1:13] 0.222 0.444 0.5 0 0 0 0.333 NA 0 0 ... #> ..$ FT : num [1:13] 1 2 3 5 0 2 4 2 0 0 ... #> ..$ FTA : num [1:13] 2 2 3 7 0 3 4 4 0 0 ... #> ..$ FT% : num [1:13] 0.5 1 1 0.714 NA 0.667 1 0.5 NA NA ... #> ..$ ORB : num [1:13] 1 2 4 0 0 0 0 2 0 0 ... #> ..$ DRB : num [1:13] 5 2 6 7 1 9 3 3 0 1 ... #> ..$ TRB : num [1:13] 6 4 10 7 1 9 3 5 0 1 ... #> ..$ AST : num [1:13] 0 10 3 5 0 3 2 1 0 0 ... #> ..$ STL : num [1:13] 2 3 0 0 0 2 4 0 0 0 ... #> ..$ BLK : num [1:13] 0 0 0 1 0 2 0 1 0 0 ... #> ..$ TOV : num [1:13] 3 2 1 0 0 2 0 2 0 0 ... #> ..$ PF : num [1:13] 5 4 4 2 1 2 0 5 1 0 ... #> ..$ PTS : num [1:13] 25 22 14 9 2 12 9 6 0 0 ... #> ..$ +/- : num [1:13] -5 -1 6 8 3 -8 4 -14 -10 2 ... #> ..$ Role : chr [1:13] "Starter" "Starter" "Starter" "Starter" ... #> $ visitor_adv_boxscore :'data.frame': 13 obs. of 17 variables: #> ..$ Player: chr [1:13] "Jaylen Brown" "Kyrie Irving" "Jayson Tatum" "Al Horford" ... #> ..$ MP : chr [1:13] "39:36" "39:21" "36:32" "32:07" ... #> ..$ TS% : num [1:13] 0.523 0.615 0.526 0.446 0.5 0.346 0.58 0.798 0 0 ... #> ..$ eFG% : num [1:13] 0.522 0.588 0.458 0.286 0.5 0.313 0.417 1 0 0 ... #> ..$ 3PAr : num [1:13] 0.391 0.529 0.167 0.286 0.5 0.25 0.5 0 0.5 1 ... #> ..$ FTr : num [1:13] 0.087 0.118 0.25 1 0 0.188 0.667 2 0 0 ... #> ..$ ORB% : num [1:13] 2.4 4.9 10.5 0 0 0 0 10.1 0 0 ... #> ..$ DRB% : num [1:13] 13.2 5.3 17.1 22.7 19.9 26.8 16 16.4 0 21.7 ... #> ..$ TRB% : num [1:13] 7.6 5.1 13.7 10.9 9.5 12.8 7.7 13.1 0 10.4 ... #> ..$ AST% : num [1:13] 0 46.5 13.4 22.6 0 14.1 15.8 8.1 0 0 ... #> ..$ STL% : num [1:13] 2.4 3.7 0 0 0 2.8 9.9 0 0 0 ... #> ..$ BLK% : num [1:13] 0 0 0 2.5 0 4.5 0 4.1 0 0 ... #> ..$ TOV% : num [1:13] 11.2 10.1 7 0 0 10.4 0 34.7 0 0 ... #> ..$ USG% : num [1:13] 29.9 22.2 17.3 13.8 16.8 24.3 17.5 13.3 10.2 9.1 ... #> ..$ ORtg : num [1:13] 89 124 115 111 95 70 129 110 0 0 ... #> ..$ DRtg : num [1:13] 104 103 108 105 107 96 87 105 112 107 ... #> ..$ Role : chr [1:13] "Starter" "Starter" "Starter" "Starter" ... #> $ home_basic_boxscore :'data.frame': 14 obs. of 22 variables: #> ..$ Player: chr [1:14] "LeBron James" "Jae Crowder" "Derrick Rose" "Dwyane Wade" ... #> ..$ MP : chr [1:14] "41:12" "34:44" "31:15" "28:30" ... #> ..$ FG : num [1:14] 12 3 5 3 4 4 2 3 2 0 ... #> ..$ FGA : num [1:14] 19 10 14 10 9 7 3 8 3 0 ... #> ..$ FG% : num [1:14] 0.632 0.3 0.357 0.3 0.444 0.571 0.667 0.375 0.667 NA ... #> ..$ 3P : num [1:14] 1 1 1 0 1 1 0 0 0 0 ... #> ..$ 3PA : num [1:14] 5 5 3 1 4 3 0 1 0 0 ... #> ..$ 3P% : num [1:14] 0.2 0.2 0.333 0 0.25 0.333 NA 0 NA NA ... #> ..$ FT : num [1:14] 4 4 3 2 6 1 1 0 0 0 ... #> ..$ FTA : num [1:14] 4 4 4 2 7 1 3 0 0 0 ... #> ..$ FT% : num [1:14] 1 1 0.75 1 0.857 1 0.333 NA NA NA ... #> ..$ ORB : num [1:14] 1 1 1 1 3 0 1 0 1 0 ... #> ..$ DRB : num [1:14] 15 4 3 1 8 4 5 0 1 0 ... #> ..$ TRB : num [1:14] 16 5 4 2 11 4 6 0 2 0 ... #> ..$ AST : num [1:14] 9 2 2 3 0 1 2 0 0 0 ... #> ..$ STL : num [1:14] 0 2 0 0 0 0 0 0 1 0 ... #> ..$ BLK : num [1:14] 2 0 0 2 0 0 0 0 0 0 ... #> ..$ TOV : num [1:14] 4 1 2 4 2 0 2 1 1 0 ... #> ..$ PF : num [1:14] 3 2 2 1 2 4 3 3 3 2 ... #> ..$ PTS : num [1:14] 29 11 14 8 15 10 5 6 4 0 ... #> ..$ +/- : num [1:14] 2 7 -7 0 1 7 2 -2 6 -1 ... #> ..$ Role : chr [1:14] "Starter" "Starter" "Starter" "Starter" ... #> $ home_adv_boxscore :'data.frame': 14 obs. of 17 variables: #> ..$ Player: chr [1:14] "LeBron James" "Jae Crowder" "Derrick Rose" "Dwyane Wade" ... #> ..$ MP : chr [1:14] "41:12" "34:44" "31:15" "28:30" ... #> ..$ TS% : num [1:14] 0.698 0.468 0.444 0.368 0.621 0.672 0.579 0.375 0.667 NA ... #> ..$ eFG% : num [1:14] 0.658 0.35 0.393 0.3 0.5 0.643 0.667 0.375 0.667 NA ... #> ..$ 3PAr : num [1:14] 0.263 0.5 0.214 0.1 0.444 0.429 0 0.125 0 NA ... #> ..$ FTr : num [1:14] 0.211 0.4 0.286 0.2 0.778 0.143 1 0 0 NA ... #> ..$ ORB% : num [1:14] 2.5 3 3.3 3.7 11 0 5.3 0 8.1 0 ... #> ..$ DRB% : num [1:14] 35 11.1 9.2 3.4 27 17.5 24.5 0 7.5 0 ... #> ..$ TRB% : num [1:14] 19.4 7.2 6.4 3.5 19.4 9.1 15.3 0 7.8 0 ... #> ..$ AST% : num [1:14] 43.7 8.2 10.1 15.3 0 7.5 14.8 0 0 0 ... #> ..$ STL% : num [1:14] 0 2.8 0 0 0 0 0 0 3.8 0 ... #> ..$ BLK% : num [1:14] 4.2 0 0 6 0 0 0 0 0 0 ... #> ..$ TOV% : num [1:14] 16.2 7.8 11.3 26.9 14.2 0 31.6 11.1 25 NA ... #> ..$ USG% : num [1:14] 26 15.9 24.6 22.6 21.4 14.7 13.9 27.3 13.5 0 ... #> ..$ ORtg : num [1:14] 126 100 88 67 117 138 92 67 102 0 ... #> ..$ DRtg : num [1:14] 94 98 105 102 100 102 100 107 96 106 ... #> ..$ Role : chr [1:14] "Starter" "Starter" "Starter" "Starter" ... #> $ pdp_df :'data.frame': 114 obs. of 4 variables: #> ..$ time : num [1:114] 16 72 88 99 131 151 166 191 191 238 ... #> ..$ visitor: num [1:114] 2 4 4 4 4 4 6 6 6 8 ... #> ..$ home : num [1:114] 0 0 1 3 5 7 7 8 9 9 ... #> ..$ period : num [1:114] 1 1 1 1 1 1 1 1 1 1 ...
The first 4 elements of the list (visitor_basic_boxscore
, visitor_adv_boxscore
, home_basic_boxscore
, home_basic_boxscore
) are data frames for the basic and advanced box scores of both teams. They are the box scores you see on BasketBallReference.com (with some minor preprocessing):
The last element of the list is a data frame containing play-by-play information, much like that in the previous post.
Hopefully some of you will take this data and make some cool data visualizations from it! Below, I will walk through how I use this dataset for two visualizations.
Do Kevin Durant and Steph Curry get in each other’s way?
Kevin Durant and Steph Curry are two of the most lethal scorers in today’s game. Does both of them being on the same team hamper the other’s production? This is pretty difficult to pin down what we mean by this statement precisely; we will instead make do with a couple of suggestive plots.
The first plot we can make is a scatterplot of the points scored by these two players. If the relationship is positively correlated, it could mean that they make each other better.
Let’s pull out the game_id
s for Golden State:
team_name <- "Golden State Warriors" player1 <- "Stephen Curry" player2 <- "Kevin Durant" # get game_ids game_rows <- which(game_df$visitor_team_name == team_name | game_df$home_team_name == team_name) game_ids <- game_df[game_rows, "game_id"]
Next, we set up a data frame consisting of 4 columns: the game_id
, game_type
(regular or playoff), and points scored by players 1 and 2 respectively.
points_df <- data.frame(matrix(NA, ncol = 4, nrow = length(game_ids))) names(points_df) <- c("game_id", "game_type", player1, player2)
The loop below builds up the data frame row by row (not the most R-like way of programming but it gets the job done…):
for (i in 1:nrow(points_df)) { id <- game_ids[i] points_df[i, 1] <- id points_df[i, 2] <- game_df[game_rows[i], "game_type"] # get the correct basic box score if (game_df[game_rows[i], "visitor_team_name"] == team_name) { boxscore <- master_list[[id]]$visitor_basic_boxscore } else { boxscore <- master_list[[id]]$home_basic_boxscore } # get player points if (player1 %in% boxscore$Player) { points_df[i, 3] <- subset(boxscore, Player == player1)$PTS } if (player2 %in% boxscore$Player) { points_df[i, 4] <- subset(boxscore, Player == player2)$PTS } } head(points_df) #> game_id game_type Stephen Curry Kevin Durant #> 1 201710170GSW Regular 22 20 #> 2 201710200NOP Regular 28 22 #> 3 201710210MEM Regular 37 29 #> 4 201710230DAL Regular 29 25 #> 5 201710250GSW Regular 30 29 #> 6 201710270GSW Regular 20 31
Next, we make the scatterplot (notice how we can use the get
function to refer to the columns programmatically). We force the axes to start at 0 and the aspect ratio to be 1, and include the 45 degree line for reference.
ggplot(data = points_df, aes(x = get(player1), y = get(player2))) + geom_point(alpha = 0.5) + geom_smooth(se = FALSE) + geom_abline(slope = 1, intercept = 0, lwd = 0.5, col = "red", lty = 2) + scale_x_continuous(limits = c(0, NA)) + scale_y_continuous(limits = c(0, NA)) + labs(x = player1, y = player2, title = "Points scored") + coord_fixed()
One might say, wouldn’t it be better to compare games where KD and Curry play, vs. games where only KD plays? If KD scores more if he is playing with Curry, that could suggest that playing together makes him more productive. To make these plots, we introduce two new columns which are Boolean variables representing whether a given player played in that game or not. We then make boxplots to compare the points scored depending on whether the other player played.
# ignore games where both didn't play points_df2 <- points_df[!(is.na(points_df[[3]]) & is.na(points_df[[4]])), ] points_df2[[paste(player1, "played")]] <- !is.na(points_df2[[3]]) points_df2[[paste(player2, "played")]] <- !is.na(points_df2[[4]]) # boxplots ggplot(data = points_df2, aes(x = get(paste(player1, "played")), y = get(player2))) + geom_boxplot() + geom_jitter(height = 0, alpha = 0.5, col = "blue") + labs(x = paste(player1, "played?"), y = "Points scored", title = paste("Points", player2, "scored")) ggplot(data = points_df2, aes(x = get(paste(player2, "played")), y = get(player1))) + geom_boxplot() + geom_jitter(height = 0, alpha = 0.5, col = "blue") + labs(x = paste(player2, "played?"), y = "Points scored", title = paste("Points", player1, "scored"))
From the plots, it looks like each player scores a bit more when the other person is not playing, although there is quite a bit of variance.
Instead of looking at points scored, we could look at more sophisticated measures of production. For example, the NBA computes individual offensive and defensive ratings for each game based on a complicated formula (see here for details). These measures are provided as ORtg
and DRtg
in the advanced box score.
The code below pulls out the ORtg and DRtg values for KD and Curry:
points_df <- data.frame(matrix(NA, ncol = 6, nrow = length(game_ids))) names(points_df) <- c("game_id", "game_type", sapply(c(player1, player2), function(x) paste(x, "ORtg")), sapply(c(player1, player2), function(x) paste(x, "DRtg"))) for (i in 1:nrow(points_df)) { id <- game_ids[i] points_df[i, 1] <- id points_df[i, 2] <- game_df[game_rows[i], "game_type"] # get the correct advanced box score if (game_df[game_rows[i], "visitor_team_name"] == team_name) { boxscore <- master_list[[id]]$visitor_adv_boxscore } else { boxscore <- master_list[[id]]$home_adv_boxscore } # get player ORtg & DRtg if (player1 %in% boxscore$Player) { points_df[i, 3] <- subset(boxscore, Player == player1)$ORtg points_df[i, 5] <- subset(boxscore, Player == player1)$DRtg } if (player2 %in% boxscore$Player) { points_df[i, 4] <- subset(boxscore, Player == player2)$ORtg points_df[i, 6] <- subset(boxscore, Player == player2)$DRtg } }
And here are the plots (along with a linear fit):
This was pretty interesting to me: it looks like the ORtgs are independent of each other while the DRtgs are highly correlated with each other. Might this suggest that defense is more of a team activity, while offense can be driven by individuals?
There are of course many other visualizations we could do to explore this question. We might also want to see how the KD-Curry pairing compares with other pairings. Because of the way the code is written, it’s really easy to generate these figures for other pairings: we just need to update the team_name
, player1
and player2
variables and rerun the code. As an example, here are the ORtg and DRtg scatterplots for Damian Lillard and C.J. McCollum of the Portland Trail Blazers:
We see similar trends here.
Are the Golden State Warriors really better in the 3rd quarter?
Last season, it seemed like the Warriors were often down at the half, only to come roaring back in the third quarter. Is this really their best quarter? (This NYT article says yes and suggests why that might be the case.) To answer this question, we’ll make a lead tracker graphic (much like the previous post) but (i) we will do it for all games GSW played in, and (ii) we reset the score differential to 0 at the beginning of each period.
We first set up a data frame that records the game_id
, whether the team was at home, and whether they won (we might want to use latter two variables in our plotting later):
team_name <- "Golden State Warriors" game_rows <- which(game_df$visitor_team_name == team_name | game_df$home_team_name == team_name) game_ids <- game_df[game_rows, "game_id"] # set up game data frame and play-by-play list df <- data.frame(matrix(NA, ncol = 3, nrow = length(game_ids))) names(df) <- c("game_id", "home", "win") df$game_id <- game_ids df$home <- game_df[game_rows, "home_team_name"] == team_name df$win <- (df$home & game_df[game_rows, "home_pts"] > game_df[game_rows, "visitor_pts"]) | ((!df$home) & game_df[game_rows, "home_pts"] < game_df[game_rows, "visitor_pts"])
Next, we create a list with keys being the game_id
s and the value being the play-by-play data frame from the master list we scraped earlier. We standardize the column names so that GSW’s score is labeled team
and the opponent’s score is labeled opp
. (Note: At the time of writing, BasketBallReference.com had incorrect data for the game 201710200NOP
, so I have removed that game from our data.)
pbp_list <- list() for (i in 1:nrow(df)) { id <- game_ids[i] pdp <- master_list[[id]]$pdp_df if (df[i, "home"]) { names(pdp) <- c("time", "opp", "team", "period") } else { names(pdp) <- c("time", "team", "opp", "period") } pbp_list[[id]] <- pdp } # NOTE: mistake in 201710200NOP. original data on website is wrong # https://www.basketball-reference.com/boxscores/pbp/201710200NOP.html # so we remove it df <- df[-which(df$game_id == "201710200NOP"), ] pbp_list[["201710200NOP"]] <- NULL
We create a function parse_pbp
which helps us reset the score at the beginning of every quarter. To make the lines in our eventual visualization smoother, parse_pbp
also adds additional rows for the beginning and end of every quarter.
parse_pbp <- function(pbp) { pbp <- rbind(0, pbp) new_pbp <- pbp # get points scored in the period last_opp <- 0; last_team <- 0; last_period <- 0 for (i in 2:nrow(pbp)) { if (pbp[i, "period"] > last_period + 1) { last_period <- last_period + 1 last_opp <- pbp[i-1, "opp"] last_team <- pbp[i-1, "team"] } new_pbp$opp[i] <- pbp$opp[i] - last_opp new_pbp$team[i] <- pbp$team[i] - last_team } # add extra rows to denote beginning and end of periods num_period <- max(new_pbp$period) for (i in 1:num_period) { end_row <- new_pbp[max(which(new_pbp$period == i)), ] end_row[1] <- 12 * 60 * min(i, 4) + 5 * 60 * max(i-4, 0) beg_row <- c(0, 0, 0, i) beg_row[1] <- 12 * 60 * min(i-1, 4) + 5 * 60 * max(i-1-4, 0) new_pbp <- rbind(new_pbp, beg_row) new_pbp <- rbind(new_pbp, end_row) } new_pbp <- new_pbp[order(new_pbp$time), ] new_pbp$adv <- with(new_pbp, team - opp) new_pbp[-1, ] } pbp_list <- lapply(pbp_list, parse_pbp)
Finally, we make the list into a dataframe:
pbp_df <- data.frame(matrix(ncol = 6, nrow = 0)) names(pbp_df) <- c("game_id", "home", "win", "time", "period", "adv") for (i in seq_along(pbp_list)) { xx <- pbp_list[[i]] xx$game_id <- df$game_id[i] xx$home <- df$home[i] xx$win <- df$win[i] pbp_df <- rbind(pbp_df, xx) } head(pbp_df) #> time opp team period adv game_id home win #> 124 0 0 0 1 0 201710170GSW TRUE FALSE #> 2 13 2 0 1 -2 201710170GSW TRUE FALSE #> 3 55 2 1 1 -1 201710170GSW TRUE FALSE #> 4 55 2 2 1 0 201710170GSW TRUE FALSE #> 5 90 2 5 1 3 201710170GSW TRUE FALSE #> 6 104 5 5 1 0 201710170GSW TRUE FALSE
We can now make the line plot easily. We use the interaction
function to group the line plots by both game_id
and quarter
, color the lines by whether GSW won or not, and manually define the breaks to be at the end of the periods:
periods <- unique(pbp_df$period) x_value <- ifelse(periods <= 4, 12 * 60 * periods, 12 * 60 * 4 + 5 * 60 * (periods - 4)) x_label <- ifelse(periods <= 4, paste0("Q", periods), paste0("OT", periods - 4)) ggplot(pbp_df, aes(x = time, y = adv)) + geom_line(aes(col = win, group = interaction(game_id, period)), lwd = 0.1) + geom_smooth(aes(group = period), col = "black", se = FALSE) + scale_x_continuous(breaks = x_value, labels = x_label) + scale_color_manual(values = c("#ff6600", "#3366ff")) + labs(title = paste("Point Advantage by Quarter,", team_name)) + theme_minimal() + theme(plot.title = element_text(face = "bold", hjust = 0.5), axis.title.x = element_blank(), axis.title.y = element_blank(), panel.grid.minor.x = element_blank(), panel.grid.minor.y = element_blank(), legend.position = "bottom")
It may not be so obvious in this plot, but there certainly seems to be a bit of a difference in the third quarter. On average, the Warriors are even or up 1-2 points, but in the third quarter GSW gains almost 5 points. The difference is more obvious in the plot below, where we zoom in on the y-axis. (Note that we have to use coord_cartesian
instead of scale_y_continuous
to zoom in: scale_y_continuous
will remove the points outside the range before plotting, which results in the incorrect result for the geom_smooth
layer.
ggplot(pbp_df, aes(x = time, y = adv)) + geom_line(aes(col = win, group = interaction(game_id, period)), lwd = 0.1) + geom_smooth(aes(group = period), col = "black", se = FALSE) + scale_x_continuous(breaks = x_value, labels = x_label) + scale_color_manual(values = c("#ff6600", "#3366ff")) + coord_cartesian(ylim = c(-10, 10)) + labs(title = paste("Point Advantage by Quarter,", team_name)) + theme_minimal() + theme(plot.title = element_text(face = "bold", hjust = 0.5), axis.title.x = element_blank(), axis.title.y = element_blank(), panel.grid.minor.x = element_blank(), panel.grid.minor.y = element_blank(), legend.position = "bottom")
The advantage becomes even more obvious when we facet by whether GSW won or not: in wins, they gain almost 10 points in the 3rd quarter, compared to something like 1-3 points in each of the other quarters.
R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.