Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
Setup
Loading the R libraries and data set.
# Loading libraries
library(tidyverse)

# Reading in the raw data from GitHub (I would use "tt_load", but I hit an API
# rate limit)
games <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2021/2021-03-16/games.csv')
Plotting Peak vs. Average number of players using 200 observations
For this plot, the 200 observations with the highest average number of simultaneous players are selected using slice_max(order_by = avg, n = 200). The peak and average number of players for these observations are shown on a scatter plot, with the colour of each point indicating the game the observation belongs to. A smoothed trend line is fitted for each of the three featured games to illustrate trends in the data. The game “Cyberpunk 2077” was filtered out before creating this plot, as it accounted for only one of the top 200 observations.
games %>%
  select(gamename, avg, peak) %>%
  # Filtering out Cyberpunk 2077 as it has only a single observation
  filter(gamename != "Cyberpunk 2077") %>%
  slice_max(order_by = avg, n = 200) %>%
  ggplot(aes(x = avg, y = peak, colour = gamename)) +
  geom_point() +
  geom_smooth(formula = y ~ x) +
  theme_bw() +
  theme(legend.position = "bottom") +
  labs(title = "Peak vs. Average number of players online simultaneously",
       subtitle = "Top 200 observations for Average used",
       y = "Highest number of players at the same time",
       x = "Average number of players at the same time",
       colour = "Game")
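To make the row selection concrete, the sketch below runs the same filter-then-slice steps on a small made-up tibble. The toy_games object, its game names and its values are invented for illustration and are not taken from the Steam data.

```r
library(dplyr)

# Hypothetical toy data, invented purely for illustration
toy_games <- tibble(
  gamename = c("Game A", "Game A", "Game B", "Game C", "Game B"),
  avg      = c(500, 300, 450, 900, 100),
  peak     = c(800, 600, 700, 1500, 250)
)

# Drop one game first, then keep the three rows with the highest "avg"
top_avg <- toy_games %>%
  filter(gamename != "Game C") %>%
  slice_max(order_by = avg, n = 3)

top_avg
```

Because the filter runs before slice_max(), the removed game never takes up one of the n slots. Note that slice_max() keeps ties by default (with_ties = TRUE), so it can return more than n rows when values of avg are tied at the cut-off.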
Combining “year” and “month” into a new variable
Combining the year and month variables makes it easier to track when each observation was recorded. Creating this new year_month variable using the as_date() function from {lubridate} ensures that it will be interpreted as a date.
# Creating a "year_month" variable with the year and month of each observation
# using the lubridate "as_date()" function, by pasting together...
games$year_month <- lubridate::as_date(paste(
  # ...the "year" variable...
  games$year,
  # ...the number of each month, obtained by matching the month names to the
  # "month.name" built-in constant...
  match(games$month, month.name),
  # ...the number "1", as a dummy "day" value...
  1,
  # ...separated by a "-".
  sep = "-"))

# Printing the start of the "games" object with the new variable...
games

# A tibble: 83,631 x 8
   gamename  year month    avg    gain   peak avg_peak_perc year_month
   <chr>    <dbl> <chr>  <dbl>   <dbl>  <dbl> <chr>         <date>
 1 Counter…  2021 Febr… 7.41e5  -2196. 1.12e6 65.9567%      2021-02-01
 2 Dota 2    2021 Febr… 4.05e5 -27840. 6.52e5 62.1275%      2021-02-01
 3 PLAYERU…  2021 Febr… 1.99e5  -2290. 4.47e5 44.4707%      2021-02-01
 4 Apex Le…  2021 Febr… 1.21e5  49216. 1.97e5 61.4752%      2021-02-01
 5 Rust      2021 Febr… 1.18e5 -24375. 2.24e5 52.4988%      2021-02-01
 6 Team Fo…  2021 Febr… 1.01e5  18083. 1.34e5 75.7603%      2021-02-01
 7 Grand T…  2021 Febr… 9.06e4 -10603. 1.46e5 61.9017%      2021-02-01
 8 Tom Cla…  2021 Febr… 7.24e4  -5335. 1.13e5 63.8645%      2021-02-01
 9 Rocket …  2021 Febr… 5.37e4  -5726. 1.03e5 51.9419%      2021-02-01
10 Path of…  2021 Febr… 4.69e4   -766. 9.05e4 51.8229%      2021-02-01
# … with 83,621 more rows
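As a possible alternative to pasting a string together, the sketch below builds the same dates from numeric parts with lubridate's make_date(). The toy_year and toy_month vectors are invented for illustration; this is one way the step could be written, not the approach used in this post.

```r
library(lubridate)

# Hypothetical toy columns, invented purely for illustration
toy_year  <- c(2021, 2020, 2019)
toy_month <- c("February", "December", "July")

# Approach used in the post: paste "year-month-1" and parse it with as_date()
pasted <- as_date(paste(toy_year, match(toy_month, month.name), 1, sep = "-"))

# Alternative sketch: build the date directly from its numeric parts
built <- make_date(
  year  = toy_year,
  month = match(toy_month, month.name),
  day   = 1
)

pasted
built

# Both approaches should give the same dates
all(pasted == built)
```

In both versions, match() returns NA for any month name that does not exactly match an entry in month.name (for example an abbreviation or a name with stray whitespace), so it may be worth checking the result for missing dates.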
Plotting the monthly player gains and losses for three of the most popular games
This plot uses the year_month variable on the x-axis and the gain variable on the y-axis to illustrate month-to-month gains and losses in average players. The three games used account for the majority of the highest avg (average number of simultaneous players) values in the data set. The graph is faceted by game. To put these gains and losses into perspective, dashed lines are added at plus and minus 100,000 players in each facet.
games %>%
  # Selecting the variables
  select(gamename, year_month, gain) %>%
  # Filtering the data set for three of the most popular games
  filter(gamename == "Dota 2" |
           gamename == "PLAYERUNKNOWN'S BATTLEGROUNDS" |
           gamename == "Counter-Strike: Global Offensive") %>%
  ggplot(aes(year_month, gain, fill = gamename)) +
  geom_col() +
  theme_classic() +
  theme(legend.position = "none") +
  # Adding dashed lines to put facets into perspective
  geom_hline(yintercept = 100000, linetype = "dashed") +
  geom_hline(yintercept = -100000, linetype = "dashed") +
  # Faceting the plot for each game
  facet_wrap(~gamename, scales = "free") +
  labs(
    title = "Gains/Losses in average number of players online for three games",
    subtitle = "Dashed lines added at +/-100,000 players for each game",
    x = "Time",
    y = "Gains/Losses in average number of players"
  )
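To show what a month-to-month gain looks like in practice, the sketch below recomputes it from avg on a made-up tibble. It assumes that gain is the difference between a game's avg and its avg in the previous month; that definition is an assumption here, and the toy_games data and the gain_check name are invented for illustration.

```r
library(dplyr)

# Hypothetical toy data, invented purely for illustration
toy_games <- tibble(
  gamename   = rep(c("Game A", "Game B"), each = 3),
  year_month = rep(as.Date(c("2021-01-01", "2021-02-01", "2021-03-01")),
                   times = 2),
  avg        = c(100, 150, 120, 900, 700, 750)
)

# Assumed definition: this month's avg minus the previous month's avg,
# computed separately for each game
toy_gains <- toy_games %>%
  group_by(gamename) %>%
  arrange(year_month, .by_group = TRUE) %>%
  mutate(gain_check = avg - lag(avg)) %>%
  ungroup()

toy_gains
```

The first month of each game has no previous month to compare against, so gain_check is NA there. Running the same steps on the real data and comparing gain_check with gain would be one way to test the assumed definition.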
Discussion
The two plots in this post illustrate peak number of players online, average number of players online, and changes in that average for three games. Of these games, PLAYERUNKNOWN’S BATTLEGROUNDS (PUBG) has the highest values for peak and average players by far. However, these high points were not sustained, with dramatic losses in average number of players per month in 2018. By contrast, Dota 2 has maintained a relatively steady player base, without dramatic gains or losses. Counter-Strike: Global Offensive’s spike in popularity around April 2020 coincides with the introduction of lockdown measures.