Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
Picking up the pieces
Recently, I’ve animated a specific running goal for one year with data from the Strava API. I thought it would also be nice to compare different years within the same animation – in a way you could say that you’re racing yourself over several years.
This only needs some minor modifications to the code I posted before. Data collection stays basically the same, so I’m just re-posting this here.
suppressPackageStartupMessages(library(rStrava)) suppressPackageStartupMessages(library(lubridate)) suppressPackageStartupMessages(library(gganimate)) suppressPackageStartupMessages(library(scales)) # Get the data from the API app_name <- '<your app name>' app_client_id <- '<your app client ID>' app_secret <- '<your app secret>' # create the authentication token stoken <- httr::config(token = strava_oauth(app_name, app_client_id, app_secret, app_scope="activity:read_all")) act.list <- get_activity_list(stoken) act.df <- compile_activities(act.list) act.df <- act.df[,c("start_date_local", "distance", "type")]
Alright, we’re all set for the modifications to the code now. Remember that you can also plug in your own dataframe as long as it fits the structure of the API-generated one.
Code modifications
Let’s say we still want to run 500 km in one year, but now we want to compare several years – let’s choose the last 3 years:
goal <- 500 sports <- "Run" selected.year <- c(2018, 2019, 2020)
We are subsetting and aggregating the data. Just as a reminder: We have to aggregate in case we have several activities at one day (like one run split in two activities).
select.df <- act.df[act.df$type %in% sports & year(act.df$start_date_local) %in% selected.year,] select.df$date <- as.Date(date(select.df$start_date_local)) select.df <- aggregate(distance ~ date, data = select.df, FUN = sum) select.df <- select.df[order(select.df$date),]
The dataframe with every date (not only the ones with recorded activities) is still called year.df
. We have to make sure that this is done for each year in the dataset. I’m using an sapply()
for this.
year.df <- data.frame( date = unlist(sapply(unique(year(select.df$date)), USE.NAMES = F, FUN = function (yr) { as.character(seq(as.Date(paste0(yr, "-01-01")), as.Date(paste0(yr, "-12-31")), by = "day")) })) ) year.df$date <- as.Date(year.df$date)
Now, we are
- integrating the data from
select.df
intoyear.df
with a call tomerge()
- ordering by date,
- replacing
NA
s with zeros in thedistance
column - creating a column with the day of year (with the
yday()
function from {lubridate}) because this will be our new x-axis variable. We have to do this because we want to see all years in the dataset develop “next to each other”. The day of year is the same for all years (with a minor exception: in leap years, days will be shifted by 1 starting from March 1st – this won’t be visible in the plot though) - creating a column with the year which we will use later for the
col
andgroup
aesthetic of the plot.
year.df <- merge(year.df, select.df, by = "date", all.x = T, all.y = F) year.df <- year.df[order(year.df$date),] year.df$distance <- ifelse(is.na(year.df$distance), 0, year.df$distance) year.df$day <- yday(year.df$date) year.df$year <- year(year.df$date)
The next call makes use of some {dplyr} functions and piping. Although I have been sceptical towards the “tidyverse” in the beginning, I am now using some functionality whenever it has huge advantages compared to base R functionality. The following code chunk shows such an advantage, I think.
- We’re grouping
year.df
by year. - We are cumulating the distances within each group (because cumulated distance has to start with 0 at the beginning of the year)
- We are ungrouping and converting the tibble back to dataframe (because otherwise it would break later if we would keep the tibble).
suppressPackageStartupMessages(library(dplyr)) year.df %>% group_by(year) %>% mutate(cum.distance = round(cumsum(distance))) %>% ungroup() %>% as.data.frame() -> year.df
Now, we are finally computing all the other statistics used in the plot (see the previous post for further explanations). I am keeping all the statistics although currently not all of them are being used. I got rid of the large label in the lower right corner because I don’t have a good idea to still keep it simple and short with 3 years.
year.df$remaining.distance <- goal - year.df$cum.distance year.df$days.till.end <- as.numeric(as.Date(paste0(year.df$year, "-12-31")) - year.df$date) year.df$dist.per.day.to.goal <- year.df$remaining.distance / year.df$days.till.end year.df$dist.per.week.to.goal <- year.df$dist.per.day.to.goal * 7 year.df$dist.per.day.to.goal <- round(year.df$dist.per.day.to.goal, 1) year.df$dist.per.week.to.goal <- round(year.df$dist.per.week.to.goal, 1) year.df[year.df$dist.per.day.to.goal < 0, "dist.per.day.to.goal"] <- 0 year.df[year.df$dist.per.week.to.goal < 0, "dist.per.week.to.goal"] <- 0 year.df[year.df$remaining.distance < 0, "remaining.distance"] <- 0 year.df[year.df$days.till.end == 0, "dist.per.day.to.goal"] <- "-" year.df[year.df$days.till.end == 0, "dist.per.week.to.goal"] <- "-" current.day <- yday(Sys.Date())
Plotting and animating
The only important change here is that the x-axis is defined by the day of the year now. I am including major and minor breaks at the beginning of the months. The x-axis labels are given for every second month.
p <- ggplot(year.df, aes(x = day, y = cum.distance, col = factor(year), group = year)) + geom_line() + geom_hline(yintercept = goal, col = "red", lwd = 1) + geom_vline(xintercept = current.day, col = alpha("grey", .5), lty = "dashed") + geom_segment(x = 1, y = 0, xend = 365, yend = goal, lty = "dashed", col = "red") + geom_segment(aes(x = day, y = cum.distance, col = factor(year), group = year), xend = 365, yend = goal, lty = "dotted") + geom_point(size = 2) + geom_text(aes(day + 7, label = paste0(year, ": ", round(cum.distance))), hjust = 0, size = 4) + coord_cartesian(clip = 'off') + scale_x_continuous(breaks = c(32, 91, 152, 213, 274, 335), labels = c("Feb", "Apr", "Jun", "Aug", "Oct", "Dec"), minor_breaks = c(1, 60, 121, 182, 244, 305, 365)) + transition_reveal(day) + guides(col = F) + labs(x = "Day of year", y = "km completed") + theme_minimal() + theme(plot.margin = margin(5.5, 60, 5.5, 5.5))
Aaaaand… animating the whole thing!
animate(p, fps = 60, duration = 30, width = 1440, height = 900, res = 200, end_pause = 25, renderer = av_renderer(file = "runs-3yr.mp4"))< video controls preload="none" src="https://wolferpi.ddns.net/runs-3yr.mp4" width="450"> You can download the video here: https://wolferpi.ddns.net/runs-3yr.mp4
Now, if you let this play out, there are several interesting things to note:
- I’ve started quite strongly in 2018. However, in mid-May I’ve injured my knee during a run in the forest. I had to take a long break. Also, I wasn’t able to run as much till the end of the year, even after starting again around August.
- This is why I’ve performed better in 2020 overall. It simply was a more “steady” year.
- In 2019, I’ve been really lazy – overall and especially in the looong “break” from mid-September till the end of the year.
Have fun, keep on running and don’t be like my second half of 2019 🙂
R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.