Running Around: 2022 running dataviz in R
2022 was my best year for running to date. In 2021, my goal was to run 2021 km. For 2022, I wanted to see if I could run 2500 km and also complete 50 runs of half marathon (HM) distance or more. I managed both and ended the year on a total of 2734 km. I also bagged two half marathon PBs.
Of course, if you subscribe to Strava or VeloViewer or whatever, you can get a nice data visualisation of your year in running. But where’s the fun in that when we can do that (and so much more) in R?
Reaching the goal
I used my previous scripts to track my progress against the goal. At some point in July, I upped my weekly kms and went ahead of my goal. I hit 2500 km at the end of November.
[Figures: progress against the 2500 km goal over the year]
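The idea behind tracking progress against a goal can be sketched in a few lines of base R. This is not the original script (which lives in an earlier post); the `runs` data frame and its values are invented for illustration, and the target line simply assumes an even pace across the year.

```r
# Hypothetical data: one row per run, with a date and a distance in km
runs <- data.frame(
  Date = as.Date(c("2022-01-02", "2022-01-04", "2022-01-04", "2022-01-09")),
  Distance = c(21.3, 4.6, 5.1, 22.0)
)

target_km <- 2500
year_start <- as.Date("2022-01-01")
year_end <- as.Date("2022-12-31")
days_in_year <- as.numeric(year_end - year_start) + 1

# cumulative distance actually run, in date order
runs <- runs[order(runs$Date), ]
runs$Cumulative <- cumsum(runs$Distance)

# distance an even pace towards the target would have reached by each date
day_of_year <- as.numeric(runs$Date - year_start) + 1
runs$Target <- target_km * day_of_year / days_in_year

# positive values mean ahead of the goal, negative values mean behind
runs$Ahead <- runs$Cumulative - runs$Target
print(runs)
```

Plotting `Cumulative` and `Target` against `Date` gives the familiar "am I ahead or behind?" chart.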
How did I do it? Well, here are all my runs:
[Figure: all 2022 runs, drawn to scale]
I generated this visualisation using Marcus Volz’s strava package. Details of how I used it are here. Briefly, I have a local store of gpx files for all my activities and these can be loaded into R and visualised with strava.
We are looking at all the courses I ran in 2022 (in order). They are shown to scale. You can see that I did lots of short runs (run commutes) and a smaller number of longer ones.
Let’s look at that in more detail. A treemap view works well here:
[Figure: treemap of total distance by run length]
From this breakdown we can see that about half the total distance came from short runs, <10 km. In fact, 4 km and 5 km runs dominate. These are my run commutes, which are between 4.4 and 5.5 km, depending on the route. About a quarter of the total distance came from runs in the region of 21-25 km and the remainder from 10-14 km and 25+ km distances. I didn’t do any runs between 15-20 km because anything in that range would be bumped up to HM distance to meet my goal.
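To give a feel for how a breakdown like this can be computed, here is a minimal base R sketch. The distances are made up, and binning by whole kilometres with `floor()` is an assumption about how the treemap groups runs; the actual treemap code is in the post linked further down.

```r
# Hypothetical run distances in km (not my real data)
distance <- c(4.4, 4.9, 5.2, 5.5, 12.3, 21.2, 21.4, 24.8, 30.1)

# bin each run by whole kilometres, e.g. 4.4 and 4.9 both land in the 4 km bin
km_bin <- floor(distance)

# total distance per bin and its share of the overall total
by_km <- aggregate(distance, by = list(km = km_bin), FUN = sum)
names(by_km)[2] <- "total"
by_km$share <- by_km$total / sum(by_km$total)

# share of the total distance that came from runs shorter than 10 km
short_share <- sum(distance[distance < 10]) / sum(distance)

print(by_km)
print(short_share)
```

A table like `by_km` is the kind of input a treemap needs: one row per tile, with `total` setting the tile area.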
The 2734 km came from 332 runs, but how did I fit them in? And how did I get some rest?
A calendar view is nice here. We can look either at the number of runs per day or the kms run per day. I did “the church of the long run” i.e. a single, long run on Sunday; and as we have seen, run-commutes which are typically two runs of shorter distance. I mostly did not run on Saturdays. At the start of the year I didn’t run on Mondays and Thursdays either, but that was less of a rule after the summer.
[Figure: calendar heatmaps of runs per day and km per day]
The progress reports, treemap and calendar view were all created using data downloaded from Garmin Connect. To load the data and process it for the reports see here; for the treemap see here; for the calendar view I am using a function that I described here. The code to generate the plots above is:
```r
library(lubridate)
library(ggplot2)
library(dplyr)
library(timetk)
library(patchwork)

# we start with all_data which is loaded from the previous code using
# process_data("running","2022-01-01","2022-12-31",2500)

# summarise the running data by day
df_day <- all_data
df_day$Date <- as.Date(all_data$Date)
df_day <- df_day %>%
  summarize_by_time(.date_var = Date,
                    .by = "day",
                    Distance = sum(Distance),
                    n = n())

# first plot
p1 <- calendarHeatmap(df_day$Date, df_day$n, title = "Running 2022", subtitle = "Runs per day")
# second plot
p2 <- calendarHeatmap(df_day$Date, df_day$Distance, title = "", subtitle = "km per day")
# assemble with patchwork
p <- p1 / p2
ggsave("Output/Plots/calendar_per_day.png", p)
```
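The weekday pattern described earlier (long run on Sunday, rest on Saturday) can also be checked numerically rather than by eye. The following is a small base R sketch using invented dates; it uses the ISO weekday number from `format(x, "%u")` so that the result does not depend on the locale.

```r
# Hypothetical run dates; two runs on the same day appear twice
dates <- as.Date(c("2022-01-02", "2022-01-04", "2022-01-04",
                   "2022-01-05", "2022-01-09"))

# ISO weekday number: 1 = Monday ... 7 = Sunday
wday <- as.integer(format(dates, "%u"))

# count runs per weekday, keeping days with zero runs in the table
day_names <- c("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun")
runs_per_weekday <- table(factor(wday, levels = 1:7, labels = day_names))
print(runs_per_weekday)
```

Using a factor with all seven levels means rest days show up as explicit zeros instead of silently disappearing from the table.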
What about the 50 HM-or-greater goal?
Well, we can also array these longer runs out to look at them in all their glory.
[Figure: the HM-or-greater runs of 2022]
I tried to run different routes for these long runs. In 2021, I had a sub-goal of running 30 HM-or-more courses. That year, I set myself the additional criterion that each HM-or-more course must be different. I relaxed that this year and ran a couple of courses 3 or 4 times. I still managed to vary them quite a bit though.
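Counting progress towards a goal like this is a one-liner once the distances are in R. A minimal sketch with made-up distances follows; the 21.0975 km cut-off is the official half marathon distance and is an assumption here, since a GPS watch may log a touch under or over.

```r
# Hypothetical run distances in km
distance <- c(5.0, 21.2, 25.4, 21.0, 32.0)

hm_km <- 21.0975   # official half marathon distance (assumed threshold)
goal <- 50

# which runs count as HM-or-more, and how many are still needed
is_hm_or_more <- distance >= hm_km
n_done <- sum(is_hm_or_more)
n_left <- max(goal - n_done, 0)

print(c(done = n_done, left = n_left))
```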
New half-marathon PBs
I managed to improve on my HM time twice this year. My previous best was set in 2018 and since then I had run two slower HMs, which was annoying. I changed a few things and managed to improve my time this year in March and again in September. In the summer, I ran a HM which was faster than my 2018 best but did not improve on my March 2022 time. My September 2022 PB was gratifying as I was wondering if I would ever go under 95 min for HM… especially as I am not getting any younger.
[Figure: half marathon times by date, with the PB progression as a dashed line]
Generating this plot was a bit tricky. If someone knows a better way, let me know!
```r
# dplyr (distinct, filter, %>%) and ggplot2 are needed here
library(dplyr)
library(ggplot2)

# process_load() is a function used in TSS analysis
# load the data we want to look at
mydata <- process_load("Running","2016-01-01","2022-12-31")
# drop the 1st column and remove duplicates
mydata$Activity.Type <- NULL
mydata <- distinct(mydata)
# Time is a character vector, change to POSIXct
mydata$Time <- as.POSIXct(strptime(mydata$Time, format = "%H:%M:%S"))
# filter for HM distance runs in this period
df_hm <- mydata %>%
  filter(Distance > 20.9 & Distance < 21.5)

# the following code generates a second data frame to visualise PBs
record <- data.frame(Date = df_hm$Date[1],
                     Time = df_hm$Time[1])
minTime <- record$Time[1]
for (i in 2:nrow(df_hm)) {
  if(df_hm$Time[i] < minTime) {
    # two points per new PB: one at the old best, one at the new best
    recordA <- data.frame(Date = df_hm$Date[i],
                          Time = minTime)
    minTime <- df_hm$Time[i]
    recordB <- data.frame(Date = df_hm$Date[i],
                          Time = minTime)
    record <- rbind(record,recordA,recordB)
  }
  if(i == nrow(df_hm)) {
    # extend the line to the date of the last run
    recordB <- data.frame(Date = df_hm$Date[i],
                          Time = minTime)
    record <- rbind(record,recordB)
  }
}

# generate the plot
# I cut the data using 01:41:00 to differentiate between HM events and other runs, i.e. it is a hack.
p <- ggplot() +
  geom_point(data = df_hm, aes(x = Date, y = Time,
                               colour = cut(Time, as.POSIXct(strptime(c("01:30:00","01:41:00","02:00:00"), format = "%H:%M:%S"))))) +
  geom_line(data = record, aes(x = Date, y = Time), linetype = 2) +
  lims(y = as.POSIXct(strptime(c("01:30:00","02:00:00"), format = "%H:%M:%S"))) +
  theme_bw() +
  theme(legend.position = "none")
ggsave("Output/Plots/hm_pb.png", p)
```
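One possible alternative to the loop, offered as a sketch rather than a tested drop-in replacement: a running personal best is just a cumulative minimum, so base R's `cummin()` can do the bookkeeping. The dates and times below are invented for illustration, and `to_seconds()` is a hypothetical helper, not part of any package.

```r
# Invented HM results (not my real times), one row per HM-distance run
df_hm <- data.frame(
  Date = as.Date(c("2018-03-01", "2019-05-01", "2021-10-01",
                   "2022-03-01", "2022-07-01", "2022-09-01")),
  Time = c("01:36:10", "01:38:30", "01:37:45",
           "01:35:20", "01:35:50", "01:34:40"),
  stringsAsFactors = FALSE
)

# hypothetical helper: convert "HH:MM:SS" strings to seconds
to_seconds <- function(x) {
  parts <- strsplit(x, ":", fixed = TRUE)
  vapply(parts, function(p) sum(as.numeric(p) * c(3600, 60, 1)), numeric(1))
}

df_hm <- df_hm[order(df_hm$Date), ]
df_hm$Seconds <- to_seconds(df_hm$Time)

# running personal best: the fastest time seen up to and including each run
df_hm$Best <- cummin(df_hm$Seconds)

# a run is a new PB if it lowers the running best (the first run counts by default)
df_hm$IsPB <- c(TRUE, diff(df_hm$Best) < 0)

print(df_hm)
```

With a `Best` column like this, the staircase line could probably be drawn with `geom_step(aes(x = Date, y = Best))` in ggplot2 instead of building the extra `record` data frame by hand.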
Form throughout the year
It is worth tracking running form to avoid injury. I did this using TSS (described here). This is the graphic for the whole year.
[Figure: TSS-based fitness, fatigue and form for 2022]
I find this view of fatigue and fitness very useful and will track this again in 2023. I spend way too much time in the grey zone and I think I can improve my HM time if I get more focused with my training.
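For readers who have not met this kind of chart, here is a rough sketch of how fitness and fatigue curves are commonly derived from daily stress scores. It is an assumption that this matches the method in the linked TSS post: the 42-day and 7-day time constants are the widely used convention, the `ewma()` helper is hypothetical, and the scores are made up.

```r
# Hypothetical daily training stress scores (0 = rest day)
tss <- c(50, 0, 80, 40, 0, 0, 120)

# hypothetical helper: exponentially weighted moving average with a time
# constant given in days, starting from zero
ewma <- function(x, days) {
  alpha <- 1 - exp(-1 / days)
  out <- numeric(length(x))
  prev <- 0
  for (i in seq_along(x)) {
    prev <- prev + alpha * (x[i] - prev)
    out[i] <- prev
  }
  out
}

fitness <- ewma(tss, 42)   # slow-moving average (assumed 42-day constant)
fatigue <- ewma(tss, 7)    # fast-moving average (assumed 7-day constant)
form <- fitness - fatigue  # negative when recent load exceeds the long-term load

print(round(data.frame(tss, fitness, fatigue, form), 2))
```

Because fatigue reacts faster than fitness, a block of hard days pushes form negative, and it recovers over the following easy days.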
Conclusion
Beyond self-congratulation, the conclusion of this post is that R is very useful to analyse running data, to track progress and to visualise running achievements. Sure, this can all be done automatically by a third party app, but if you maintain your own running log offline or if you just want more control over the analysis, R is fantastic for generating similar (or better) visualisations, which can be bespoke and tailored for what you want to know.
—
The post title is taken from “Running Around” by D.R.I. I have several copies of this song, but since I am a music snob, I will say it is a rip from the Violent Pacification 7″ EP released in 1984. Amazingly I haven’t used this song title on quantixed yet.