
Analyzing every minute of my spare time in R: 6 months of time tracking insights

[This article was first published on Aster Hu's Blog | Asteroid, and kindly contributed to R-bloggers].

I’ve always believed that how I spend my spare time will have an impact on my future. Based on this philosophy, I started tracking my spare time, which includes any time outside of sleep hours and my 9-5 work schedule. I now have over 6 months’ worth of data, and here is what I found by analyzing it in R.


Time tracking category

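To get started, I came up with 6 main categories: Distraction, Essentials, Sanity, Wellness, Personal development (PersDev) and Professional development (ProfDev).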


Data preparation

My data is partially synced to Google Calendar (see Apps I used for time tracking for more details), with each event representing an activity. I exported all calendars as .ics files, converted them to CSV with the ics2csv library, and then used bind_rows() to combine them with my other CSV files.
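The combining step could look roughly like the sketch below. It is only an illustration: the folder, file names and column names (summary, start, end) are made up so the example can run on its own, and the real ics2csv output may be shaped differently.

```r
library(dplyr)
library(readr)
library(purrr)

# Hypothetical setup: write two tiny CSV "exports" to a temp folder.
# File names and columns are illustrative, not the real export format.
csv_dir <- file.path(tempdir(), "calendar_export")
dir.create(csv_dir, showWarnings = FALSE)
write_csv(
  tibble(summary = "Cooking", start = "2023-02-10 18:00:00", end = "2023-02-10 19:00:00"),
  file.path(csv_dir, "essentials.csv")
)
write_csv(
  tibble(summary = "Knitting", start = "2023-02-10 20:00:00", end = "2023-02-10 21:30:00"),
  file.path(csv_dir, "persdev.csv")
)

# Read every CSV in the folder and stack them into one data frame
csv_files <- list.files(csv_dir, pattern = "\\.csv$", full.names = TRUE)
names(csv_files) <- tools::file_path_sans_ext(basename(csv_files))

events <- csv_files %>%
  map(read_csv, show_col_types = FALSE) %>%
  bind_rows(.id = "calendar") # .id records which file each row came from

events
```

Using the .id argument keeps track of the source calendar, which can later serve as the category of each event.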

Since I don’t publish the data, I won’t delve into all the details of data cleaning and aggregation, as they aren’t reproducible. However, here are a few things I considered:

After the preparation, I have two data frames, hrs_weekday and hrs_weekend, which contain the breakdown of each category on weekdays and weekends, respectively. Their structure looks like this:

dtdate      Distraction  Essentials  Sanity     Wellness   PersDev    ProfDev
2023-02-10  0.6322222    6.285278    1.5500000  0.7836111  NA         NA
2023-02-13  0.6616667    5.285278    0.5566667  NA         2.3925000  NA
2023-02-14  1.0436111    6.801389    0.6250000  0.6636111  0.7350000  NA
2023-02-15  0.6611111    3.718056    1.2347222  NA         2.8708333  NA
2023-02-16  1.9377778    5.551667    0.5000000  1.9127778  0.7752778  NA
2023-02-17  1.3572222    3.295278    1.2613889  0.4233333  1.7458333  NA

Each column is a category, and each row holds the total hours spent on each category on a given day.
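Since the cleaning code isn't published, here is a minimal sketch of how an event log might be turned into this wide, one-row-per-day shape. The events data frame and its columns (start, end, main_cat) are hypothetical, and the sketch assumes no event crosses midnight.

```r
library(dplyr)
library(tidyr)
library(lubridate)

# Hypothetical event log: one row per tracked activity
events <- tibble(
  start    = ymd_hm(c("2023-02-10 18:00", "2023-02-10 20:00",
                      "2023-02-11 09:00", "2023-02-11 10:30")),
  end      = ymd_hm(c("2023-02-10 19:30", "2023-02-10 20:45",
                      "2023-02-11 10:30", "2023-02-11 13:00")),
  main_cat = c("Essentials", "Sanity", "Essentials", "PersDev")
)

# Sum the hours per day and category, then spread categories into columns
hrs_daily <- events %>%
  mutate(
    dtdate = as_date(start),
    hours  = as.numeric(difftime(end, start, units = "hours"))
  ) %>%
  group_by(dtdate, main_cat) %>%
  summarize(daily_spent = sum(hours), .groups = "drop") %>%
  pivot_wider(names_from = main_cat, values_from = daily_spent)

# With week_start = 1, Monday is 1, so 6 and 7 are Saturday and Sunday
is_weekend <- wday(hrs_daily$dtdate, week_start = 1) >= 6
hrs_weekday_demo <- hrs_daily[!is_weekend, ]
hrs_weekend_demo <- hrs_daily[is_weekend, ]
```

Categories with no entry on a given day come out as NA after pivot_wider(), which matches the NA cells in the table above.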

In addition, I created a constant, midpoint, which is the midpoint of all dates. This will help with placing the labels later when plotting time series graphs.
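The exact definition of midpoint isn't shown, so the following is just one plausible way to compute it: take the earliest date and add half of the date range. The dates vector is a stand-in for the real dtdate column.

```r
# Stand-in for the dtdate column of the full data set
dates <- as.Date(c("2023-02-10", "2023-02-13", "2023-02-17"))

# Earliest date plus half the range, rounded down to a whole day
midpoint <- min(dates) + as.integer(max(dates) - min(dates)) %/% 2

midpoint
```

Rounding down to a whole day keeps midpoint a clean Date value that can be passed straight to annotate() as an x position.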


Where did my time go?

After excluding sleep and work hours, my activity breakdown looked like this in a pie chart.

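# Import library ----
library(tidyverse) # Attaches ggplot2, dplyr, tidyr, etc.
library(lubridate) # week() and month(); attached by tidyverse 2.0.0+, loaded explicitly for older versions
library(ggplot2)
library(ggTimeSeries)
library(ggrepel)
library(grid)
library(gridExtra)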

# Define colour palette ----

custom_colors <- c("Essentials" = "#F2E1C9", "PersDev" = "#ADD4CD", "ProfDev" = "#b6dcf5", "Distraction" = "#FDBAA0", "Sanity" = "#D1D9EA", "Wellness" = "#DBEFBA")

# By activity ----

## Prep ----

# Reshape the data and calculate total/avg/percentage spent of each category
get_main_cat <- function(df) {
  df %>%
    pivot_longer(!dtdate, names_to = "main_cat", values_to = "daily_spent") %>%
    group_by(main_cat) %>%
    summarize(
      total_spent = sum(daily_spent, na.rm = TRUE),
      avg_spent = total_spent / n_distinct(.$dtdate)
    ) %>%
    mutate(
      pct_spent = total_spent / sum(total_spent),
      # Get the positions for plotting pie chart
      # Source: https://r-charts.com/part-whole/pie-chart-labels-outside-ggplot2/
      csum = rev(cumsum(rev(avg_spent))),
      pos = avg_spent / 2 + lead(csum, 1),
      pos = if_else(is.na(pos), avg_spent / 2, pos)
    )
}

cat_weekday <- get_main_cat(hrs_weekday)
cat_weekend <- get_main_cat(hrs_weekend)

## Plot category ----

# Plot pie chart
plot_main_cat <- function(df, p_title) {
  ggplot(df, aes(x = "", y = avg_spent, fill = main_cat)) +
    # Plot the pie of main_cat
    geom_bar(stat = "identity", width = 1, color = "white") +
    coord_polar(theta = "y", start = 0) +
    # Add daily avg label for pie chart
    geom_label_repel(
      aes(y = pos, label = paste(main_cat, "\n", round(avg_spent, 2), "h")),
      colour = "black",
      seed = 0,
      force = 0.6,
      segment.colour = NA,
      show.legend = NA
    ) +
    # Add percentage
    geom_text(aes(x = 1.6, y = pos, label = paste0(round(pct_spent * 100), "%")),
      fontface = "bold"
    ) +
    theme_void() +
    scale_fill_manual(values = custom_colors) +
    labs(
      subtitle = p_title,
      caption = paste("Total =", round(sum(df$avg_spent), 1), "hours"),
    ) +
    theme(
      plot.subtitle = element_text(hjust = 0.5, vjust = -3),
      legend.position = "none",
      plot.caption = element_text(hjust = 0, vjust = 30, size = 11) # Set caption to left align
    )
}

p_cat_weekday <- plot_main_cat(cat_weekday, "Weekday")
p_cat_weekend <- plot_main_cat(cat_weekend, "Weekend")

title <- textGrob("How I spent my spare time", gp = gpar(fontsize = 16, font = 2)) %>%
  arrangeGrob(zeroGrob(),
    widths = unit(0, "npc"),
    heights = unit(c(1, 0), c("cm", "npc")),
    as.table = FALSE
  )

cap <- textGrob("*Exclude sleep and work hours", gp = gpar(fontsize = 10, font = 3)) %>%
  arrangeGrob(zeroGrob(),
    widths = unit(0, "npc"),
    heights = unit(c(0, 10), c("cm", "npc")),
    as.table = FALSE
  )
p_cat <- grid.arrange(p_cat_weekday, p_cat_weekend,
  ncol = 2,
  top = title, bottom = cap
)

How I spent my spare time by activity category

On an average weekday, I have a total of 10.1 hours available after excluding Sleep and Work hours. On the weekend, I have 14.4 hours.

The biggest category is Essentials. I spent an average of ~5 hours each day on life-essential activities. That may seem like a lot, but it really isn’t once non-routine chores such as grocery shopping are included.

Sanity (2-3 hours) also took up a significant portion on both weekdays and weekends. These activities are, in a way, also essential to me because they are necessary to keep my sanity in check.

Personal development is the one that shows the most variance; I spent 1.77 hours more on it during the weekend. This makes sense, since I have more free time to invest in my hobbies on the weekends.

I’m glad to find out that Distraction wasn’t as bad as I thought. I spent roughly 40 minutes on mindless scrolling on weekdays, and a bit more on weekends.

The pie chart was probably the trickiest one, because it has two sets of labels (daily average and percentage), and I need to ensure that both align with the corresponding slices. For this reason, the label positions have to be calculated before creating the graph and then mapped to y in aes(). After plotting the pie charts for weekdays and weekends, I used gridExtra::grid.arrange() to combine the two pies, and grid::textGrob() to format the title and caption.
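To see what the position calculation does, here is a small worked example using the same three mutate() steps as get_main_cat(). The three slices and their hours are made-up numbers chosen to keep the arithmetic easy to follow.

```r
library(dplyr)

# Made-up averages for three slices (hours per day)
slices <- tibble(
  main_cat  = c("Distraction", "Essentials", "Sanity"),
  avg_spent = c(1, 5, 2)
)

slices <- slices %>%
  mutate(
    # Running total counted from the last slice backwards: 8, 7, 2
    csum = rev(cumsum(rev(avg_spent))),
    # Midpoint of each slice = half its size + everything stacked below it
    pos = avg_spent / 2 + lead(csum, 1),
    # The last slice has nothing below it, so its midpoint is half its size
    pos = if_else(is.na(pos), avg_spent / 2, pos)
  )

slices$pos
# 7.5 4.5 1.0
```

ggplot2 stacks the first category at the far end of the axis, so Sanity covers 0 to 2 (midpoint 1), Essentials covers 2 to 7 (midpoint 4.5) and Distraction covers 7 to 8 (midpoint 7.5). Mapping pos to y places each label in the middle of its slice once coord_polar() wraps the bar into a pie.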


How much time is necessary to keep life going?

By my definition, Essentials include both the activities that keep the body alive and house chores. This is time I cannot cut.

So, how much time is necessary to keep my body alive and my life going? The answer is 4.98 hours on weekdays, and 5.14 hours on weekends.

# Essentials and free time ----

## Prep ----
# Spare hours: everything excluding work hours and sleep time
# Devt hours: personal devt + professional devt
# Free hours: spare hours that are not in the Essentials category

get_essentials_wk <- function(df) {
  df %>% 
    mutate(spare_hr = rowSums(across(where(is.numeric)), na.rm = TRUE), # Sum all spare hours
           devt_hr = rowSums(across(c(PersDev, ProfDev)), na.rm = TRUE)) %>% # Calculate the total hours of skill development
    mutate(free_hr = rowSums(across(-c(Essentials, spare_hr, dtdate, devt_hr)), na.rm = TRUE)) %>% # Calculate the free hours = spare hours - essentials
    group_by(week = week(dtdate)) %>% # week() comes from lubridate
    mutate(wk_spare = mean(spare_hr),
           wk_essentials = mean(Essentials, na.rm = TRUE), # na.rm guards against days with no Essentials entry
           wk_free = mean(free_hr)) %>% 
    ungroup()
}

essentials_weekday <- get_essentials_wk(hrs_weekday)
essentials_weekend <- get_essentials_wk(hrs_weekend)

## Plot Essentials ----

plot_essentials <- function(df, p_title) {
  avg_essentials <- mean(df$Essentials, na.rm = TRUE) # Calculate the average essentials hours
  ggplot(df) +
    # Plot essentials by day
    geom_point(
               aes(x = dtdate, y = Essentials), 
               color = "#eed8b9", 
               size = 1) +
    # Plot essentials by week
    geom_line(
              aes(x = dtdate, y = wk_essentials),
              color = "#e3bf8b", linewidth = 1) +
    # Add mean trend line
    geom_hline(
      yintercept = avg_essentials, color = "#766e53", linetype = "dotted",
      linewidth = 0.7
    ) +
    labs(x = "", y = "Hours", title = p_title) +
    scale_x_date(date_breaks = "1 month", date_labels = "%b") +
    scale_y_continuous(limits = c(0, 15)) + # set the y axis limit to 15 hours
    theme_minimal() +
    theme(
      plot.title = element_text(hjust = 0.5, vjust = 1.5, size = 15),
      plot.margin = unit(c(0.5, 3, 0.5, 0.5), "cm"), # Set margin to allow space for annotation
      panel.grid.major = element_blank(), # Remove grid lines
      axis.title.y = element_text(size = 10)
    ) +
    annotate("text",
      x = max(df$dtdate) + 22, 
      y = avg_essentials,
      label = paste("Avg Essentials\n", round(avg_essentials, 2), "h"),
      color = "black", 
      size = 3.5
    ) +
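    # all_data is not defined in this post; it is presumably the full cleaned data set from the preparation step
    coord_cartesian(xlim = c(min(all_data$dtdate), max(all_data$dtdate)), clip = "off") # Set the x axis limit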
}

p_essentials_weekday <- plot_essentials(essentials_weekday, "Essentials hours spent, weekday")
p_essentials_weekend <- plot_essentials(essentials_weekend, "Essentials hours spent, weekend")
p_essentials <- grid.arrange(p_essentials_weekday, p_essentials_weekend, nrow = 2)

How much time I spent on life essential activities

The brown line represents the weekly average, while the dots are the daily sums. I think the weekly average is more meaningful to look at, because it balances out the non-daily essential activities and fluctuates less than the daily sums.
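As a small illustration of the weekly smoothing, the sketch below applies the same group_by(week) and mutate() pattern to ten made-up days. The Essentials values are invented for the example.

```r
library(dplyr)
library(lubridate)

# Ten consecutive days with invented Essentials hours
daily <- tibble(
  dtdate     = as.Date("2023-02-13") + 0:9,
  Essentials = c(4, 6, 5, 3, 7, 5, 5, 7, 4, 8)
)

weekly <- daily %>%
  group_by(week = week(dtdate)) %>%
  # mutate() keeps one row per day, so the line can be drawn on the same x axis
  mutate(wk_essentials = mean(Essentials, na.rm = TRUE)) %>%
  ungroup()

distinct(weekly, week, wk_essentials)
```

One thing to keep in mind is that lubridate::week() counts 7-day blocks starting from January 1, so its weeks do not necessarily run Monday to Sunday. isoweek() would be an alternative if calendar weeks matter.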

On the weekend graph, there was a peak in May, because I spent almost the entire day doing adulting chores on that weekend.


How much free time did I actually have?

One of the most important insights I want to gain from time tracking is to figure out how much time I actually have to do my own stuff. After excluding sleep, work and life essentials, I have 5.27 hours on an average weekday and 9.32 hours on an average weekend to enjoy my life.

## Spare/free/essentials ----

plot_spare_time <- function(df, p_title) {
  avg_free <- mean(df$free_hr) # Calculate the average hours spent
  ggplot(df, aes(x = dtdate)) +
    geom_ribbon(aes(ymin = wk_spare, ymax = 24, fill = "Sleep/Work")) +
    geom_ribbon(aes(ymin = 0, ymax = wk_spare, fill = "Free")) +
    geom_ribbon(aes(ymin = 0, ymax = wk_essentials, fill = "Essentials")) +
    scale_fill_manual(values = c(custom_colors, "Sleep/Work" = "#F6F6F5", "Free" = "#C2DEDC")) +
    labs(title = p_title,
         x = "",
         y = "Hours") +
    scale_x_date(date_labels = "%b", date_breaks = "1 month") +
    theme_minimal() +
    theme(
      plot.title = element_text(hjust = 0.5, vjust = -1, size = 15),
      axis.title.y = element_text(size = 10),
      panel.grid.major = element_blank(), # Remove grid lines
      panel.grid.minor = element_blank(),
      legend.position = "bottom", 
      legend.title = element_blank()) +
    geom_hline(yintercept = avg_free, color = "#44867d", linetype = "dotted",
               linewidth = 0.7) +
    annotate("text", x = midpoint, y = avg_free,
             label = paste("Avg Free =", round(avg_free, 2), "h"),
             color = "black", size = 4,
             vjust = 1.5
    )
}

p_spare_weekday <- plot_spare_time(essentials_weekday, "Available free time, weekday")
p_spare_weekend <- plot_spare_time(essentials_weekend, "Available free time, weekend")
p_spare <- grid.arrange(p_spare_weekday, p_spare_weekend, nrow = 1)

The time I was actually free vs Essentials

The light green area represents the actual Free time I have, while the light brown is the Essentials, and the white area is Sleep/Work hours. These add up to 24 hours on the y-axis, so each area reflects its true share of the day. Once again, the data is based on weekly averages to avoid the heavy fluctuation of daily values.
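The way the three areas add up to 24 hours can be checked on a single day. This is a simplified sketch with made-up hours; it computes free time directly as spare time minus Essentials, which should be equivalent to the rowSums() approach used in get_essentials_wk().

```r
library(dplyr)

# One made-up day; NA means nothing was tracked in that category
day <- tibble(
  dtdate      = as.Date("2023-02-14"),
  Distraction = 1,
  Essentials  = 5,
  Sanity      = 2,
  Wellness    = NA_real_,
  PersDev     = 2,
  ProfDev     = NA_real_
)

breakdown <- day %>%
  mutate(
    spare_hr      = rowSums(across(where(is.numeric)), na.rm = TRUE), # all tracked hours
    free_hr       = spare_hr - Essentials,                            # spare time minus Essentials
    sleep_work_hr = 24 - spare_hr                                     # the untracked remainder
  )

breakdown %>% select(Essentials, free_hr, sleep_work_hr)
# Essentials = 5, free_hr = 5, sleep_work_hr = 14
```

In the ribbon chart, Essentials is drawn from 0 to wk_essentials, Free from there up to wk_spare, and Sleep/Work fills the rest up to 24.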


Investing time in myself

The categories I use to evaluate my productivity are Personal development and Professional development. I decided to combine them into a single category, “Skill development”, knowing that the majority of it came from the former, such as personal hobbies.

On an average weekday, I spent 1.77 hours on skill development, while on weekends I dedicated 3.72 hours to it, almost 2 hours more. The large variance (ranging from 0 to 9+ hours per day) is interesting to look at, and it mostly comes from my flow-state style of doing tasks.

Overall, not bad, I would say.

# Skill development ----
# Devt = PersDev + ProfDev hours

## Prep ----
# Combine weekday and weekend entries, and fill the blanks with 0 to avoid grey areas in the graph
# Note: devt_hr is created by get_essentials_wk(), so the essentials_* data frames are used here
hrs_all <- rbind(essentials_weekday, essentials_weekend) %>% 
  complete(dtdate = seq(min(dtdate), max(dtdate), by = "day"), 
           fill = list(devt_hr = 0))
  
# Calculate the daily average for weekdays and weekends
devt_weekday_avg <- round(mean(essentials_weekday$devt_hr), 2)
devt_weekend_avg <- round(mean(essentials_weekend$devt_hr), 2)

## Plot development hour chart in calendar heat map ----
p_devt <-
  ggplot_calendar_heatmap(
    hrs_all,
    cDateColumnName = "dtdate",
    cValueColumnName = "devt_hr",
    dayBorderSize = 0.35,
    dayBorderColour = "grey",
    monthBorderSize = 0.35,
    monthBorderColour = "dimgrey",
    monthBorderLineEnd = "round"
  ) +
  xlab(NULL) +
  ylab(NULL) +
  scale_fill_continuous(low = "white", high = "#45ccc7") +
  theme(
    plot.title = element_text(hjust = 0.5, vjust = 1.5, size = 15, face = "bold"),
    axis.title.y = element_text(size = 10),
    axis.ticks = element_blank(), 
    legend.position = "right",
    legend.title = element_blank(),
    strip.background = element_blank(),
    strip.text = element_blank(), # useful for only one year of data
    plot.background = element_rect(color = "white"),
    panel.border = element_blank(),
    panel.background = element_blank(),
    panel.grid = element_blank(),
    plot.caption = element_text(hjust = 1, vjust = -5, size = 11)
  ) +
  labs(
    title = "Skill development",
    caption = paste("Weekday Avg =", devt_weekday_avg, "h\n",
                    "Weekend Avg =", devt_weekend_avg, "h"))

p_devt

Personal and professional development heat map

I chose a heat map because I wanted to see how my productivity fluctuated as the seasons changed. To do this, I combined the weekday and weekend data frames, filled in the blanks with zero, and then used ggplot_calendar_heatmap() from the ggTimeSeries library to plot the calendar heat map.
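The zero-filling step relies on tidyr::complete(). Here is a minimal sketch of what it does, using two made-up tracked days with a two-day gap between them.

```r
library(dplyr)
library(tidyr)

# Two tracked days with a gap in between (values are made up)
tracked <- tibble(
  dtdate  = as.Date(c("2023-02-10", "2023-02-13")),
  devt_hr = c(1.5, 2.4)
)

filled <- tracked %>%
  complete(
    dtdate = seq(min(dtdate), max(dtdate), by = "day"), # every calendar day in the range
    fill   = list(devt_hr = 0)                          # missing days get 0 instead of NA
  )

filled
# 4 rows: 2023-02-10 (1.5), 2023-02-11 (0), 2023-02-12 (0), 2023-02-13 (2.4)
```

Without the fill argument, the added days would hold NA, which is what the code comment about grey areas in the heat map refers to.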

May seems to be the low point, mainly due to the peak in the Essentials graph when I was busy with chores. April is also low, and I wanted to see what caused that dip. The natural assumption is that I was busy doing something else, but what was it? Let’s find out.


Social media vs. others

What happened in April? There was a peak in the Sanity category. After checking my monthly review, I remembered that I spent lots of time playing video games with my husband. It was also when I bought Hogwarts Legacy, into which I put 80 hours according to PS5 stats.

# Dist/San/Devt ----

## Prep ----

dis_san_devt <- hrs_all %>% 
  group_by(month = month(dtdate)) %>% 
  summarize(Distraction = mean(Distraction, na.rm = TRUE),
            Sanity = mean(Sanity, na.rm = TRUE),
            Development = mean(devt_hr, na.rm = TRUE)) %>% 
  pivot_longer(!month, names_to = "category", values_to = "mo_avg")

## Plot stacked bar chart for three categories of activity ----
p_dis_san_dev <- 
  ggplot(dis_san_devt, aes(fill = category, y = mo_avg, x = month)) +
    scale_x_continuous(breaks = 1:12, labels = month.name) + # Set month labels
    geom_bar(position = "stack", stat = "identity", width = 0.7) +
    labs(x = "", y = "Hours", title = "Distraction, Sanity and Development") +
    theme_minimal() +
  # Set custom label for legend
  scale_fill_manual(values = c(custom_colors, Development = "#a2e5e3"), 
                    labels = c('Skill Development', 'Distraction', 'Sanity')) +
    theme(
      plot.title = element_text(hjust = 0.5, vjust = 1.5, size = 15, face = "bold"),
      plot.background = element_rect(color = "white"), 
      axis.title.y = element_text(size = 10),
      legend.title = element_blank(),
      legend.position = "bottom",
      panel.grid = element_blank()
    )

p_dis_san_dev

Social media and others

Looking at the other months, there were ups and downs. I had a vague feeling that there might be a negative correlation between Distraction and Skill development, meaning that when I was spending more time on social media, I probably wasn’t in the mood to do anything productive.


Was I more productive when I spent less time on social media?

The answer is no. To my surprise, Sanity and Skill development were substitutes for each other. That means when I didn’t want to spend time on developing my hobbies and skills, I tended to entertain myself with activities like playing games rather than scrolling on social media.

# Correlation Dist/Devt ----

## Prep ----

dis_devt <- hrs_all %>%
  mutate(
    # Convert NA to 0
    Sanity = if_else(is.na(Sanity), 0, Sanity),
    Distraction = if_else(is.na(Distraction), 0, Distraction)
  )

## Plot correlation ----

p_dis_devt <-
  ggplot(dis_devt, aes(x = devt_hr, y = Distraction)) +
  geom_point(aes(color = Sanity), size = 2) +
  scale_colour_gradient(low = "#D1D9EA", high = "darkorchid") +
  theme_classic() +
  labs(x = "Skill development") +
  geom_smooth(method = lm) +
  theme(legend.position = "bottom")

p_dis_san <-
  ggplot(dis_devt, aes(x = Sanity, y = Distraction)) +
  geom_point(aes(color = devt_hr), size = 2) +
  scale_colour_gradient(low = "lightblue", high = "darkblue", name = "Skill development") +
  theme_classic() +
  geom_smooth(method = lm) +
  theme(legend.position = "bottom")

p_devt_san <-
  ggplot(dis_devt, aes(y = Sanity, x = devt_hr)) +
  geom_point(aes(color = Distraction), size = 2) +
  scale_colour_gradient(low = "gold", high = "red") +
  theme_classic() +
  labs(x = "Skill development") +
  geom_smooth(method = lm) +
  theme(legend.position = "bottom")

# Define common x and y axis limits
common_limits <- coord_cartesian(
  xlim = c(0, 10),
  ylim = c(0, 10)
)

# Apply the common limits to each plot
p_dis_devt <- p_dis_devt + common_limits
p_dis_san <- p_dis_san + common_limits
p_devt_san <- p_devt_san + common_limits

p_cor <- grid.arrange(p_dis_devt, p_dis_san, p_devt_san, ncol = 3, widths = c(1, 1, 1))

The relationship among Distraction, Sanity and Skill development

In the above chart, the three graphs show the relationship between Distraction, Sanity and Skill development. Each graph plots the relationship between two variables on the x and y using a scatter plot with a linear model, and the colour mapping of dots represents the third variable.

From the first two graphs, I didn’t see much correlation between Distraction and Skill development, or between Distraction and Sanity: the fitted lines were almost flat. However, the third graph shows a negative correlation between Sanity and Skill development, which led me to the conclusion above.
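The visual impression from the scatter plots could also be backed up with numbers. The sketch below is not part of the original analysis: it uses a tiny invented data set (toy), shaped so that Sanity falls as devt_hr rises while Distraction stays roughly level, and shows how cor() and lm() would quantify that.

```r
# Invented data, only to demonstrate the calls; not the real tracking data
toy <- data.frame(
  Distraction = c(1, 0.5, 1, 0.5, 1, 0.5),
  Sanity      = c(4, 3, 2, 2, 1, 0.5),
  devt_hr     = c(0, 1, 2, 3, 4, 5)
)

# Pairwise Pearson correlations between the three variables
cor_matrix <- cor(toy, use = "pairwise.complete.obs")
round(cor_matrix, 2)

# The slope of the fitted line corresponds to what geom_smooth(method = lm) draws
fit <- lm(Sanity ~ devt_hr, data = toy)
coef(fit)
```

A coefficient close to -1 between Sanity and devt_hr would support the "substitutes" reading, while a value near 0 for Distraction would match the flat lines in the first two graphs.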

Another interesting fact is that my time spent on Distraction was relatively stable. As shown in all three graphs, Distraction ranged from 0 to 5 hours per day, and the colour mapping in the third graph looks quite consistent with very little variance.

Scatter plots are not hard to create, but when arranging all three in the same view, coord_cartesian() is necessary to keep the axes on the same scale.


Bonus: more activities at a glance

I mentioned that I changed the categories midway, because I realized that I needed more detail. Especially for Personal development and Professional development, I wanted to know the exact time I spent on certain skills. For example, under Personal development, I created four sub-categories: Knitting, Blogging, Emacs and Website development.

After some aggregation from raw data, I have a data frame act that looks like this.

# A tibble: 12 × 3
  activity   main_cat    total_spent
  <chr>      <chr>             <dbl>
1 Blogging   PersDev           22.2 
2 CasReading Sanity            25.9 
3 Coding     ProfDev           15.7 
4 Emacs      PersDev           42.2 
...

And this is the breakdown of the total time I spent on each sub-category, based on 2 months of data.

p_activity <-
  act %>%
  # Make sure activity is sorted by category
  arrange(main_cat) %>% # Sort by main_cat
  # This trick updates the factor levels to follow the sorted order
  # Source: https://r-graph-gallery.com/267-reorder-a-variable-in-ggplot2.html
  mutate(activity = factor(activity, levels = activity)) %>% 
  ggplot(aes(x = activity, y = total_spent)) +
  # Plot the lollipop stick
  geom_segment(aes(xend = activity, yend = 0), color = "grey") +
  # Add the lollipop head, coloured by main_cat
  geom_point(size = 4, aes(color = main_cat)) +
  # Add labels
  geom_text(aes(label = paste0(round(total_spent), "h")), hjust = -0.5) +
  scale_color_manual(values = custom_colors,
                     name = "Category") +
  coord_flip(ylim = c(0, max(act$total_spent) + 20), clip = "off") +
  theme_minimal() +
  labs(x = "Activity", y = "Hours", title = "How I spent my time, by sub-category") +
  theme(
    plot.title = element_text(hjust = 0.5, vjust = 1.5, size = 16, face = "bold"),
    panel.grid = element_blank(),
    axis.title.y = element_text(size = 10),
    plot.background = element_rect(color = "white")
  )

p_activity

How I spent my free time, by sub-category

Each lollipop shows the total amount of time I spent on a sub-category in my free time, and I coloured the lollipop to reflect the parent category as shown in the legend. I didn’t go with daily or weekly average like previous graphs, because I only have 2 months of data and it’s not representative enough to conduct a daily average analysis. For example, last month I spent an insane amount of time (42 hours) on learning Org-mode (Emacs), but this is not something I would do every day.

System maintenance takes up a big portion of my free time. To clarify, it includes self-reflection reviews, tinkering with productivity systems, trying new apps, etc. I did spend a lot of time playing with some self-hosted apps recently, so this is entirely expected. Otherwise, I would reconsider the category and split it further.

When plotting the lollipop graph, I reordered each sub-category to ensure they are arranged by their parent category. There are many ways to do it, and I chose the dplyr way.
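The reordering trick can be seen in isolation on the four rows shown in the tibble printout above. act_demo is just a hand-typed copy of those rows for illustration.

```r
library(dplyr)

# The four rows visible in the printout of act
act_demo <- tibble(
  activity    = c("Blogging", "CasReading", "Coding", "Emacs"),
  main_cat    = c("PersDev", "Sanity", "ProfDev", "PersDev"),
  total_spent = c(22.2, 25.9, 15.7, 42.2)
)

act_sorted <- act_demo %>%
  arrange(main_cat) %>%
  # After sorting, the row order becomes the factor level order,
  # which is the order ggplot2 uses on a discrete axis
  mutate(activity = factor(activity, levels = activity))

levels(act_sorted$activity)
# "Blogging" "Emacs" "Coding" "CasReading"
```

Both PersDev activities now sit next to each other instead of following alphabetical order. This only works when each activity appears once, since factor levels must be unique. forcats::fct_inorder(activity) after arrange() should give the same level order.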


Apps I used for time tracking

The principle is simple: tracking time needs to be easy and quick. Anything that takes more than 10 seconds to log would not work for long-term time tracking.


Time I spent on this article

To leave a comment for the author, please follow the link and comment on their blog: Aster Hu's Blog | Asteroid.
