Spelunking macOS ‘ScreenTime’ App Usage with R
Apple brought Screen Time to macOS a while back, and that means it has to store this data somewhere. Thankfully, Sarah Edwards has foraged through the macOS filesystem for us and explained where these bits of knowledge live in her post, Knowledge is Power! Using the macOS/iOS knowledgeC.db Database to Determine Precise User and Application Usage, which ultimately reveals that the data lurks in ~/Library/Application Support/Knowledge/knowledgeC.db. Sarah also has a neat little Python utility dubbed APOLLO (Apple Pattern of Life Lazy Output’er) with a smattering of canned knowledgeC.db SQL queries that cover a myriad of tracked items.
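Before poking around, it is worth a quick sanity check that the database is actually present and that your R session can read it (on recent macOS releases the app you launch R from may need Full Disk Access). A minimal sketch:

# sanity check: is the Screen Time knowledge db present and readable from R?
kdb_path <- path.expand("~/Library/Application Support/Knowledge/knowledgeC.db")
file.exists(kdb_path)
file.access(kdb_path, mode = 4) == 0  # mode 4 tests read permission; 0 means readable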
Today, we’ll show how to work with this database in R and the {tidyverse} to paint our own pictures of application usage.
There are quite a number of tables in the knowledgeC.db SQLite 3 database:
That visual schema was created in OmniGraffle via a small R script that uses the OmniGraffle automation framework. The OmniGraffle source files are also available upon request.
Most of the interesting bits (for any tracking-related spelunking) are in the ZOBJECT table, and to get a full picture of usage we’ll need to join it with some other tables that are connected via a few foreign keys:
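If you want to confirm those tables and key columns on your own copy of the database first, a minimal sketch with {DBI}/{RSQLite} (assumed to be installed) does the trick:

library(DBI)

# open the knowledge db and peek at what's inside before we start joining things
con <- dbConnect(RSQLite::SQLite(), path.expand("~/Library/Application Support/Knowledge/knowledgeC.db"))
dbListTables(con)             # ZOBJECT, ZSTRUCTUREDMETADATA, ZSOURCE, ZSYNCPEER, ...
dbListFields(con, "ZOBJECT")  # columns, including the ZSTRUCTUREDMETADATA / ZSOURCE foreign keys
dbDisconnect(con)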
There are a few ways to do this in {tidyverse} R. The first is an extended straight SQL riff off of one of Sarah’s original queries:
library(hrbrthemes) # for ggplot2 machinations
library(tidyverse)

# source the knowledge db
kdb <- src_sqlite("~/Library/Application Support/Knowledge/knowledgeC.db")

tbl(
  kdb,
  sql('
SELECT
  ZOBJECT.ZVALUESTRING AS "app",
  (ZOBJECT.ZENDDATE - ZOBJECT.ZSTARTDATE) AS "usage",
  CASE ZOBJECT.ZSTARTDAYOFWEEK
    WHEN "1" THEN "Sunday"
    WHEN "2" THEN "Monday"
    WHEN "3" THEN "Tuesday"
    WHEN "4" THEN "Wednesday"
    WHEN "5" THEN "Thursday"
    WHEN "6" THEN "Friday"
    WHEN "7" THEN "Saturday"
  END "dow",
  ZOBJECT.ZSECONDSFROMGMT/3600 AS "tz",
  DATETIME(ZOBJECT.ZSTARTDATE + 978307200, \'UNIXEPOCH\') AS "start_time",
  DATETIME(ZOBJECT.ZENDDATE + 978307200, \'UNIXEPOCH\') AS "end_time",
  DATETIME(ZOBJECT.ZCREATIONDATE + 978307200, \'UNIXEPOCH\') AS "created_at",
  CASE ZMODEL
    WHEN ZMODEL THEN ZMODEL
    ELSE "Other"
  END "source"
FROM ZOBJECT
  LEFT JOIN ZSTRUCTUREDMETADATA ON ZOBJECT.ZSTRUCTUREDMETADATA = ZSTRUCTUREDMETADATA.Z_PK
  LEFT JOIN ZSOURCE ON ZOBJECT.ZSOURCE = ZSOURCE.Z_PK
  LEFT JOIN ZSYNCPEER ON ZSOURCE.ZDEVICEID = ZSYNCPEER.ZDEVICEID
WHERE ZSTREAMNAME = "/app/usage"
')) -> usage

usage
## # Source:   SQL [?? x 8]
## # Database: sqlite 3.29.0 [/Users/johndoe/Library/Application Support/Knowledge/knowledgeC.db]
##    app                      usage dow       tz start_time          end_time            created_at         source
##    <chr>                    <int> <chr>  <int> <chr>               <chr>               <chr>              <chr>
##  1 com.bitrock.appinstaller    15 Friday    -4 2019-10-05 01:11:27 2019-10-05 01:11:42 2019-10-05 01:11:… MacBookPro13…
##  2 com.tinyspeck.slackmacg…  4379 Tuesday   -4 2019-10-01 13:19:24 2019-10-01 14:32:23 2019-10-01 14:32:… Other
##  3 com.tinyspeck.slackmacg…  1167 Tuesday   -4 2019-10-01 18:19:24 2019-10-01 18:38:51 2019-10-01 18:38:… Other
##  4 com.tinyspeck.slackmacg…  1316 Tuesday   -4 2019-10-01 19:13:49 2019-10-01 19:35:45 2019-10-01 19:35:… Other
##  5 com.tinyspeck.slackmacg… 12053 Thursday  -4 2019-10-03 12:25:18 2019-10-03 15:46:11 2019-10-03 15:46:… Other
##  6 com.tinyspeck.slackmacg…  1258 Thursday  -4 2019-10-03 15:50:16 2019-10-03 16:11:14 2019-10-03 16:11:… Other
##  7 com.tinyspeck.slackmacg…  2545 Thursday  -4 2019-10-03 16:24:30 2019-10-03 17:06:55 2019-10-03 17:06:… Other
##  8 com.tinyspeck.slackmacg…   303 Thursday  -4 2019-10-03 17:17:10 2019-10-03 17:22:13 2019-10-03 17:22:… Other
##  9 com.tinyspeck.slackmacg…  9969 Thursday  -4 2019-10-03 17:33:38 2019-10-03 20:19:47 2019-10-03 20:19:… Other
## 10 com.tinyspeck.slackmacg…  2813 Thursday  -4 2019-10-03 20:19:52 2019-10-03 21:06:45 2019-10-03 21:06:… Other
## # … with more rows
Before explaining what that query does, let’s rewrite it {dbplyr}-style:
tbl(kdb, "ZOBJECT") %>%
  mutate(
    created_at = datetime(ZCREATIONDATE + 978307200, "UNIXEPOCH", "LOCALTIME"),
    start_dow = case_when(
      ZSTARTDAYOFWEEK == 1 ~ "Sunday",
      ZSTARTDAYOFWEEK == 2 ~ "Monday",
      ZSTARTDAYOFWEEK == 3 ~ "Tuesday",
      ZSTARTDAYOFWEEK == 4 ~ "Wednesday",
      ZSTARTDAYOFWEEK == 5 ~ "Thursday",
      ZSTARTDAYOFWEEK == 6 ~ "Friday",
      ZSTARTDAYOFWEEK == 7 ~ "Saturday"
    ),
    start_time = datetime(ZSTARTDATE + 978307200, "UNIXEPOCH", "LOCALTIME"),
    end_time = datetime(ZENDDATE + 978307200, "UNIXEPOCH", "LOCALTIME"),
    usage = (ZENDDATE - ZSTARTDATE),
    tz = ZSECONDSFROMGMT/3600
  ) %>%
  left_join(tbl(kdb, "ZSTRUCTUREDMETADATA"), c("ZSTRUCTUREDMETADATA" = "Z_PK")) %>%
  left_join(tbl(kdb, "ZSOURCE"), c("ZSOURCE" = "Z_PK")) %>%
  left_join(tbl(kdb, "ZSYNCPEER"), "ZDEVICEID") %>%
  filter(ZSTREAMNAME == "/app/usage") %>%
  select(
    app = ZVALUESTRING, created_at, start_dow, start_time, end_time, usage, tz, source = ZMODEL
  ) %>%
  mutate(source = ifelse(is.na(source), "Other", source)) %>%
  collect() %>%
  mutate_at(vars(created_at, start_time, end_time), as.POSIXct) -> usage
What we’re doing is pulling out the day of week, start/end usage times & timezone info, app bundle id, source of the app interactions, and the total usage time for each entry, along with when that entry was created. We need to do some maths since Apple stores time-y whime-y info in its own custom format (seconds since 2001-01-01 rather than the Unix epoch), plus we need to convert the numeric day of week to a labeled one.
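For reference, that 978307200 constant in the queries is the offset between the Unix epoch (1970-01-01) and Apple’s Cocoa/Core Data reference date (2001-01-01), so the conversion can also be done on the R side once values are collected. A tiny sketch, using a made-up raw value:

# Apple's reference date is 2001-01-01 00:00:00 UTC, which is 978307200 seconds
# after the Unix epoch, hence the constant above
apple_ts <- 592275087  # hypothetical raw ZSTARTDATE-style value, for illustration only
as.POSIXct(apple_ts + 978307200, origin = "1970-01-01", tz = "UTC")
as.POSIXct(apple_ts, origin = "2001-01-01", tz = "UTC")  # equivalent, using the Cocoa origin directly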
The bundle ids are pretty readable, but they’re not really intended for human consumption, so we’ll build a translation table from bundle id to app name using the mdls command.
list.files(
  c("/Applications", "/System/Library/CoreServices", "/Applications/Utilities", "/System/Applications"), # main places apps are stored (there are potentially more but this is sufficient for our needs)
  pattern = "\\.app$",
  full.names = TRUE
) -> apps

x <- sys::exec_internal("mdls", c("-name", "kMDItemCFBundleIdentifier", "-r", apps))

# mdls null (\0) terminates each entry so we have to do some raw surgery to get it into a format we can use
x$stdout[x$stdout == as.raw(0)] <- as.raw(0x0a)

tibble(
  name = gsub("\\.app$", "", basename(apps)),
  app = read_lines(x$stdout)
) -> app_trans

app_trans
## # A tibble: 270 x 2
##    name                    app
##    <chr>                   <chr>
##  1 1Password 7             com.agilebits.onepassword7
##  2 Adium                   com.adiumX.adiumX
##  3 Agenda                  com.momenta.agenda.macos
##  4 Alfred 4                com.runningwithcrayons.Alfred
##  5 Amazon Music            com.amazon.music
##  6 Android File Transfer   com.google.android.mtpviewer
##  7 Awsaml                  com.rapid7.awsaml
##  8 Bartender 2             com.surteesstudios.Bartender
##  9 BBEdit                  com.barebones.bbedit
## 10 BitdefenderVirusScanner com.bitdefender.BitdefenderVirusScanner
## # … with 260 more rows
The usage info goes back ~30 days, so let’s do a quick summary of the top 10 apps and their total usage (in hours):
usage %>%
  group_by(app) %>%
  summarise(first = min(start_time), last = max(end_time), total = sum(usage, na.rm=TRUE)) %>%
  ungroup() %>%
  mutate(total = total / 60 / 60) %>% # hours
  arrange(desc(total)) %>%
  left_join(app_trans) -> overall_usage

overall_usage %>%
  slice(1:10) %>%
  left_join(app_trans) %>%
  mutate(name = fct_inorder(name) %>% fct_rev()) %>%
  ggplot(aes(x=total, y=name)) +
  geom_segment(aes(xend=0, yend=name), size=5, color = ft_cols$slate) +
  scale_x_comma(position = "top") +
  labs(
    x = "Total Usage (hrs)", y = NULL,
    title = glue::glue('App usage in the past {round(as.numeric(max(usage$end_time) - min(usage$start_time), "days"))} days')
  ) +
  theme_ft_rc(grid="X")
There’s a YUGE flaw in the way macOS currently tracks application usage. Unlike iOS, where apps really don’t run simultaneously (with iPadOS they kinda can/do now), macOS apps are usually started and left open alongside other apps. Apple doesn’t do a great job of identifying only active app usage, so many of these usage numbers are heavily inflated. Hopefully that will be fixed in macOS 10.15.
We have more data at our disposal, so let’s see when these apps get used. To do that, we’ll use segments to plot individual usage tracks and color them by weekday/weekend usage (still limiting to top 10 for blog brevity):
usage %>%
  filter(app %in% overall_usage$app[1:10]) %>%
  left_join(app_trans) %>%
  mutate(name = factor(name, levels = rev(overall_usage$name[1:10]))) %>%
  ggplot() +
  geom_segment(
    aes(
      x = start_time, xend = end_time, y = name, yend = name,
      color = ifelse(start_dow %in% c("Saturday", "Sunday"), "Weekend", "Weekday")
    ),
    size = 10
  ) +
  scale_x_datetime(position = "top") +
  scale_colour_manual(
    name = NULL,
    values = c(
      "Weekend" = ft_cols$light_blue,
      "Weekday" = ft_cols$green
    )
  ) +
  guides(
    colour = guide_legend(override.aes = list(size = 1))
  ) +
  labs(
    x = NULL, y = NULL,
    title = glue::glue('Top 10 App usage on this Mac in the past {round(as.numeric(max(usage$end_time) - min(usage$start_time), "days"))} days'),
    subtitle = "Each segment represents that app being 'up' (Open to Quit).\nUnfortunately, this is what Screen Time uses for its calculations on macOS"
  ) +
  theme_ft_rc(grid="X") +
  theme(legend.position = c(1, 1.25)) +
  theme(legend.justification = "right")
I’m not entirely sure “on this Mac” is completely accurate, since I think this data syncs across all active Screen Time devices, given the following (n is in seconds):
count(usage, source, wt=usage, sort=TRUE)
## # A tibble: 2 x 2
##   source               n
##   <chr>            <int>
## 1 Other          4851610
## 2 MacBookPro13,2 1634137
The “Other” rows appear to be this work-dev Mac; it doesn’t have its identifier mapped in the sync-peer table, so I think that means it’s the local device and that the above chart is really looking at Screen Time across all devices. I literally (right before this sentence) enabled Screen Time on my iPhone, so we’ll see if that ends up in the database, and I’ll post a quick update if it does.
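If you’re curious which devices the knowledge db currently knows about (and want to check whether the iPhone shows up later), a quick peek at the ZSYNCPEER table via the kdb handle from above will tell you; exact column availability may vary by macOS version:

# list the sync peers the knowledge db has recorded; ZMODEL is what the
# "source" column above maps to (unmatched rows ended up as "Other")
tbl(kdb, "ZSYNCPEER") %>%
  select(ZDEVICEID, ZMODEL) %>%
  collect()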
We’ll take one last look by day of week and use a heatmap to see the results:
count(usage, start_dow, app, wt=usage/60/60) %>%
  left_join(app_trans) %>%
  filter(app %in% overall_usage$app[1:10]) %>%
  mutate(name = factor(name, levels = rev(overall_usage$name[1:10]))) %>%
  mutate(start_dow = factor(start_dow, c("Sunday", "Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday"))) %>%
  ggplot() +
  geom_tile(aes(start_dow, name, fill = n), color = "#252a32", size = 0.75) +
  scale_x_discrete(expand = c(0, 0.5), position = "top") +
  scale_y_discrete(expand = c(0, 0.5)) +
  scale_fill_viridis_c(direction = -1, option = "magma", name = "Usage (hrs)") +
  labs(
    x = NULL, y = NULL,
    title = "Top 10 App usage by day of week"
  ) +
  theme_ft_rc(grid="")
I really need to get into the habit of using the RStudio Server access features of RSwitch over Chrome so I can get RSwitch into the top 10, but some habits (and bookmarks) die hard.
FIN
Apple’s Screen Time also tracks “category”, which is something we can pick up from each application’s embedded metadata. We’ll do that in a follow-up post along with seeing whether we can capture iOS usage now that I’ve enabled Screen Time on those devices as well.
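As a teaser for that, the same mdls trick we used for bundle ids can likely pull a category string as well. This is a hedged sketch: kMDItemAppStoreCategory is an assumed Spotlight attribute that is typically only populated for App Store-installed apps, so other apps will come back as "(null)".

# hedged sketch: ask Spotlight metadata for each app's category string,
# reusing the `apps` vector and the null-byte surgery from earlier
y <- sys::exec_internal("mdls", c("-name", "kMDItemAppStoreCategory", "-r", apps))
y$stdout[y$stdout == as.raw(0)] <- as.raw(0x0a)
read_lines(y$stdout)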
Keep spelunking the knowledgeC.db table(s) and blog about or reply in the comments with any interesting nuggets you find.