Visualizing macOS App Usage with a Little Help from osqueryr & mactheknife

[This article was first published on R – rud.is, and kindly contributed to R-bloggers].

Both my osqueryr and mactheknife packages have had a few updates, and I wanted to put together a fun example (it being Friday, and all) of what you can do with them. All my packages are now on GitHub and GitLab, and I’ll be maintaining them on both so I can accommodate the comfort level of any and all contributors, but I will be prioritizing issues and PRs on GitLab ahead of any other platform. Having said that, I’ll mark non-CRAN packages with a # notcran comment in the source views so you know you need to install them from wherever you like to grab sketch packages from.

One table that osquery makes available under macOS is an inventory of all “apps” that macOS knows about. Previous posts have shown how to access these tables via the dplyr interface I built for osquery, but they involved multiple steps and as I started to use it more regularly (especially to explore the macOS 10.14 beta I’m running) I noticed that it could use some helper functions. One in particular — osq_expose_tables() — is pretty helpful in that it handles all the dplyr boilerplate code and makes table(s) available in the global environment by name. It takes a single table name or regular expression and then exposes all matching entities. While the function has a help page, it’s easier just to see it in action. Let’s expose the apps table:

library(osqueryr) # notcran
library(tidyverse)

osq_expose_tables("apps")

apps
## # Source:   table [?? x 19]
## # Database: OsqueryConnection
##    applescript_enab… bundle_executable    bundle_identifier   bundle_name  bundle_package_…
##    <chr>             <chr>                <chr>               <chr>        <chr>           
##  1 0                 1Password 6          com.agilebits.onep… 1Password 6  APPL            
##  2 0                 2BUA8C4S2C.com.agil… 2BUA8C4S2C.com.agi… 1Password m… APPL            
##  3 1                 Adium                com.adiumX.adiumX   Adium        APPL            
##  4 1                 Adobe Connect        com.adobe.adobecon… Adobe Conne… APPL            
##  5 1                 Adobe Illustrator    com.adobe.illustra… Illustrator… APPL            
##  6 ""                AIGPUSniffer         com.adobe.AIGPUSni… AIGPUSniffer APPL            
##  7 ""                CEPHtmlEngine Helper com.adobe.cep.CEPH… CEPHtmlEngi… APPL            
##  8 ""                CEPHtmlEngine        com.adobe.cep.CEPH… CEPHtmlEngi… APPL            
##  9 ""                LogTransport2        com.adobe.headligh… LogTranspor… APPL            
## 10 ""                droplet              ""                  Analyze Doc… APPL            
## # ... with more rows, and 14 more variables: bundle_short_version <chr>,
## #   bundle_version <chr>, category <chr>, compiler <chr>, copyright <chr>,
## #   development_region <chr>, display_name <chr>, element <chr>, environment <chr>,
## #   info_string <chr>, last_opened_time <chr>, minimum_system_version <chr>, name <chr>,
## #   path <chr>
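
For intuition about the "single name or regular expression" behavior, here is a rough base-R sketch of the idea: match table names against a pattern and assign each hit into an environment by name. This is not how osqueryr implements it. expose_matching() and fake_tables are made-up names for illustration, and the data frames are local stand-ins for what would really be lazy dplyr table references.

```r
# Hypothetical stand-ins for remote tables; in osqueryr these would be
# lazy dplyr table references, not local data frames.
fake_tables <- list(
  apps        = data.frame(name = "Adium.app"),
  app_schemes = data.frame(scheme = "http"),
  processes   = data.frame(pid = 1L)
)

# expose_matching() is an illustrative name, NOT an osqueryr function.
# It assigns every table whose name matches `pattern` into `envir`
# and invisibly returns the names it exposed.
expose_matching <- function(pattern, tables, envir = new.env()) {
  hits <- grep(pattern, names(tables), value = TRUE)
  for (nm in hits) assign(nm, tables[[nm]], envir = envir)
  invisible(hits)
}

demo_env <- new.env()
exposed <- expose_matching("^app", fake_tables, envir = demo_env)

print(exposed)       # names that matched the pattern
print(ls(demo_env))  # objects now available by name in demo_env
```

The sketch assigns into a throwaway environment rather than the global one so it can be run without cluttering a session.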

There’s tons of info on all the apps macOS knows about, some of which are system services and “helper” apps (like Chrome’s auto-updater). One field, last_opened_time, caught my eye, and I thought it would be handy to see which apps had little use (i.e. ones that haven’t been opened in a while) and which apps I might use more frequently (i.e. ones with more recent “open” times). That last_opened_time is a fractional POSIX timestamp and, due to the way osquery created the schemas, it’s in a character field. That’s easy enough to convert, and you can then arrange() the whole list in descending order to see what you used most recently.
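
A minimal base-R sketch of that conversion and sort, using three timestamps copied from my apps table and a small local data frame (apps_demo is just an illustrative name) in place of the osquery table. I pin the time zone to UTC here so the result is the same on any machine, which is an assumption on my part rather than anything osquery dictates.

```r
# Three (name, last_opened_time) pairs as character data,
# the same shape osquery hands back.
apps_demo <- data.frame(
  name = c("Adium.app", "Adobe Illustrator.app", "Adobe Connect.app"),
  last_opened_time = c("1523889402.80918", "1530044681.76677", "1516307513.7606"),
  stringsAsFactors = FALSE
)

# character -> numeric seconds since the epoch -> POSIXct
apps_demo$last_opened <- as.POSIXct(
  as.numeric(apps_demo$last_opened_time),
  origin = "1970-01-01", tz = "UTC"
)

# most recently opened first
apps_sorted <- apps_demo[order(apps_demo$last_opened, decreasing = TRUE), ]
print(apps_sorted[, c("name", "last_opened")])
```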

But, this is R and we can do better than a simple table or even a DT::datatable().

I recently added the ability to read macOS property lists (a.k.a. “plists”) to mactheknife by wrapping a Python module (plistlib). Since all (OK, “most”) macOS apps have an icon, I thought it would be fun to visualize the last-opened time for each app using the app icons and ggplot2. Unfortunately, ImageMagick (and thus the magick package) cannot read macOS icns files, so you’ll need to do a brew install libicns before working with any of the remaining code, since we’ll be relying on a command-line utility from that formula.

Let’s get the frontmatter out of the way:

library(sys)
library(magick)
library(osqueryr) # notcran
library(mactheknife) # notcran
library(ggimage)
library(hrbrthemes)
library(ggbeeswarm)
library(tidyverse)

osq_expose_tables("apps")

# macOS will use a generic app icon when none is present in an app bundle; this is the location and we'll
# need to use it when our plist app spelunking comes up short

default_app <- "/System/Library/CoreServices/CoreTypes.bundle/Contents/Resources/GenericApplicationIcon.icns"

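Next, we'll pull the app name, path and last-opened time into a local data frame, drop system services and helper apps, convert the timestamp to a date, and look up each app's icon file in its Info.plist (falling back to the generic icon when none is listed):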

select(apps, name, path, last_opened_time) %>%
  collect() %>%
  filter(!str_detect(path, "(^/System|usr|//System|/Library/|Helper|/Contents/|\\.service$)")) %>%
  mutate(lop_day = as.Date(anytime::anytime(as.numeric(last_opened_time)))) %>%
  mutate(icon = map_chr(path, ~{
    p <- read_plist(file.path(.x, "Contents", "Info.plist"))
    icns <- p$CFBundleIconFile[1]
    if (is.null(icns)) return(default_app)
    if (!str_detect(icns, "\\.icns$")) icns <- sprintf("%s.icns", icns)
    file.path(.x, "Contents", "Resources", icns)
  })) -> apps_df

apps_df
## # A tibble: 274 x 5
##    last_opened_time name                       path                      lop_day    icon                       
##    <chr>            <chr>                      <chr>                     <date>     <chr>                      
##  1 1529958322.11297 1Password 6.app            /Applications/1Password … 2018-06-25 /Applications/1Password 6.…
##  2 1523889402.80918 Adium.app                  /Applications/Adium.app   2018-04-16 /Applications/Adium.app/Co…
##  3 1516307513.7606  Adobe Connect.app          /Applications/Adobe Conn… 2018-01-18 /Applications/Adobe Connec…
##  4 1530044681.76677 Adobe Illustrator.app      /Applications/Adobe Illu… 2018-06-26 /Applications/Adobe Illust…
##  5 -1.0             Analyze Documents.app      /Applications/Adobe Illu… 1969-12-31 /Applications/Adobe Illust…
##  6 -1.0             Make Calendar.app          /Applications/Adobe Illu… 1969-12-31 /Applications/Adobe Illust…
##  7 -1.0             Contact Sheets.app         /Applications/Adobe Illu… 1969-12-31 /Applications/Adobe Illust…
##  8 -1.0             Export Flash Animation.app /Applications/Adobe Illu… 1969-12-31 /Applications/Adobe Illust…
##  9 -1.0             Web Gallery.app            /Applications/Adobe Illu… 1969-12-31 /Applications/Adobe Illust…
## 10 -1.0             Adobe InDesign CC 2018.app /Applications/Adobe InDe… 1969-12-31 /Applications/Adobe InDesi…
## # ... with 264 more rows
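
The icon lookup inside that map_chr() call has three cases, and they're easier to check when pulled out on their own. Below is a small base-R sketch. resolve_icon() is a hypothetical helper, icon_file stands in for the CFBundleIconFile value that read_plist() would return, and the "Adium" icon name is a made-up example value rather than something I read from the real bundle.

```r
default_icon <- "/System/Library/CoreServices/CoreTypes.bundle/Contents/Resources/GenericApplicationIcon.icns"

# resolve_icon() is an illustrative helper, not part of mactheknife.
# `icon_file` plays the role of p$CFBundleIconFile (NULL when the key is absent).
resolve_icon <- function(app_path, icon_file, fallback = default_icon) {
  # case 1: no icon listed in the plist -> generic app icon
  if (is.null(icon_file)) return(fallback)
  # case 2: icon listed without an extension -> add ".icns"
  if (!grepl("\\.icns$", icon_file)) icon_file <- paste0(icon_file, ".icns")
  # case 3 (and case 2 after the fix-up): build the path inside the bundle
  file.path(app_path, "Contents", "Resources", icon_file)
}

with_ext    <- resolve_icon("/Applications/Adium.app", "Adium.icns")
without_ext <- resolve_icon("/Applications/Adium.app", "Adium")
missing_key <- resolve_icon("/Applications/Adium.app", NULL)

print(c(with_ext, without_ext, missing_key))
```

Note that this only builds a path. Whether the file actually exists still has to be checked before it is used.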

Since I really didn't feel like creating a package wrapper for libicns, we're going to use the sys package to make system calls to convert the icns files to png files. We really don't want to do this repeatedly for the same files if we ever run this again, so we'll set up a cache directory to hold our converted pngs.

Apps can (and usually do) have multiple icons with varying sizes and are not guaranteed to have every common size available. So, we'll have the libicns icns2png utility extract all the icons and use the highest resolution one, using magick to reduce it to a 32x32 png bitmap.

# setup the cache dir -- use whatever you want
cache_dir <- path.expand("~/.r-icns-cache")
dir.create(cache_dir, showWarnings = FALSE) # stay quiet if it already exists from a previous run

# create a unique name hash for more compact names
mutate(apps_df, icns_png = map_chr(icon, ~{
  hash <- digest::digest(.x, serialize=FALSE)
  file.path(cache_dir, sprintf("%s.png", hash))
})) -> apps_df

# find the icns2png program
icns2png <- unname(Sys.which("icns2png"))

# go through each icon file 
pb <- progress_estimated(length(apps_df$icns_png))
walk2(apps_df$icon, apps_df$icns_png, ~{

  pb$tick()$print() # progress!

  if (!file.exists(.y)) { # don't create it if it already exists

    td <- tempdir()

    # no icon file == use default one
    if (!file.exists(.x)) .x <- default_app

    # convert all of them to pngs
    sys::exec_internal(
      cmd = icns2png,
      args = c("-x", "-o", td, .x),
      error = FALSE
    ) -> res

    rawToChar(res$stdout) %>% # go through icns2png output
      str_split("\n") %>%
      flatten_chr() %>%
      keep(str_detect, "  Saved") %>% # find all the extracted icons
      last() %>% # use the last one
      str_replace(".* to /", "/") %>% # clean up the filename so we can read it in
      str_replace("\\.$", "") -> png

    # read and convert
    image_read(png) %>%
      image_resize(geometry_area(32, 32)) %>%
      image_write(.y)

  }

})
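
The "don't redo work we've already done" part of that loop is a pattern worth seeing by itself. Here's a small sketch that runs without libicns or magick: convert_once() and fake_convert() are made-up names, and the "converter" just copies a file and counts how many times it ran, standing in for the icns2png plus magick steps.

```r
# A throwaway cache dir under tempdir() so the sketch cleans up with the session
cache_dir_demo <- file.path(tempdir(), "icns-cache-demo")
dir.create(cache_dir_demo, showWarnings = FALSE)

conversions <- 0L

# convert_once() is an illustrative helper: `convert` is any function that
# takes (src, target) and writes the target file.
convert_once <- function(src, target, convert) {
  if (file.exists(target)) return(invisible(FALSE)) # cache hit: skip the work
  convert(src, target)
  invisible(TRUE)
}

# Stand-in converter: copies the file and records that it was called
fake_convert <- function(src, target) {
  conversions <<- conversions + 1L
  file.copy(src, target)
}

src <- tempfile(fileext = ".icns")
writeLines("not a real icon", src)
target <- file.path(cache_dir_demo, "demo.png")
unlink(target) # start from an empty cache entry

ran_first  <- convert_once(src, target, fake_convert)
ran_second <- convert_once(src, target, fake_convert)

print(c(ran_first = ran_first, ran_second = ran_second, conversions = conversions))
```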

You can open up that cache directory with the macOS Finder to find all the extracted/converted pngs.

Now, we're on the final leg of our app-use visualization journey.

Some system/utility apps have start-of-epoch dates due to the way the macOS installer tags them. We only want "recent" ones so I set an arbitrary cutoff date of the year 2000. Since many apps would have the same last opened date, I wanted to get a spread out layout "for free". One way to do that is to use ggbeeswarm::position_quasirandom():

filter(apps_df, lop_day > as.Date("2000-01-01")) %>%
  ggplot() +
  geom_image(
    aes(x="", lop_day, image = icns_png), size = 0.033,
    position = position_quasirandom(width = 0.5)
  ) +
  geom_text(
    data = tibble(
      x = c(0.6, 0.6),
      y = as.Date(c("2018-05-01", "2017-09-15")),
      label = c("More recently used ↑", "Not used in a while ↓")
    ),
    aes(x, y, label=label), family = font_an, size = 5, hjust = 0,
    color = "lightslategray"
  ) +
  labs(
    x = NULL, y = NULL,
    title = "macOS 'Last Used' App History"
  ) +
  theme_ipsum_rc(grid="Y") +
  theme(axis.text.x = element_blank())
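
If you're wondering what the year-2000 cutoff at the top of that pipeline actually drops, here's a tiny base-R check using three last_opened_time values copied from the apps_df output earlier in the post. The -1.0 entries convert to the day before the epoch, so the filter removes them. I use as.POSIXct() with an explicit UTC time zone here instead of anytime so the sketch has no package dependencies; that swap is mine.

```r
raw_times <- c("1529958322.11297", "-1.0", "1516307513.7606")

# character -> numeric -> POSIXct -> Date (UTC throughout)
lop_day <- as.Date(
  as.POSIXct(as.numeric(raw_times), origin = "1970-01-01", tz = "UTC"),
  tz = "UTC"
)
print(lop_day)

# same cutoff as the plot code: keep anything after 2000-01-01
recent <- lop_day[lop_day > as.Date("2000-01-01")]
print(recent)
```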

There are tons of other ways to look at this data and you can use the osquery daemon to log this data regularly so you can get an extra level of detail. An interesting offshoot project would be to grab the latest RStudio dailies and see if you can wrangle a sweet D3 visualization from the app data we collected. Make sure to drop a note with your creations in the comments. You can find the full code in this snippet.

