Quick Hit: Processing macOS Application Metadata Weirdly Fast with mdls and R

[This article was first published on R – rud.is, and kindly contributed to R-bloggers].

(reminder: Quick Hits have minimal explanatory blathering, but I can elaborate on anything if folks submit a comment).

I’m playing around with Screen Time on xOS again and noticed mdls (macOS command line utility for getting file metadata) has a -plist option (it has probably had it for a while & I just never noticed). I further noticed there’s a kMDItemExecutableArchitectures key (which may also have been “a thing” for a while). Having application metadata available for the utility functions I’m putting together for Rmd-based Screen Time reports would be handy, so I threw together some quick code to show how to work with it in R.

Running mdls -plist /some/file.plist ...path-to-apps... will generate a giant property list file with all metadata for all the apps specified. It’s a wicked fast command even when grabbing and outputting metadata for all apps on a system.

Each entry looks like this:

  <dict>
    <key>_kMDItemDisplayNameWithExtensions</key>
    <string>RStudio — tycho.app</string>
    <key>kMDItemAlternateNames</key>
    <array>
      <string>RStudio — tycho.app</string>
    </array>
    <key>kMDItemCFBundleIdentifier</key>
    <string>com.RStudio_—_tycho</string>
    <key>kMDItemContentCreationDate</key>
    <date>2021-01-31T17:56:46Z</date>
    <key>kMDItemContentCreationDate_Ranking</key>
    <date>2021-01-31T00:00:00Z</date>
    <key>kMDItemContentModificationDate</key>
    <date>2021-01-31T17:56:46Z</date>
    <key>kMDItemContentModificationDate_Ranking</key>
    <date>2021-01-31T00:00:00Z</date>
    <key>kMDItemContentType</key>
    <string>com.apple.application-bundle</string>
    <key>kMDItemContentTypeTree</key>
    <array>
      <string>com.apple.application-bundle</string>
      <string>com.apple.application</string>
      <string>public.executable</string>
      <string>com.apple.localizable-name-bundle</string>
      <string>com.apple.bundle</string>
      <string>public.directory</string>
      <string>public.item</string>
      <string>com.apple.package</string>
    </array>
    <key>kMDItemCopyright</key>
    <string>Copyright © 2017-2020 BZG Inc. All rights reserved.</string>
    <key>kMDItemDateAdded</key>
    <date>2021-04-09T18:29:52Z</date>
    <key>kMDItemDateAdded_Ranking</key>
    <date>2021-04-09T00:00:00Z</date>
    <key>kMDItemDisplayName</key>
    <string>RStudio — tycho.app</string>
    <key>kMDItemDocumentIdentifier</key>
    <integer>0</integer>
    <key>kMDItemExecutableArchitectures</key>
    <array>
      <string>x86_64</string>
    </array>
    <key>kMDItemFSContentChangeDate</key>
    <date>2021-01-31T17:56:46Z</date>
    <key>kMDItemFSCreationDate</key>
    <date>2021-01-31T17:56:46Z</date>
    <key>kMDItemFSCreatorCode</key>
    <integer>0</integer>
    <key>kMDItemFSFinderFlags</key>
    <integer>0</integer>
    <key>kMDItemFSInvisible</key>
    <false/>
    <key>kMDItemFSIsExtensionHidden</key>
    <true/>
    <key>kMDItemFSLabel</key>
    <integer>0</integer>
    <key>kMDItemFSName</key>
    <string>RStudio — tycho.app</string>
    <key>kMDItemFSNodeCount</key>
    <integer>1</integer>
    <key>kMDItemFSOwnerGroupID</key>
    <integer>20</integer>
    <key>kMDItemFSOwnerUserID</key>
    <integer>501</integer>
    <key>kMDItemFSSize</key>
    <integer>37451395</integer>
    <key>kMDItemFSTypeCode</key>
    <integer>0</integer>
    <key>kMDItemInterestingDate_Ranking</key>
    <date>2021-04-13T00:00:00Z</date>
    <key>kMDItemKind</key>
    <string>Application</string>
    <key>kMDItemLastUsedDate</key>
    <date>2021-04-13T12:47:12Z</date>
    <key>kMDItemLastUsedDate_Ranking</key>
    <date>2021-04-13T00:00:00Z</date>
    <key>kMDItemLogicalSize</key>
    <integer>37451395</integer>
    <key>kMDItemPhysicalSize</key>
    <integer>38092800</integer>
    <key>kMDItemUseCount</key>
    <integer>20</integer>
    <key>kMDItemUsedDates</key>
    <array>
      <date>2021-03-15T04:00:00Z</date>
      <date>2021-03-17T04:00:00Z</date>
      <date>2021-03-18T04:00:00Z</date>
      <date>2021-03-19T04:00:00Z</date>
      <date>2021-03-22T04:00:00Z</date>
      <date>2021-03-25T04:00:00Z</date>
      <date>2021-03-30T04:00:00Z</date>
      <date>2021-04-01T04:00:00Z</date>
      <date>2021-04-03T04:00:00Z</date>
      <date>2021-04-05T04:00:00Z</date>
      <date>2021-04-07T04:00:00Z</date>
      <date>2021-04-08T04:00:00Z</date>
      <date>2021-04-12T04:00:00Z</date>
      <date>2021-04-13T04:00:00Z</date>
    </array>
    <key>kMDItemVersion</key>
    <string>4.0.1</string>
  </dict>
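
The `<date>` values in an entry like the one above are UTC timestamps with a trailing "Z". As a minimal sketch (the fragment below is hand-made from two of the keys shown above, and the variable names are only illustrative), one of them can be pulled out and converted to a date-time like this:

```r
library(xml2)

# hand-made fragment reusing two key/value pairs from the entry above
entry <- read_xml(paste0(
  "<dict>",
  "<key>kMDItemLastUsedDate</key><date>2021-04-13T12:47:12Z</date>",
  "<key>kMDItemUseCount</key><integer>20</integer>",
  "</dict>"
))

# the value is the element that immediately follows the matching <key>
node <- xml_find_first(
  entry,
  ".//key[. = 'kMDItemLastUsedDate']/following-sibling::*[1]"
)

# the trailing "Z" means UTC, so parse with an explicit format and time zone
last_used <- as.POSIXct(xml_text(node), format = "%Y-%m-%dT%H:%M:%SZ", tz = "UTC")
last_used
```

Passing `tz = "UTC"` explicitly keeps the result from being interpreted in the local time zone.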

We can get all the metadata for all installed apps in R via:

library(sys)
library(xml2)
library(tidyverse)

# get full paths to all the apps
list.files(
  c("/Applications", "/System/Library/CoreServices", "/Applications/Utilities", "/System/Applications"), 
  pattern = "\\.app$", 
  full.names = TRUE
) -> apps

# generate a giant property list with all the app attributes
tf <- tempfile(fileext = ".plist")
sys::exec_internal("mdls", c("-plist", tf, apps))

Unfortunately, some companies (COUGH Logitech COUGH) stick illegal characters in some values, so we have to take care of those (I used xmllint to see which one(s) were bad):

# read it in and replace the invalid 0x03 control character (Logitech has a bad value in one field)
fil <- readr::read_file_raw(tf)
fil[fil == as.raw(0x03)] <- charToRaw(" ")
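
If xmllint is not handy, the offending bytes can also be located from R. The sketch below is not the code used above: it assumes the only problem is stray control characters, and it works on a tiny made-up plist (`bad`) rather than the real mdls output. XML 1.0 permits only tab, line feed, and carriage return below 0x20, so anything else in that range is flagged and blanked out.

```r
library(xml2)

# toy stand-in for the mdls output: one value contains a stray 0x03 byte
bad <- c(
  charToRaw("<array><dict><key>k</key><string>a"),
  as.raw(0x03),
  charToRaw("b</string></dict></array>")
)

# the parser should reject the control character; record whether it did
parsed_ok <- tryCatch({ read_xml(bad); TRUE }, error = function(e) FALSE)

# below 0x20, XML 1.0 only allows tab (0x09), LF (0x0A), and CR (0x0D)
codes   <- as.integer(bad)
illegal <- codes < 0x20 & !(codes %in% c(0x09, 0x0A, 0x0D))
which(illegal)  # byte offsets of the offending characters

# replace every illegal byte with a space, then parse again
fixed <- bad
fixed[illegal] <- charToRaw(" ")
doc <- read_xml(fixed)
xml_text(xml_find_first(doc, "//string"))
```

On the real file, the same mask could be applied to `fil` in place of the single `0x03` comparison.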

Now, we can read in the XML without errors:

# now parse it and get the top of each app entry
applist <- xml2::read_xml(fil)
(applist <- xml_find_all(applist, "//array/dict"))
## {xml_nodeset (196)}
##  [1] <dict>\n  <key>_kMDItemDisplayNameWithExtensions</key>\n  <string>1Blocker (Old).app</string>\n  <key>kMDItemAlternateNames</key>\n ...
##  [2] <dict>\n  <key>_kMDItemDisplayNameWithExtensions</key>\n  <string>1Password 7.app</string>\n  <key>_kMDItemEngagementData</key>\n   ...
##  [3] <dict>\n  <key>_kMDItemDisplayNameWithExtensions</key>\n  <string>Adblock Plus.app</string>\n  <key>kMDItemAlternateNames</key>\n   ...
##  [4] <dict>\n  <key>_kMDItemDisplayNameWithExtensions</key>\n  <string>AdBlock.app</string>\n  <key>kMDItemAlternateNames</key>\n  <arra ...
##  [5] <dict>\n  <key>_kMDItemDisplayNameWithExtensions</key>\n  <string>AdGuard for Safari.app</string>\n  <key>kMDItemAlternateNames</ke ...
##  [6] <dict>\n  <key>_kMDItemDisplayNameWithExtensions</key>\n  <string>Agenda.app</string>\n  <key>kMDItemAlternateNames</key>\n  <array ...
##  [7] <dict>\n  <key>_kMDItemDisplayNameWithExtensions</key>\n  <string>Alfred 4.app</string>\n  <key>kMDItemAlternateNames</key>\n  <arr ...
##  [8] <dict>\n  <key>_kMDItemDisplayNameWithExtensions</key>\n  <string>Android File Transfer.app</string>\n  <key>kMDItemAlternateNames< ...
##  [9] <dict>\n  <key>_kMDItemDisplayNameWithExtensions</key>\n  <string>Asset Catalog Creator Pro.app</string>\n  <key>kMDItemAlternateNa ...
## [10] <dict>\n  <key>_kMDItemDisplayNameWithExtensions</key>\n  <string>Awsaml.app</string>\n  <key>kMDItemAlternateNames</key>\n  <array ...
## [11] <dict>\n  <key>_kMDItemDisplayNameWithExtensions</key>\n  <string>Boop.app</string>\n  <key>kMDItemAlternateNames</key>\n  <array>\ ...
## [12] <dict>\n  <key>_kMDItemDisplayNameWithExtensions</key>\n  <string>Buffer.app</string>\n  <key>kMDItemAlternateNames</key>\n  <array ...
## [13] <dict>\n  <key>_kMDItemDisplayNameWithExtensions</key>\n  <string>Burp Suite Community Edition.app</string>\n  <key>kMDItemAlternat ...
## [14] <dict>\n  <key>_kMDItemDisplayNameWithExtensions</key>\n  <string>Camera Settings.app</string>\n  <key>kMDItemAlternateNames</key>\ ...
## [15] <dict>\n  <key>_kMDItemDisplayNameWithExtensions</key>\n  <string>Cisco Webex Meetings.app</string>\n  <key>kMDItemAlternateNames</ ...
## [16] <dict>\n  <key>_kMDItemDisplayNameWithExtensions</key>\n  <string>Claquette.app</string>\n  <key>kMDItemAlternateNames</key>\n  <ar ...
## [17] <dict>\n  <key>_kMDItemDisplayNameWithExtensions</key>\n  <string>Discord.app</string>\n  <key>kMDItemAlternateNames</key>\n  <arra ...
## [18] <dict>\n  <key>_kMDItemDisplayNameWithExtensions</key>\n  <string>Elgato Control Center.app</string>\n  <key>kMDItemAlternateNames< ...
## [19] <dict>\n  <key>_kMDItemDisplayNameWithExtensions</key>\n  <string>F5 Weather.app</string>\n  <key>kMDItemAlternateNames</key>\n  <a ...
## [20] <dict>\n  <key>_kMDItemDisplayNameWithExtensions</key>\n  <string>Fantastical.app</string>\n  <key>kMDItemAlternateNames</key>\n  < ...
## ...

I really dislike property lists as I’m not a fan of position-dependent records in XML files. To get values for keys, we have to find the key, then go to the next sibling, figure out its type, and handle it accordingly. This is a verbose enough process to warrant creating a small helper function:

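# helper function to retrieve the values for a given key
kval <- function(doc, key) {

  # match the key name exactly (a substring match would also hit keys such as
  # _kMDItemDisplayNameWithExtensions), then take the element right after it;
  # entries lacking the key come back as missing nodes (NA)
  val <- xml_find_first(doc, sprintf(".//key[. = '%s']/following-sibling::*[1]", key))

  # the element name is the value's type; this assumes one type per key
  # (<date> and any other types are not handled and return NULL)
  switch(
    unique(na.omit(xml_name(val))),
    "array" = as_list(val) |> map(unlist, use.names = FALSE) |> map(unique),
    "integer" = xml_integer(val),
    "true" = TRUE,
    "false" = FALSE,
    "string" = xml_text(val, trim = TRUE)
  )

}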

This is nowhere near as robust as XML::readKeyValueDB() but it doesn’t have to be for this particular use case.
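
To make the "find the key, then look at the next sibling" idea concrete, here is a small sketch on a hand-made property list (the app name and values are hypothetical). It also shows why the way the key is matched matters: a substring match on `kMDItemDisplayName` picks up `_kMDItemDisplayNameWithExtensions` first, while an exact match does not.

```r
library(xml2)

# hypothetical single-app plist; values are made up for illustration
doc <- read_xml(paste0(
  "<plist><array><dict>",
  "<key>_kMDItemDisplayNameWithExtensions</key><string>Example (with ext).app</string>",
  "<key>kMDItemDisplayName</key><string>Example</string>",
  "<key>kMDItemFSIsExtensionHidden</key><true/>",
  "</dict></array></plist>"
))
app <- xml_find_first(doc, "//array/dict")

# substring match: the first key *containing* the name wins
loose <- xml_find_first(
  app, "./key[contains(., 'kMDItemDisplayName')]/following-sibling::*[1]"
)

# exact match: only the key with precisely this name
strict <- xml_find_first(
  app, "./key[. = 'kMDItemDisplayName']/following-sibling::*[1]"
)

xml_text(loose)
xml_text(strict)

# booleans carry their value in the element name, not in its text
flag <- xml_find_first(
  app, "./key[. = 'kMDItemFSIsExtensionHidden']/following-sibling::*[1]"
)
xml_name(flag)
```

The `[1]` on `following-sibling::*` limits the result to the element directly after the key, which is where a property list keeps that key's value.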

We can build up a data frame with certain fields (I wanted to know how many apps still aren’t Universal):

tibble(
  category = kval(applist, "kMDItemAppStoreCategory"),
  bundle_id = kval(applist, "kMDItemCFBundleIdentifier"),
  display_name = kval(applist, "kMDItemDisplayName"),
  arch = kval(applist, "kMDItemExecutableArchitectures"),
) |> 
  print() -> app_info
## # A tibble: 196 x 4
##    category        bundle_id                            display_name                  arch     
##    <chr>           <chr>                                <chr>                         <list>   
##  1 Productivity    com.khanov.BlockerMac                1Blocker (Old).app            <chr [2]>
##  2 Productivity    com.agilebits.onepassword7           1Password 7.app               <chr [2]>
##  3 Productivity    org.adblockplus.adblockplussafarimac Adblock Plus.app              <chr [2]>
##  4 Productivity    com.betafish.adblock-mac             AdBlock.app                   <chr [1]>
##  5 Utilities       com.adguard.safari.AdGuard           AdGuard for Safari.app        <chr [1]>
##  6 Productivity    com.momenta.agenda.macos             Agenda.app                    <chr [2]>
##  7 Productivity    com.runningwithcrayons.Alfred        Alfred 4.app                  <chr [2]>
##  8 NA              com.google.android.mtpviewer         Android File Transfer.app     <chr [1]>
##  9 Developer Tools com.bridgetech.asset-catalog         Asset Catalog Creator Pro.app <chr [2]>
## 10 Developer Tools com.rapid7.awsaml                    Awsaml.app                    <chr [1]>
## # … with 186 more rows

Finally, we can expand the arch column and see how many apps support Apple Silicon:

app_info |> 
  unnest(arch) |> 
  spread(arch, arch) |> 
  mutate_at(
    vars(arm64, x86_64),
    ~!is.na(.x)
  ) |> 
  count(arm64)
## # A tibble: 2 x 2
##   arm64     n
##   <lgl> <int>
## 1 FALSE    33
## 2 TRUE    163

Alas, there are still some stragglers stuck in Rosetta 2.
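
The same kind of tally can also be sketched without reshaping, by testing each app's architecture vector directly. This is an alternative to the code above, not what produced the counts shown, and it runs on a made-up tibble (`toy`) shaped like `app_info`:

```r
library(tibble)
library(dplyr)
library(purrr)

# hypothetical rows shaped like app_info (made-up apps, not real results)
toy <- tibble(
  display_name = c("Universal.app", "IntelOnly.app", "ArmOnly.app"),
  arch = list(c("arm64", "x86_64"), "x86_64", "arm64")
)

# one logical column per architecture, computed from the list column
toy |>
  mutate(
    arm64  = map_lgl(arch, ~ "arm64" %in% .x),
    x86_64 = map_lgl(arch, ~ "x86_64" %in% .x)
  ) |>
  count(arm64) -> arch_counts

arch_counts
```

Keeping `arch` as a list column means an app with no architecture entry would simply be counted as `FALSE` instead of needing special handling during a reshape.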

FIN

Drop comments if anything requires more blathering and have some fun with your macOS filesystem!
