Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
As someone who measures all kinds of things on the internet as part of his $DAYJOB, I can say with some authority that huge swaths of organizations are using cloud services such as Google Apps, Dropbox and Office 365 as part of their business process workflows. For me, one regular component that touches the “cloud” is when I have to share R-generated charts with our spiffy production team for use in reports, presentations and other general communications.
These are typically project-based tasks and data science team members typically use git- and AWS-based workflows for gathering data, performing analyses and generating output. While git is great at sharing code and ensuring the historical integrity of our analyses, we don’t expect the production team members to be or become experts in git to use our output. They live in Google Drive and thanks to the googledrive
We use “R projects” to organize things and either use spinnable R scripts or R markdown documents inside those projects to gather, clean and analyze data.
For 2019, we’re using new, work-specific R markdown templates that have one new YAML header parameter:
    params:
      gdrive_folder_url: "https://drive.google.com/drive/u/2/SOMEUSELESSHEXSTRING"
which just defines the Google Drive folder URL for the final output directory in the
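Since everything downstream hangs off that one parameter, it can be worth failing fast when it is missing or mangled. Here's a minimal sketch of such a guard. Note that check_gdrive_folder_url() is a hypothetical helper (it is not part of the template or of googledrive), and the URL prefix it tests for is an assumption based on the example above:

```r
# Hypothetical guard: stop early when the folder URL parameter is unusable.
# The "https://drive.google.com/" prefix is an assumption taken from the
# example parameter value, not an official rule.
check_gdrive_folder_url <- function(url) {
  ok <- is.character(url) &&
    length(url) == 1 &&
    !is.na(url) &&
    grepl("^https://drive\\.google\\.com/", url)

  if (!ok) {
    stop("gdrive_folder_url must be a single https://drive.google.com/ URL",
         call. = FALSE)
  }

  # hand the value back so the call can be used inline
  invisible(url)
}
```

Inside the document you would call it on params$gdrive_folder_url before any upload step runs.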
Next is a new pre-configured knitr chunk call at the start of these production chart-generating documents:
    knitr::opts_chunk$set(
      message = FALSE,
      warning = FALSE,
      dev = c("png", "cairo_pdf"),
      echo = FALSE,
      fig.retina = 2,
      fig.width = 10,
      fig.height = 6,
      fig.path = "prod/charts/"
    )
We list both devices since the production team wants PDFs they can work with in their tools and, in our testing, cairo_pdf produces the best/most consistent output. PNGs show up better in the composite HTML documents, so we put png first deliberately.
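To make the effect of those options a bit more concrete: assuming knitr's default file naming (fig.path, then the chunk label, a figure counter and the device's extension, with cairo_pdf writing .pdf files), each plotting chunk should leave a PNG and a PDF side by side. The helper below only mimics that naming for illustration; chart_files() and the bar-chart label are made up:

```r
# Hypothetical helper that mimics knitr's default figure naming:
#   <fig.path><chunk label>-<figure number>.<extension>
# It does not call knitr; it just builds the paths you would expect to see.
chart_files <- function(label, n = 1,
                        fig_path = "prod/charts/",
                        exts = c("png", "pdf")) {
  paste0(fig_path, label, "-", n, ".", exts)
}

# First figure produced by a chunk labelled "bar-chart"
chart_files("bar-chart")
```

Knowing what to expect in prod/charts/ makes it easier to spot a chunk that was left unlabelled before anything gets shared.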
The real change is the consistent naming of the fig.path directory. By doing this, all we have to do is add a few lines (again, automatically generated) to the bottom of the document to have all the output automagically go to the proper Google Drive folder:
    # Upload to production ----------------------------------------------------

    googledrive::drive_auth()

    # locate the folder
    gdrive_prod_folder <- googledrive::as_id(params$gdrive_folder_url)

    # clean it out
    gdrls <- googledrive::drive_ls(gdrive_prod_folder)
    if (nrow(gdrls) > 0) {
      dplyr::pull(gdrls, id) %>%
        purrr::walk(~googledrive::drive_rm(googledrive::as_id(.x)))
    }

    # upload new
    list.files(here::here("prod/charts"), recursive = TRUE, full.names = TRUE) %>%
      purrr::walk(googledrive::drive_upload, path = gdrive_prod_folder)
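If you'd rather not carry those lines in every document, the same clean-then-upload steps can be wrapped in a function. This is only a sketch under a few assumptions: sync_charts() is a hypothetical name, the three *_fun arguments exist purely so the logic can be tried with stand-ins instead of live Google Drive calls, and it relies on drive_rm() accepting the listing returned by drive_ls() directly rather than one id at a time:

```r
# Hypothetical wrapper around the clean-then-upload steps shown above.
# The *_fun arguments default to the googledrive functions used in the
# original snippet; they are parameters only to allow offline dry runs.
sync_charts <- function(local_dir, folder,
                        ls_fun = googledrive::drive_ls,
                        rm_fun = googledrive::drive_rm,
                        upload_fun = googledrive::drive_upload) {
  # clean out whatever is currently in the remote folder
  existing <- ls_fun(folder)
  if (nrow(existing) > 0) rm_fun(existing)

  # upload every file found under the local chart directory
  files <- list.files(local_dir, recursive = TRUE, full.names = TRUE)
  for (f in files) upload_fun(f, path = folder)

  # return the local paths that were sent, invisibly
  invisible(files)
}

# Usage inside the R markdown document (needs network access and auth):
# googledrive::drive_auth()
# sync_charts(
#   here::here("prod/charts"),
#   googledrive::as_id(params$gdrive_folder_url)
# )
```

The behavior is meant to match the inline version: the remote folder is emptied first, so it only ever holds the latest render.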
Now, we never have to remember to drag documents into a browser and don’t have to load invasive Google applications onto our systems to ensure the right folks have the right files at the right time. We just have to use the new R markdown document type to generate a starter analysis document with all the necessary boilerplate baked in. Plus, the .httr-oauth file is automatically ignored in .gitignore so there’s no information leakage to shared git repositories.
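For projects that were not created from the template, the same protection can be added by hand. Here's a minimal base R sketch; ensure_git_ignored() is a hypothetical helper, and usethis::use_git_ignore(".httr-oauth") should do much the same job if you already use usethis:

```r
# Hypothetical helper: add an entry to .gitignore unless it is already there.
ensure_git_ignored <- function(entry = ".httr-oauth", path = ".gitignore") {
  # read the current ignore rules, if the file exists at all
  current <- if (file.exists(path)) readLines(path, warn = FALSE) else character()

  # append the entry only when it is missing, so repeated calls are harmless
  if (!entry %in% current) {
    writeLines(c(current, entry), path)
  }

  invisible(path)
}

# ensure_git_ignored()  # run from the project root
```

Keep in mind this only stops the token cache from being added in the future; it does nothing about a file that was already committed.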
FIN
If you want to experiment with this, you can find a pre-configured template in the markdowntemplates package over at sr.ht, GitLab, or GitHub.
If you install the package, you’ll be able to select this output type right from the new document dialog, and the new template will be ready to go with no copying, cutting or pasting.
Plus, since the Google Drive folder URL is an R markdown parameter, you can also use this in script automation (provided that you’ve wired up oauth correctly for those scripts).
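As a rough sketch of what that automation could look like, rmarkdown::render() accepts a params list that overrides the YAML defaults. The file names, the second folder URL and the render_all() helper below are all placeholders, and the render_fun argument is only there so the loop can be tried without actually knitting anything:

```r
# Placeholder mapping of report sources to Google Drive folders.
reports <- data.frame(
  rmd = c("q1-charts.Rmd", "q2-charts.Rmd"),
  gdrive_folder_url = c(
    "https://drive.google.com/drive/u/2/SOMEUSELESSHEXSTRING",
    "https://drive.google.com/drive/u/2/ANOTHERUSELESSHEXSTRING"
  ),
  stringsAsFactors = FALSE
)

# Hypothetical helper: knit each document with its own folder URL.
render_all <- function(reports, render_fun = rmarkdown::render) {
  for (i in seq_len(nrow(reports))) {
    render_fun(
      reports$rmd[i],
      params = list(gdrive_folder_url = reports$gdrive_folder_url[i])
    )
  }
  invisible(reports$rmd)
}

# render_all(reports)  # needs the .Rmd files and working OAuth credentials
```

Because each document carries its own upload lines at the bottom, knitting it this way is all it takes to refresh the matching Drive folder.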