Engaging the tidyverse Clean Slate Protocol
I caught the 0.7.0 release of dplyr on my home CRAN server early Friday morning and immediately set out to install it since I’m eager to finish up my sergeant package and get it on CRAN. “Tidyverse” upgrades aren’t trivial for me as I tinker quite a bit with the tidyverse and create packages that depend on various components. The sergeant package provides, amongst other things, a dplyr back-end for Apache Drill, so it has more tidyverse tendrils than other bits of code I maintain.
macOS binaries weren’t available yet (it generally takes 24-48 hrs for that) so I did an install.packages("dplyr", type="source") and was immediately hit with gcc 7 compilation errors. This seemed odd, but switching back to clang worked fine.
I then proceeded to run chunks in an Rmd I’m working on and hit “Encoding” errors on mutate() calls. Not having time to debug further, I reverted to 0.5.0 of dplyr, went about my day, and promised the tidyverse maintainers that I’d work on a reproducible example after work.
I made R data files from the data frames that were tossing errors, then extracted and tweaked a code snippet that consistently generated the error. To validate that this was a real error, and a cross-platform one, I created a rocker container on one of my linux boxes. The rocker container used a full fresh-from-source copy of the tidyverse including dplyr 0.7.0. The code worked and no error was generated, so I immediately suspected package rot on my main dev macOS box.
Now, my situation is complicated by an insanely hasty migration to macOS 10.13β1 (I refuse to use the Apple macOS catchy names anymore since the most recent one is just silly) and a move to the gcc 7 toolchain (initially prompted by wanting both to get rJava working nicely and to reproduce some CRAN-noted errors with some packages). Further complications came from many install_github() invocations for various packages, which have regularly overwritten bits of the tidyverse over the past few weeks since the R 3.4.0 release. In other words, the integrity of the “tidyverse” was in serious question on my system and it was time for the Clean Slate Protocol.
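One rough way to see how much of a library has been overwritten by non-CRAN installs is to look for the remote-source fields that install_github() records in each package's DESCRIPTION. The sketch below is not something I ran at the time. remote_installs() is a hypothetical helper name, and it assumes the installer wrote RemoteType and RemoteSha fields (some newer installers also write RemoteType for CRAN packages, so inspect the values rather than just counting rows).

```r
# List installed packages whose DESCRIPTION records a remote source.
# `remote_installs` is a hypothetical helper, not part of any package.
remote_installs <- function(lib = .libPaths()[1]) {
  # `fields` asks installed.packages() for extra DESCRIPTION fields
  ip <- installed.packages(lib.loc = lib,
                           fields = c("RemoteType", "RemoteSha"))
  keep <- c("Package", "Version", "RemoteType", "RemoteSha")
  ip <- as.data.frame(ip[, keep, drop = FALSE], stringsAsFactors = FALSE)
  rownames(ip) <- NULL
  # Packages without a RemoteType field come back as NA; drop them
  ip[!is.na(ip$RemoteType), , drop = FALSE]
}

suspects <- remote_installs()
print(suspects)
```

An empty result does not prove the library is healthy; it only means no package advertises a remote origin.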
Rather than itemizing package versions and surgically nipping and tucking, I opted to use packrat to get to my desired end-state of a full-integrity tidyverse install. There are many ways to do this. Feel free to “one-up” me and show your l33t method in the comments. This one will likely be accessible to most, if not all, R users.
I started a new RStudio project in a new session and told it to use packrat. In the new project console, I did install.packages("tidyverse", type="source") and let it go for many minutes. I then navigated to the packrat subdirectory where the 3.4 package binaries are housed (just follow the project packrat tree down to the R version directory) and moved all 51 packages (yes, 51 O_o) to the main R library path (which you can figure out by running .libPaths() in any non-packrat-maintained project).
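I did the move by hand, but the same step can be sketched in R. This is a minimal sketch under some assumptions: promote_library() is a hypothetical helper name, it copies rather than moves (so the packrat project stays intact), and it treats any subdirectory containing a DESCRIPTION file as an installed package.

```r
# Copy every installed package from one library directory into another.
# `promote_library`, `from`, and `to` are hypothetical names for illustration.
promote_library <- function(from, to, overwrite = TRUE) {
  stopifnot(dir.exists(from), dir.exists(to))
  pkgs <- list.dirs(from, full.names = FALSE, recursive = FALSE)
  # Only directories with a DESCRIPTION file look like installed packages
  pkgs <- pkgs[file.exists(file.path(from, pkgs, "DESCRIPTION"))]
  for (pkg in pkgs) {
    target <- file.path(to, pkg)
    if (dir.exists(target)) {
      if (!overwrite) next
      unlink(target, recursive = TRUE)  # clear the stale copy first
    }
    file.copy(file.path(from, pkg), to, recursive = TRUE)
  }
  invisible(pkgs)
}

# Example call, left commented out because it overwrites your main library.
# The packrat path layout below is an assumption; check your own project tree.
# packrat_lib <- file.path("packrat", "lib", R.version$platform,
#                          as.character(getRversion()))
# promote_library(packrat_lib, .libPaths()[1])
```

Run something like this from a fresh session with no packages loaded, and restart R afterwards so nothing keeps using the replaced files.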
After doing that, I fired up the originally failing Rmd and everything worked fine.
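A quick sanity check after a rebuild like this is to confirm which versions the main library now reports. The helper below is a small sketch; installed_versions() is a hypothetical name, and the package list is just an example subset of the tidyverse.

```r
# Report the installed version of each package, or NA if it is missing.
# `installed_versions` is a hypothetical helper for illustration.
installed_versions <- function(pkgs) {
  vapply(pkgs, function(pkg) {
    # system.file() returns "" when the package cannot be found
    if (nzchar(system.file(package = pkg))) {
      as.character(utils::packageVersion(pkg))
    } else {
      NA_character_
    }
  }, character(1))
}

print(installed_versions(c("dplyr", "tidyr", "ggplot2", "purrr")))
```

Running the same call inside the rocker container and on the main machine gives two named vectors that are easy to compare side by side.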
I don’t do the Clean Slate Protocol too often (we all get to for new R dot-releases) but it came in handy this time. If you run into errors when trying to get the new dplyr working, you may benefit from the Clean Slate Protocol as well.
If you haven’t seen the changes in 0.6.0/0.7.0 you should check them out and give it a go.