Here is Your Data

[This article was first published on R – William Doane, and kindly contributed to R-bloggers.]

It’s a common situation: you want to code and debug in R *and* leverage RMarkdown for a presentation or document.

The challenge: file paths.

Code run in the console and code run while knitting a saved RMarkdown document typically resolve relative file paths from different starting directories, so the same path to a data file can work in one and fail in the other.

While you’re writing and debugging, you’ve probably got your source code open and are sending lines of code to the console to be evaluated. That code is evaluated relative to the current working directory. In an RStudio project, that is, by default, the top-level directory of your project.

In a typical project, you’ll have several subdirectories:

- project
    - data
    - data-raw
    - output
    - R

So, a casual reference to the data directory might look like this:

read.csv("data/mtcars.csv")    # to execute in the console

Now suppose you write an RMarkdown document and save it in your R directory. While you’re debugging code, you’ll still be sending lines of code to the console for evaluation, as above, so the project’s top-level directory is used as the starting point.

However, when you knit the document (or spin, or compose notebook… any of the aliases to knitr::knit() and associated functions), the document is evaluated in its own session with its own working directory. From the perspective of knitr, the base directory is, by default, the directory in which you saved the .R, .md, or .Rmd document. Assuming you saved your RMarkdown script in R/, then knitting that document with the same code as above

read.csv("data/mtcars.csv")

looks for R/data/mtcars.csv within your project directory, which doesn’t exist.

The common, but not optimal, solution is to tell the knitted document to look up one directory level before looking for the data directory:

read.csv("../data/mtcars.csv") # to knit

.. is geek-speak for “parent directory”, AKA “up one directory level”. If you needed to look up two levels, you would refer to ../../data/mtcars.csv instead.
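
The two starting points can be reproduced without knitting anything. The following is a minimal sketch in base R, assuming a throwaway project built under tempdir() (the name example_project and its layout are illustrative only). It switches the working directory with setwd() to mimic first the console and then a document saved in R/.

```r
# Build a throwaway project layout in a temporary directory.
# "example_project" is a hypothetical name used only for this sketch.
project <- file.path(tempdir(), "example_project")
dir.create(file.path(project, "data"), recursive = TRUE, showWarnings = FALSE)
dir.create(file.path(project, "R"), showWarnings = FALSE)
write.csv(mtcars, file.path(project, "data", "mtcars.csv"))

# Console in an RStudio project: the working directory is the project root.
old_wd <- setwd(project)
from_console <- file.exists("data/mtcars.csv")

# Knitting a document saved in R/: the working directory is R/.
setwd(file.path(project, "R"))
from_knit <- file.exists("data/mtcars.csv")
from_knit_parent <- file.exists("../data/mtcars.csv")

# Restore the original working directory.
setwd(old_wd)

print(c(console = from_console, knit = from_knit, knit_parent = from_knit_parent))
```

From the project root only the path without ../ resolves, and from inside R/ only the path with ../ resolves.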

That’s great for knitting, but when you want to debug your code interactively in the console, you have to remove the ../ portion again. What a mess!

The here package addresses this issue. here provides a function, called here(), that returns a file path based on where your project’s top level directory is. So, when you evaluate the following line in the console

read.csv(here::here("data", "mtcars.csv"))

here looks for an .Rproj file or other indicators of where your current project is located and constructs a file path to that top-level directory, plus a data subdirectory, plus a file named mtcars.csv in that directory. Called on its own, here::here("data", "mtcars.csv") returns that path:

[1] "/Users/wdoane/Documents/Repositories/example_project/data/mtcars.csv"
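
The root-finding idea can be sketched in base R. This is an illustration only, not how here is implemented: find_project_root() is a hypothetical helper that considers only .Rproj files, whereas here also recognizes other indicators. The demo project is again built under tempdir().

```r
# Hypothetical helper: walk upward from `start` until a directory that
# contains an .Rproj file is found. Illustration only, not here's own code.
find_project_root <- function(start = getwd()) {
  dir <- normalizePath(start, winslash = "/", mustWork = TRUE)
  repeat {
    if (length(list.files(dir, pattern = "\\.Rproj$")) > 0) {
      return(dir)
    }
    parent <- dirname(dir)
    if (identical(parent, dir)) {
      # Reached the filesystem root without finding a project file.
      stop("No .Rproj file found above ", start)
    }
    dir <- parent
  }
}

# Demonstrate on a throwaway project: start the search from its R/ directory.
project <- file.path(tempdir(), "root_demo")
dir.create(file.path(project, "R"), recursive = TRUE, showWarnings = FALSE)
file.create(file.path(project, "root_demo.Rproj"))

root <- find_project_root(file.path(project, "R"))
data_path <- file.path(root, "data", "mtcars.csv")
print(data_path)
```

Because the search starts from wherever the code is running and walks upward, the resulting path is the same whether the starting directory is the project root or R/.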

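Equivalently, you could first attach the package with library() or require() and then call here() directly:

library(here)

# Create data/ under the project root; stay quiet if it already exists
dir.create(here("data"), showWarnings = FALSE)
# Write the data set, then read it back, using root-relative paths
write.csv(mtcars, here("data", "mtcars.csv"))
read.csv(here("data", "mtcars.csv"))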

To install the current release version of the here package:

install.packages("here")

Or, to install the development version:

devtools::install_github("r-lib/here")

For more information about here, visit https://github.com/jennybc/here_here
