[This article was first published on Mollie's Research Blog, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
I often find it beneficial to check whether a dataset is already loaded into R at the beginning of a file. This is particularly helpful when I’m dealing with a large file that I don’t want to load repeatedly, and when I might be using the same dataset with multiple R scripts or re-running the same script while making changes to the code.
To check whether an object with a given name already exists in the session, we can use the exists function from the base package. We can then wrap our read.csv call in an if statement so that the file is read only when no object with that name exists yet.
if (!exists("largeData")) {
  largeData <- read.csv("huge-file.csv", header = TRUE)
}
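To see the guard in action without a real large file, here is a minimal sketch. It assumes a small temporary CSV as a stand-in for "huge-file.csv"; the names csv_path, reads, and is_loaded are made up for illustration. It runs the guarded block twice and counts how often the file is actually read.

```r
# Hypothetical stand-in for "huge-file.csv": a small temporary CSV.
csv_path <- tempfile(fileext = ".csv")
write.csv(data.frame(id = 1:3, label = c("a", "b", "c")),
          csv_path, row.names = FALSE)

reads <- 0  # counts how often the file is actually read

# Simulate running the same guarded block twice in one session.
for (run in 1:2) {
  if (!exists("largeData")) {
    largeData <- read.csv(csv_path, header = TRUE)
    reads <- reads + 1
  }
}

# exists() only checks the name, not the type of the object,
# so an extra is.data.frame() check is one way to be stricter.
is_loaded <- exists("largeData") && is.data.frame(largeData)
```

Note that exists only looks at the name, so any leftover object called largeData will prevent the file from loading. If the file on disk changes, running rm(largeData) first forces a fresh read.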
You will probably also find it useful to use the “colClasses” option of read.csv or read.table to help the file load faster and make sure your data are in the right format. For example:
if (!exists("largeData")) {
  largeData <- read.csv("huge-file.csv", header = TRUE,
                        colClasses = c("factor", "integer", "character",
                                       "integer", "integer", "character"))
}
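As a rough illustration of the "right format" point, the sketch below uses a made-up three-column file (the column names group, count, and zip are assumptions, not from the original data). It compares what read.csv guesses on its own with what it returns when colClasses is supplied.

```r
# Hypothetical three-column file; the zip column has a leading zero.
csv_path <- tempfile(fileext = ".csv")
writeLines(c("group,count,zip",
             "a,10,02134",
             "b,20,10001"), csv_path)

# Without colClasses, read.csv guesses each column's type,
# so zip is read as a number and the leading zero is dropped.
guessed <- read.csv(csv_path, header = TRUE)

# With colClasses, each column is read as the class we ask for.
typed <- read.csv(csv_path, header = TRUE,
                  colClasses = c("factor", "integer", "character"))

# Inspect the resulting column classes.
sapply(typed, class)
```

In the typed version the zip column stays a character vector, so values such as "02134" keep their leading zero.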
—
This post is one part of my series on dealing with large datasets.