How to import a subset of a (too huge) csv file
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
Sometimes people create csv files that are just too huge to upload them into your R session while most of the times you just need a subset of this data set. Recently I tapped into this problem and first I tried to import the whole file with functions like fread or the classic read.csv but this did not help much as the file was just too big and my computer failed to import it. With the awesome read.csv.sql function from the sqldf package I found a good way to solve my problem. This function enables you to use SQL statements within the import function which make it possible to select only a subset of the file to reduce the import size.
mydata = read.csv.sql("mydata.csv", sql = "select * from file where City = '\"Hamburg\"' ", sep = ";")
The code above loads only those lines of the file in where the city is Hamburg. I still had trouble to with the encoding that is why I used this ugly string with backlashes in the SQL statement. I will leave it like this as you might having the same problem.
Make sure you check out my Github for other data driven projects.
R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.