Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
Every campaign cycle I usually do the same things: go to a repository, download a bunch of data, merge the files, and store them in an existing RData file for later analysis. I’ve already written about this topic some time ago, but this time I think my script has become simpler.
Set the Directory
Let’s assume you’re not in the same directory as your files, so you’ll need to point R to the folder where they reside.
setwd("~/Downloads/consulta_cand_2014")
Getting a List of files
Next, it’s just a matter of getting to know your files. For this, the list.files() function is very handy, and you can see the file names right away on your screen. Here I’m looking for the “txt” files only, so I want my list of files to exclude everything else, like pdf, jpg, etc. The pattern argument is a regular expression, so the dot has to be escaped and the “$” anchors the match to the end of the file name.
files <- list.files(pattern = '\\.txt$')
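To see what the anchored pattern buys you, here is a minimal sketch that runs in a throwaway temporary directory. The directory and the file names are made up for illustration; they are not the actual files from the repository.

```r
# Scratch directory with a few hypothetical file names
demo_dir <- tempfile("list_files_demo_")
dir.create(demo_dir)
file.create(file.path(demo_dir,
                      c("cand_AC.txt", "cand_AL.txt", "LEIAME.pdf", "txt_notes.jpg")))

# Anchored pattern: only names that end in ".txt"
txt_only <- list.files(demo_dir, pattern = "\\.txt$")

# Unanchored pattern: matches "txt" anywhere, so "txt_notes.jpg" slips in
txt_loose <- list.files(demo_dir, pattern = "txt")

print(txt_only)
print(txt_loose)
```

The loose pattern returns three names instead of two, which is why escaping the dot and anchoring with “$” is worth the extra characters.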
Sometimes you may find empty files that prevent the script from running successfully. Thus, you may want to inspect the files beforehand.
info <- file.info(files)
empty <- rownames(info[info$size == 0, ])
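Once the empty files are identified, one possible next step is to drop them from the list before reading anything. The sketch below assumes a scratch directory with one regular file and one zero-byte file (both hypothetical), and uses full.names = TRUE so it does not depend on the working directory.

```r
# Scratch directory: one file with content, one zero-byte file
demo_dir <- tempfile("empty_files_demo_")
dir.create(demo_dir)
writeLines("1;ANA;AC", file.path(demo_dir, "cand_AC.txt"))
file.create(file.path(demo_dir, "cand_AL.txt"))

files <- list.files(demo_dir, pattern = "\\.txt$", full.names = TRUE)

# file.info() uses the paths as row names, so the empty ones are easy to pull out
info  <- file.info(files)
empty <- rownames(info[info$size == 0, ])

# Keep only the files that actually contain something
files <- setdiff(files, empty)
print(basename(files))
```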
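Moreover, in case you have the same files in more than one format, you may want to filter them. For instance, to keep only the CSVs that have no TXT counterpart, compare the names without their extensions:
CSVs <- list.files(pattern = '\\.csv$')
TXTs <- list.files(pattern = '\\.txt$')
# compare base names, since "x.csv" never equals "x.txt"
mylist <- CSVs[!tools::file_path_sans_ext(CSVs) %in% tools::file_path_sans_ext(TXTs)]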
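One thing to watch when filtering across formats is that the extension is part of the file name, so a comparison of full names never finds a match. The sketch below uses hard-coded, hypothetical file names to contrast a full-name comparison with a base-name comparison via tools::file_path_sans_ext().

```r
# Hypothetical file names; in practice these would come from list.files()
CSVs <- c("cand_AC.csv", "cand_AL.csv", "cand_AM.csv")
TXTs <- c("cand_AC.txt", "cand_AL.txt")

# Full-name comparison: nothing is filtered out
naive <- CSVs[!CSVs %in% TXTs]

# Base-name comparison: only the CSV without a TXT twin survives
by_base <- CSVs[!tools::file_path_sans_ext(CSVs) %in%
                  tools::file_path_sans_ext(TXTs)]

print(naive)
print(by_base)
```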
Stacking files into a dataframe
The last step is to read every file in the list and “rbind” the results, putting them all together in a single data frame.
Notice that in the script below I’ve included some extra arguments to avoid problems reading the files I have. Also, this assumes all the files have the same number of columns; otherwise “rbind” won’t work. In that case you may need to replace “rbind” with “smartbind” from the gtools package.
# read each file into a data frame, then stack them row-wise
cand_br <- do.call("rbind", lapply(files, FUN = function(f) {
  read.table(f, header = FALSE, sep = ";", stringsAsFactors = FALSE,
             fileEncoding = "cp1252", fill = TRUE, blank.lines.skip = TRUE)
}))
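For a quick way to try the stacking step without the real data, here is a self-contained sketch that writes two tiny semicolon-separated files to a temporary directory and binds them. The file contents, the helper name read_one, and the object name stacked are all made up for illustration; the fileEncoding argument is left out because the toy files are plain ASCII.

```r
# Two hypothetical input files with the same three columns
demo_dir <- tempfile("stack_demo_")
dir.create(demo_dir)
writeLines(c("2014;AC;ANA", "2014;AC;BRUNO"), file.path(demo_dir, "cand_AC.txt"))
writeLines("2014;AL;CARLA", file.path(demo_dir, "cand_AL.txt"))

files <- list.files(demo_dir, pattern = "\\.txt$", full.names = TRUE)

# Read a single file with the same options used for every file
read_one <- function(f) {
  read.table(f, header = FALSE, sep = ";", stringsAsFactors = FALSE,
             fill = TRUE, blank.lines.skip = TRUE)
}

# lapply() returns a list of data frames; do.call("rbind", ...) stacks them
stacked <- do.call("rbind", lapply(files, read_one))
print(dim(stacked))
```

Because header = FALSE gives every data frame the default column names V1, V2, and so on, files with the same number of columns line up without further work.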