Statistics Sunday: Reading and Creating a Data Frame with Multiple Text Files
[This article was first published on Deeply Trivial, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
First Statistics Sunday in far too long! It’s going to be a short one, but it describes a great trick I learned recently while completing a time study for our exams at work.Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
To give a bit of background, this time study involves analzying time examinees spent on their exam and whether they were able to complete all items. We’ve done time studies in the past to select time allowed for each exam, but we revisit on a cycle to make certain the time allowed is still ample. All of our exams are computer-administered, and we receive daily downloads from our exam provider with data on all exams administered that day.
What that means is, to study a year’s worth of exam data, I need to read in and analyze 365(ish – test centers are generally closed for holidays) text files. Fortunately, I found code that would read all files in a particular folder and bind them into a single data frame. First, I’ll set the working directory to the location of those files, and create a list of all files in that directory:
setwd(“Q:/ExamData/2018”)
filelist <- list.files()
For the next part, I’ll need the data.table library, which you’ll want to install if you don’t already have it:
library(data.table)
Exams2018 <- rbindlist(sapply(filelist, fread, simplify = FALSE, use.names = TRUE, idcol = "FileName")
Now I have a data frame with all exam data from 2018, and an additional column that identifies which file a particular case came from.
What if your working directory has more files than you want to read? You can still use this code, with some updates. For instance, if you want only the text files from the working directory, you could add a regular expression to the list.files() code to only look for files with “.txt” extension:
list.files(pattern = “\\.txt$”)
If you’re only working with a handful of files, you can also manually create the list to be used in the rbindlist function. Like this:
filelist <- c("file1.txt", "file2.txt", "file3.txt")
That’s all for now! Hope everyone has a Happy Thanksgiving!
To leave a comment for the author, please follow the link and comment on their blog: Deeply Trivial.
R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.