[This article was first published on K & L Fintech Modeling, and kindly contributed to R-bloggers].
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
This post presents basic R code snippets for reading all files with given file extensions, such as csv or txt. The approach is simple but very useful when there are too many files to read manually.
If we have a large number of csv files or their variants (e.g. 1,000 files), reading them one by one by hand is impractical.
For example, let's assume that the following 6 files are in a target directory. The first four files (csv, CSV, txt, TXT) are the ones we want to read. Their contents are straightforward, and the code later in this post prints them.
- USD.CSV
- EUR.csv
- CNY.TXT
- AUD.txt
- CNY2.ttxt
- USD.CCSV
In this case, we can use the R function list.files() to find the files with the wanted extensions and then read them.
list.files() function
list.files() is a built-in R function that returns a character vector of the names of the files that match a given pattern.
    list.files(path, pattern = "\\.(csv|txt)$",
               ignore.case = TRUE,
               full.names = FALSE)
In the above R command, the pattern "\\.(csv|txt)$" works as follows: 1) $ anchors the match to the end of the file name; 2) (csv|txt) accepts either the csv or the txt extension; 3) \\. requires a literal dot immediately before the extension, which rules out similar extensions such as ccsv or ttxt. Both csv and CSV, and both txt and TXT, are accepted because ignore.case = TRUE makes the match case-insensitive.
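The effect of each part of the pattern can be checked without touching the file system. The snippet below is a minimal sketch that applies the same regular expression to the six example file names with grepl(); the variable names (files, matched, loose) are illustrative and not part of the original code.

```r
# The six example file names from the list above
files <- c("USD.CSV", "EUR.csv", "CNY.TXT", "AUD.txt", "CNY2.ttxt", "USD.CCSV")

# Same pattern as in list.files(): literal dot, csv or txt, end of name
matched <- files[grepl("\\.(csv|txt)$", files, ignore.case = TRUE)]
print(matched)
# "USD.CSV" "EUR.csv" "CNY.TXT" "AUD.txt"

# Without the escaped dot, the look-alike extensions slip through
loose <- files[grepl("(csv|txt)$", files, ignore.case = TRUE)]
print(length(loose))
# 6
```

Dropping \\. lets CNY2.ttxt and USD.CCSV match, because both names still end in txt or csv when case is ignored.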
R code
The following R code is short and self-contained. It 1) reads each csv (CSV) or txt (TXT) file into its own data.frame and 2) reads all of them and collects them into one data.frame.
    #========================================================#
    # Quantitative ALM, Financial Econometrics & Derivatives
    # ML/DL using R, Python, Tensorflow by Sang-Heon Lee
    #
    # https://kiandlee.blogspot.com
    #--------------------------------------------------------#
    # Basic R : read all csv files when these are so many
    #========================================================#

    graphics.off()  # clear all graphs
    rm(list = ls()) # remove all objects from your workspace

    # working directory
    setwd("D:/SHLEE/blog/R/many_csv")

    # directory where csv files are located
    path <- file.path(getwd())

    #-------------------------------------------------------
    # make a vector of all file names with csv or txt ext.
    #-------------------------------------------------------
    # $         : end of file name
    # (csv|txt) : multiple file extensions
    # \\.       : avoid unwanted cases such as .ccsv
    #-------------------------------------------------------
    # full.names = FALSE returns bare file names; read.csv()
    # finds them only because path is the working directory
    v.filename <- list.files(path, pattern = "\\.(csv|txt)$",
                             ignore.case = TRUE,
                             full.names = FALSE)

    #-------------------------------------------------------
    # Test 1) read and make each data.frame
    #-------------------------------------------------------
    for (fn in v.filename) {
        df.each <- read.csv(fn)
        print(fn); print(df.each)
    }

    #-------------------------------------------------------
    # Test 2) read and collect into one data.frame
    #-------------------------------------------------------
    df.all <- do.call(rbind, lapply(v.filename,
                      function(x) read.csv(x)))
    print(df.all)
Output
The output shows that only the 4 files with the correct file extensions are read, while the 2 unwanted files (.CCSV and .ttxt) are ignored.
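To try this without the hard-coded working directory, the following sketch builds the six example files in a temporary directory and reads them back. It is only an illustration under stated assumptions: the file contents (a currency code and a rate of 1) are made up, and demo_dir, toy, v.path, and source_file are hypothetical names that do not appear in the original code. It also uses full.names = TRUE, a variation on the listing above, so that read.csv() receives complete paths and no setwd() call is needed.

```r
# Hypothetical demo directory inside R's session temp folder
demo_dir <- file.path(tempdir(), "many_csv_demo")
dir.create(demo_dir, showWarnings = FALSE)

# Create the six example files with made-up one-row contents
file_names <- c("USD.CSV", "EUR.csv", "CNY.TXT", "AUD.txt",
                "CNY2.ttxt", "USD.CCSV")
for (fn in file_names) {
  toy <- data.frame(currency = substr(fn, 1, 3), rate = 1)
  write.csv(toy, file.path(demo_dir, fn), row.names = FALSE)
}

# full.names = TRUE returns paths usable from any working directory
v.path <- list.files(demo_dir, pattern = "\\.(csv|txt)$",
                     ignore.case = TRUE, full.names = TRUE)

# Read each file and record which file every row came from
df.list <- lapply(v.path, function(p) {
  df <- read.csv(p)
  df$source_file <- basename(p)
  df
})
df.all <- do.call(rbind, df.list)
print(df.all)

# Remove the demo files again
unlink(demo_dir, recursive = TRUE)
```

Keeping the file name in a source_file column is one way to trace each row back to its origin after the data.frames are stacked with rbind().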
This R code is efficient and useful, especially when there are too many files to read by hand.