Effortlessly Read Rectangular Data: R Package `readit` 1.0.0 Released on CRAN
[This article was first published on Another Blog About R, and kindly contributed to R-bloggers.]
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
Another R package designed out of frustration, `readit` is now available. What follows is the README that you can find on GitHub; version 1.0.0 of `readit` is now available on CRAN. Please feel free to submit requests, bug reports, etc.!
`readit()` may be the only data-read function you ever need; by wrapping other popular reader packages, like readr, readxl, haven, and jsonlite, `readit` provides one self-titled function to read almost anything that isn’t formatted like hot garbage. If you have faith that the underlying data is of modest quality, and don’t care how it’s delimited, or what its file extension suggests, then `readit` is for you.

This package was inspired by a handover at work; I took over as maintainer for a package that dealt with a lot of disparate file extensions, and quickly became frustrated with trying to keep track of which filename was delimited in what way. “Why can’t I just… ***f@!#ing read it?!***” And lo, `readit` was born!

Features
`readit` is a pretty straightforward R package. It exports only one function, `readit()`, which wraps most of the reader functions in readr, readxl, haven, and jsonlite. You can also pass `readit()` any arguments that you would normally pass to those functions.

`readit()` uses some basic heuristics based on the file extension to call the appropriate read function, and if the extension is too ambiguous (like `.txt` files), `readit()` will perform some commonly-implemented checks to guess the correct delimiter. `readit()` will always print out what file type it guessed (in nice, bold, green console text, via crayon) as a sanity check, and will throw an error if the file you give it is parsed and determined to be too messy to deal with automatically.

For example, say you have some `.txt` file that you receive from a client each month, and it’s delimited differently every time (because that’s how it goes). Instead of inspecting it with four or five different functions first, you can just call `readit()` on it to pass it to `readr`’s… readers:

    > readit("path/to/frustrating/file.txt")
    File guessed to be pipe-delimited ("path/to/frustrating/file.txt")
    Parsed with column specification:
    cols(
      testheader1 = col_character(),
      testheader2 = col_character(),
      testheader3 = col_character(),
      testheader4 = col_character(),
      testheader5 = col_character(),
      testheader6 = col_character()
    )
    # A tibble: 5 x 5
      testheader1 testheader2 testheader3 testheader4 testheader5
      <chr>       <chr>       <chr>       <chr>       <chr>
    1 testdata11  testdata12  testdata13  testdata14  testdata15
    2 testdata21  testdata22  testdata23  testdata24  testdata25
    3 testdata31  testdata32  testdata33  testdata34  testdata35
    4 testdata41  testdata42  testdata43  testdata44  testdata45
    5 testdata51  testdata52  testdata53  testdata54  testdata55

Huzzah! It turns out that someone replaced all the delimiters with pipes (`|`), but with `readit`, that’s no problem! Just throw it into the great maw, and watch as the correct data comes back out.

What about if the same file becomes a sneaky tab-delimited file next month?

    > readit("path/to/frustrating/file.txt")
    File guessed to be tab-delimited ("path/to/frustrating/file.txt")
    Parsed with column specification:
    cols(
      testheader1 = col_character(),
      testheader2 = col_character(),
      testheader3 = col_character(),
      testheader4 = col_character(),
      testheader5 = col_character()
    )
    # A tibble: 6 x 5
      testheader1 testheader2 testheader3 testheader4 testheader5
      <chr>       <chr>       <chr>       <chr>       <chr>
    1 testdata11  testdata12  testdata13  testdata14  testdata15
    2 testdata21  testdata22  testdata23  testdata24  testdata25
    3 testdata31  testdata32  testdata33  testdata34  testdata35
    4 testdata41  testdata42  testdata43  testdata44  testdata45
    5 testdata51  testdata52  testdata53  testdata54  testdata55
    6 testdata61  testdata62  testdata63  testdata64  testdata65

Nope, no problem: `readit()` picked it up just fine, including the newest data.

What if your client starts storing the same data in Excel files, instead?

    > readit("path/to/frustrating/file.xlsx")
    File guessed to be xls/xlsx (Excel) ("path/to/frustrating/file.xlsx")
    Parsed with column specification:
    cols(
      testheader1 = col_character(),
      testheader2 = col_character(),
      testheader3 = col_character(),
      testheader4 = col_character(),
      testheader5 = col_character(),
      testheader6 = col_character()
    )
    # A tibble: 6 x 5
      testheader1 testheader2 testheader3 testheader4 testheader5
      <chr>       <chr>       <chr>       <chr>       <chr>
    1 testdata11  testdata12  testdata13  testdata14  testdata15
    2 testdata21  testdata22  testdata23  testdata24  testdata25
    3 testdata31  testdata32  testdata33  testdata34  testdata35
    4 testdata41  testdata42  testdata43  testdata44  testdata45
    5 testdata51  testdata52  testdata53  testdata54  testdata55
    6 testdata61  testdata62  testdata63  testdata64  testdata65

`readit()` has you covered. What if that data is on the second Excel sheet, though? Pass `sheet = 2` to `readit()`, just as you would to `read_excel()`:

    > readit("path/to/frustrating/file.xlsx", sheet = 2)
    File guessed to be xls/xlsx (Excel) ("path/to/frustrating/file.xlsx")
    Parsed with column specification:
    cols(
      testheader1 = col_character(),
      testheader2 = col_character(),
      testheader3 = col_character(),
      testheader4 = col_character(),
      testheader5 = col_character(),
      testheader6 = col_character()
    )
    # A tibble: 6 x 5
      testheader1 testheader2 testheader3 testheader4 testheader5
      <chr>       <chr>       <chr>       <chr>       <chr>
    1 testdata11  testdata12  testdata13  testdata14  testdata15
    2 testdata21  testdata22  testdata23  testdata24  testdata25
    3 testdata31  testdata32  testdata33  testdata34  testdata35
    4 testdata41  testdata42  testdata43  testdata44  testdata45
    5 testdata51  testdata52  testdata53  testdata54  testdata55
    6 testdata61  testdata62  testdata63  testdata64  testdata65

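Forwarding an extra argument such as `sheet = 2` is the kind of job R's `...` (dots) mechanism handles. The snippet below is a minimal sketch of that pattern and not `readit`'s actual source: `read_wrapper()` is a hypothetical name, and base R's `read.csv()` stands in for the wrapped reader so that the example runs without any extra packages.

```r
# Hypothetical wrapper, for illustration only (not readit's implementation).
# Everything after `path` is collected by `...` and handed on unchanged.
read_wrapper <- function(path, ...) {
  utils::read.csv(path, ...)
}

# Build a small throwaway CSV file to read back.
tmp <- tempfile(fileext = ".csv")
writeLines(c("a,b", "1,2", "3,4", "5,6"), tmp)

full      <- read_wrapper(tmp)             # no extra arguments
first_two <- read_wrapper(tmp, nrows = 2)  # `nrows` is consumed by read.csv()

nrow(full)       # 3
nrow(first_two)  # 2
```

The wrapper never needs to know which arguments the underlying reader accepts, which is why a single function can sit in front of several readers with different signatures.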
What if your client is a bunch of academics, and they send you the same data, but in SAS format?

    > readit("path/to/frustrating/file.sas7bdat")
    File guessed to be .sas7b*at (SAS) ("path/to/frustrating/file.sas7bdat")
    Parsed with column specification:
    cols(
      testheader1 = col_character(),
      testheader2 = col_character(),
      testheader3 = col_character(),
      testheader4 = col_character(),
      testheader5 = col_character(),
      testheader6 = col_character()
    )
    # A tibble: 6 x 5
      testheader1 testheader2 testheader3 testheader4 testheader5
      <chr>       <chr>       <chr>       <chr>       <chr>
    1 testdata11  testdata12  testdata13  testdata14  testdata15
    2 testdata21  testdata22  testdata23  testdata24  testdata25
    3 testdata31  testdata32  testdata33  testdata34  testdata35
    4 testdata41  testdata42  testdata43  testdata44  testdata45
    5 testdata51  testdata52  testdata53  testdata54  testdata55
    6 testdata61  testdata62  testdata63  testdata64  testdata65

Still no worries (`readit` will pick up both `.sas7bdat` and `.sas7bcat` extensions).

In fact, `readit` is able to read all of the following file types, so long as the file has an extension, and will take any arguments you would want to pass to the underlying functions:
- `.txt` (but not fixed-width, for obvious reasons)
- `.csv`
- `.xls`/`.xlsx`
- `.sas7bdat`/`.sas7bcat` (SAS files)
- `.dta` (Stata files)
- `.sav`/`.por` (SPSS files)
- `.json` (JSON arrays, which are parsed into data frames, like in `loggit`)

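The extension-based dispatch and delimiter guess described earlier could look roughly like the sketch below. It is an illustration under assumptions, not `readit`'s implementation: `guess_reader()` and `guess_delim()` are hypothetical names, the extension-to-reader mapping is inferred from the packages the README names, and the delimiter check (the same non-zero count on each of the first few lines) is just one common heuristic. The README does not show how `.sas7bcat` files are handled, so that extension is left out here.

```r
# Hypothetical sketch; not readit's source code. Uses base R only.

# Map a file extension to the name of a reader that could handle it.
guess_reader <- function(path) {
  ext <- tolower(tools::file_ext(path))
  switch(
    ext,
    csv      = "readr::read_csv",
    txt      = "readr::read_delim",  # the delimiter still has to be guessed
    xls      = ,
    xlsx     = "readxl::read_excel",
    sas7bdat = "haven::read_sas",
    dta      = "haven::read_dta",
    sav      = "haven::read_sav",
    por      = "haven::read_por",
    json     = "jsonlite::fromJSON",
    stop("Unsupported or missing file extension: ", path)
  )
}

# Guess a delimiter by counting candidate characters in the first few lines.
guess_delim <- function(path, candidates = c(",", "\t", "|", ";"), n = 5) {
  lines <- readLines(path, n = n, warn = FALSE)
  lines <- lines[nzchar(lines)]
  if (length(lines) == 0) stop("File is empty: ", path)

  # Rows are lines, columns are candidates, cells are occurrence counts.
  counts <- vapply(
    candidates,
    function(d) lengths(regmatches(lines, gregexpr(d, lines, fixed = TRUE))),
    integer(length(lines))
  )
  counts <- matrix(counts, nrow = length(lines))

  # A plausible delimiter appears on every line, the same number of times.
  consistent <- apply(counts, 2, function(x) x[1] > 0 && all(x == x[1]))
  if (!any(consistent)) stop("Could not guess a delimiter for: ", path)

  # If several candidates qualify, prefer the most frequent one.
  best <- which(consistent)[which.max(counts[1, consistent])]
  candidates[best]
}

# Usage: a pipe-delimited file hiding behind a .txt extension.
tmp <- tempfile(fileext = ".txt")
writeLines(c("h1|h2|h3", "a|b|c", "d|e|f"), tmp)
reader <- guess_reader(tmp)  # "readr::read_delim"
delim  <- guess_delim(tmp)   # "|"
```

Failing loudly when no candidate is consistent mirrors the behaviour described above, where a file that is too messy to handle automatically produces an error rather than a silent wrong guess.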
Future work
- Add support for reader functions from the foreign package.
Installation
You can install the latest CRAN release of `readit` via `install.packages("readit")`.

Or, to get the latest development version from GitHub, use either of the following.
Via devtools:
devtools::install_github("ryapric/readit")
Or, clone & build from source:
    cd /path/to/your/repos
    git clone https://github.com/ryapric/readit.git readit
    R CMD INSTALL readit

To use the most recent development version of `readit` in your own package, you can include it in the `Remotes:` field of your DESCRIPTION file:

    Remotes: github::ryapric/readit

Note that packages being submitted to CRAN cannot have a `Remotes` field. Refer here for more info.

License
MIT @ Ryan J. Price, 2018.