Function to Read NDJSON (Newline Delimited JSON) Files
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
Notice
jsonlite has a stream_in() function that works much better and faster. Use it instead of the function in this post.
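For reference, here is a minimal sketch of the stream_in() approach. The sample records and the temporary file are made up for illustration; in practice you would point file() at your own NDJSON file.

```r
library(jsonlite)

# Hypothetical sample data: two NDJSON records written to a temporary file
tmp <- tempfile(fileext = ".json")
writeLines(c('{"artist":"A","year":2001}',
             '{"artist":"B","year":2005}'), tmp)

# stream_in() parses one JSON record per line and returns a data frame;
# verbose = FALSE just silences the progress messages
discography <- stream_in(file(tmp), verbose = FALSE)
```

stream_in() also accepts a handler function and a pagesize argument for processing large files in batches, which this sketch does not use.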
I ended up writing this while working on a web scraper for a lyrics website and thought it might be useful to some people. It is a generic solution, but it probably won’t be of help unless you are working with NDJSON files and don’t want to rely on unnecessary libraries. The only library it needs is jsonlite, which is a fantastic library.
library("jsonlite")
The function itself is pretty simple. It has no way of knowing how you want nested values handled, so you may have to modify it to fit your needs.
# The function
read_ndjson <- function(filename) {
  # Counts lines read; used to create the matrix on the first row
  line <- 0

  # Open the passed-in filename for reading
  con <- file(filename, "r")

  while (TRUE) {
    # Read one line (one JSON record) at a time
    json <- readLines(con, n = 1)

    # No lines left
    if (length(json) == 0) {
      break
    }

    # simplifyVector = TRUE turns JSON arrays of primitives into atomic
    # vectors; a JSON object still comes back as a named list
    row <- fromJSON(json, simplifyVector = TRUE)

    # Create the initial (empty) matrix, one column per field
    if (line == 0) {
      ndjson <- matrix(nrow = 0, ncol = length(row))
    }

    # This could be done better: rbind() copies the matrix on every line
    ndjson <- rbind(ndjson, row)
    line <- line + 1
  }

  close(con)
  return(ndjson)
}

# Calling the function
discography <- data.frame(read_ndjson("./log.json"))
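The "this could be done better" comment refers to growing the matrix with rbind() on every line. One possible alternative, sketched below, parses every line into a one-row data frame and binds them all once at the end. The name read_ndjson_list and the sample records are hypothetical, and the sketch assumes every line is a JSON object with the same keys and only scalar values (nested objects are allowed and get flattened into dotted column names).

```r
library(jsonlite)

# Hypothetical variant: parse each line, then bind all rows in one step
read_ndjson_list <- function(filename) {
  lines <- readLines(filename, warn = FALSE)
  lines <- lines[nzchar(lines)]  # skip empty lines

  rows <- lapply(lines, function(x) {
    # as.data.frame() on a nested list yields columns such as album.title
    as.data.frame(fromJSON(x), stringsAsFactors = FALSE)
  })

  # Assumes all rows share the same column names
  do.call(rbind, rows)
}

# Made-up sample data with one nested object per record
tmp <- tempfile(fileext = ".json")
writeLines(c('{"artist":"A","album":{"title":"X","year":2001}}',
             '{"artist":"B","album":{"title":"Y","year":2005}}'), tmp)
nested <- read_ndjson_list(tmp)
```

If records contain arrays or differing keys, this sketch will not work as written and the per-line conversion would need to change.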
While this might not come in handy as often as some other functions, I am keeping this in my toolchain. Still working on converting the lyrics scraper to R (it is currently written in Node), but if there is interest I can post that as well.