Faster arrays and matrices in jsonlite 0.9.20
Yesterday a new version of the jsonlite package was released to CRAN. This update includes no new features; it only introduces performance optimizations.
Large Matrices
The jsonlite package was already highly optimized for converting vectors and data frames to json. However, Gregory Jefferis and Duncan Murdoch found that conversion of tall matrices, as used by rglwidget, was slower than expected.
It turned out this was indeed an edge case that I had overlooked. The new version of jsonlite fixes this problem and matrix conversion should be about 200 times faster than before. Technical details follow below; first a benchmark:
# Old version!
> system.time(j<-toJSON(matrix(1L, ncol = 3, nrow = 50000)))
user system elapsed
4.715 0.015 4.729
# New version!
> system.time(j<-toJSON(matrix(1L, ncol = 3, nrow = 50000)))
user system elapsed
0.022 0.002 0.023
This artificial example (every field has the number 1) highlights the improvement. The relative improvement might be smaller for matrices with actual data, because additional time is spent formatting double/integer values as numbers (which was already optimized in jsonlite a while ago).
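To see how much of the speedup carries over to real numbers, the same benchmark can be repeated with random doubles instead of a constant. This is only a sketch: the seed and the use of rnorm are my own choices for illustration, and the timing will depend on your machine and jsonlite version.

```r
library(jsonlite)

# Same shape as the benchmark above, but filled with doubles so that
# number formatting contributes to the total time.
set.seed(1)
x <- matrix(rnorm(150000), ncol = 3, nrow = 50000)

timing <- system.time(j <- toJSON(x))
print(timing)
```

Comparing the elapsed time with the constant-matrix case gives a rough idea of how much time goes into number formatting rather than into assembling the rows.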
Technical Details
So what was the problem? The previous version of jsonlite had an elegant solution that would recurse through the dimensions of a matrix/array and apply json conversion on each of its elements. E.g. for a matrix (2D array) it would convert each row to json, and then combine the results. However it turns out that the apply call below is really slow.
# Technical example, don't use this code !
x <- matrix(1L, ncol = 3, nrow = 50000)
rows <- apply(x, 1, jsonlite:::asJSON)
json <- jsonlite:::collapse(rows, indent = NA)
The new version exploits the fact that matrices and arrays are homogeneous (i.e. all elements have the same type). It first removes the dimensions from the array using c(x) and converts all of the individual elements to json with a single call to asJSON. This results in a significant speedup because asJSON is only called once, rather than once for each of the n rows.
# Technical example, don't use this code !
str <- jsonlite:::asJSON(c(x), collapse = FALSE)
dim(str) <- dim(x)
rows <- apply(str, 1, jsonlite:::collapse, indent = NA)
json <- jsonlite:::collapse(rows, indent = NA)
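The difference between the two strategies can also be sketched in base R alone, without touching jsonlite internals. The helpers below (row_json, rowwise, vectorized) are hypothetical names made up for this illustration, and as.character stands in for jsonlite's number formatting; this is not the package's actual code.

```r
# Hypothetical helper: wrap already-formatted values in a JSON array.
row_json <- function(r) paste0("[", paste(r, collapse = ","), "]")

# Old strategy: the formatting function is called once per row.
rowwise <- function(x) {
  rows <- apply(x, 1, function(r) row_json(as.character(r)))
  row_json(rows)
}

# New strategy: format every element in one vectorized call,
# restore the dimensions, then only collapse the strings per row.
vectorized <- function(x) {
  str <- as.character(c(x))
  dim(str) <- dim(x)
  rows <- apply(str, 1, row_json)
  row_json(rows)
}

x <- matrix(1:6, ncol = 3, nrow = 2)
a <- rowwise(x)
b <- vectorized(x)
```

Both functions produce the same string; the second one simply moves the expensive formatting step out of the per-row loop.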
Things get a bit more complicated for higher dimensional arrays, especially with toJSON(x, pretty = TRUE), but this illustrates the core issue.
You might be thinking: can we avoid apply altogether? Yes! For the important case of 2 dimensional arrays, jsonlite has a complete C implementation which makes toJSON on matrices extra fast. For higher dimensional arrays it currently still uses the solution above, which performs quite well. We might be able to further optimize this case by porting it to C as well, but working with high dimensional arrays in C makes my head hurt.
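For illustration only, the per-row apply call can also be dropped in plain R by pasting the columns together element-wise. This is an assumption about one possible approach, not a description of how jsonlite's C code works; matrix_to_json is a hypothetical name and as.character again stands in for number formatting.

```r
# Hypothetical sketch: build the JSON for a matrix without a per-row apply().
matrix_to_json <- function(x) {
  str <- as.character(c(x))   # stand-in for number formatting
  dim(str) <- dim(x)
  # Split into a list of columns (one iteration per column, not per row),
  # then paste them element-wise so each result is one comma-separated row.
  cols <- lapply(seq_len(ncol(str)), function(j) str[, j])
  rows <- do.call(paste, c(cols, sep = ","))
  paste0("[", paste0("[", rows, "]", collapse = ","), "]")
}

json <- matrix_to_json(matrix(1:6, ncol = 3, nrow = 2))
```

For a tall matrix the loop now runs once per column instead of once per row, which is the same idea the 2D special case exploits.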