The jsonlite package implements a robust, high-performance JSON parser and generator for R, optimized for statistical data and the web. This week version 0.9.13 appeared on CRAN, the third release in a relatively short period to focus on performance optimization.
Fast number formatting
Versions 0.9.11 and 0.9.12 had already introduced major speedups by porting critical bottlenecks to C code and switching to a better JSON parser. The current release focuses on number formatting and incorporates C code from modp_numtoa, which is several times faster than as.character, formatC or sprintf for converting doubles and integers to strings (your mileage may vary depending on platform and precision).
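To get a feel for the baseline that modp_numtoa is being compared against, the three base R converters can be timed on a numeric vector. This is only a rough sketch: the vector `x`, its size, and the four-decimal precision are arbitrary assumptions, and the code does not call modp_numtoa itself, which is only reachable through jsonlite.

```r
# Hypothetical test vector; the size and value range are arbitrary choices.
set.seed(1)
x <- runif(1e5) * 1000

# Time each base R converter. formatC and sprintf are asked for 4 decimal
# places; as.character chooses its own precision.
timings <- c(
  as.character = system.time(s1 <- as.character(x))[["elapsed"]],
  formatC      = system.time(s2 <- formatC(x, digits = 4, format = "f"))[["elapsed"]],
  sprintf      = system.time(s3 <- sprintf("%.4f", x))[["elapsed"]]
)

# Elapsed seconds per converter; the numbers depend on platform and R version.
print(timings)
```

Absolute timings from a single run are noisy, so repeating the measurement a few times gives a more reliable picture.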
library(ggplot2)
nrow(diamonds)
# [1] 53940

system.time(jsonlite::toJSON(diamonds, dataframe = "row"))
#    user  system elapsed
#   0.319   0.007   0.325

system.time(jsonlite::toJSON(diamonds, dataframe = "col"))
#    user  system elapsed
#   0.073   0.002   0.075
Using the same benchmark as in previous posts, the time to convert the diamonds data to row-based JSON has gone down from 0.619s to 0.325s on my machine (about a 2x speedup over jsonlite 0.9.12), and converting to column-based JSON has gone down from 0.330s to 0.075s (about a 4x speedup).
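For readers unfamiliar with the two layouts, here is a small sketch of what row-based and column-based encoding produce. The data frame `df` is a made-up toy example, and the `dataframe` argument is spelled out in full as "rows" and "columns"; the abbreviated "row" and "col" used in the benchmark rely on partial argument matching.

```r
library(jsonlite)

# Hypothetical toy data frame, used only to show the two layouts.
df <- data.frame(x = 1:2, y = c("a", "b"))

# Row-based: one JSON object per record.
row_json <- toJSON(df, dataframe = "rows")
print(row_json)
# [{"x":1,"y":"a"},{"x":2,"y":"b"}]

# Column-based: one JSON array per column.
col_json <- toJSON(df, dataframe = "columns")
print(col_json)
# {"x":[1,2],"y":["a","b"]}
```

The row-based form repeats every column name for each record, so it is the larger of the two outputs for the same data.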
Comparing to other JSON packages
When comparing JSON packages, it should be noted that the comparison is never entirely fair because different packages use different settings and defaults for missing values, number of digits, etc. Both rjson and RJSONIO only support the column-based format for encoding data frames. Using their default settings:
system.time(rjson::toJSON(diamonds))
#    user  system elapsed
#   0.279   0.004   0.281

system.time(RJSONIO::toJSON(diamonds))
#    user  system elapsed
#   0.918   0.027   0.944
For this particular dataset, jsonlite is about 3.5x faster than rjson and about 12x faster than RJSONIO (on my machine) to generate column-based JSON. These differences are relatively large because 7 out of the 10 columns in the diamonds dataset are numeric.
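The caveat about differing defaults can be made concrete with jsonlite's own `digits` and `na` arguments, which change both the output and the amount of formatting work for the same input. This is a minimal sketch, assuming a made-up vector `x`; it shows jsonlite only and makes no claim about how rjson or RJSONIO treat the same values.

```r
library(jsonlite)

# Hypothetical input: one double with many decimals and one missing value.
x <- c(3.14159265, NA)

# Package defaults for precision and missing values.
json_default <- toJSON(x)

# Limit the output to at most 2 decimal digits.
json_digits <- toJSON(x, digits = 2)

# Encode missing values as JSON null.
json_null <- toJSON(x, na = "null")

print(json_default)
print(json_digits)
print(json_null)
```

Two packages that disagree on settings like these are not producing the same output, so their timings are only roughly comparable.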