jsonlite 0.9.12: now even lighter and faster
The jsonlite package implements a robust, high-performance JSON parser and generator for R, optimized for statistical data and the web. This week version 0.9.12 appeared on CRAN, which includes a completely rewritten JSON parser and more optimized C code for JSON generation. The new parser is based on yajl, which is smaller and faster than libjson, and much easier to compile.
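For anyone new to the package, the two workhorse functions are toJSON and fromJSON; a quick round-trip sketch using the built-in iris data (just an illustration, not from the original post):

library(jsonlite)

# Convert the first rows of a data frame to JSON (row-based by default)
json <- toJSON(head(iris, 2))
json

# Parse the JSON back into a data frame
fromJSON(json)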
Error handling
My favorite feature of yajl is that it gives helpful error messages when parsing invalid JSON, for example:
fromJSON('[1,2,falsse,4]')
# Error in parseJSON(txt) : lexical error: invalid string in json text.
#          [1,2,falsse,4]
#     (right here) ------^

fromJSON('["foo", "bla\nbla"]')
# Error in parseJSON(txt) : lexical error: invalid character inside string.
#          ["foo", "bla bla"]
#     (right here) ------^

fromJSON('[1,2,3,4] {}')
# Error in parseJSON(txt) : parse error: trailing garbage
#          [1,2,3,4] {}
#     (right here) ------^
This makes debugging much easier, especially when dealing with fast-changing, dynamic data from the web.
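Because these are regular R error conditions, a script that processes dynamic web data can catch them and skip bad records; a minimal sketch (the helper name parse_or_null is hypothetical):

# Hypothetical helper: return NULL instead of stopping on invalid JSON
parse_or_null <- function(txt) {
  tryCatch(
    fromJSON(txt),
    error = function(e) {
      message("Skipping invalid JSON: ", conditionMessage(e))
      NULL
    }
  )
}

parse_or_null('[1,2,falsse,4]')
# prints the yajl error message and returns NULL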
Unicode parsing
The yajl parser always correctly converts escaped unicode sequences into UTF-8 characters:
fromJSON('["\u5bffu53f8","Z\u00fcrich"]') # [1] "寿司" "Zürich"
Escaped unicode was already supported in the previous version of jsonlite; however, it was expensive and not enabled by default. With yajl we get this for free 🙂
Integer parsing
Another cool feature is that yajl parses numbers into integers when possible:
class(fromJSON('[13,14,15]'))
# [1] "integer"
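As soon as a value does not fit an integer, the usual R coercion rules kick in and the vector should come back as doubles (a quick check, not from the original post):

class(fromJSON('[13,14,15.5]'))
# [1] "numeric"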
Performance
Performance of both parsing and generating JSON has again improved tremendously in this version. Some benchmarks:
library(jsonlite)
library(microbenchmark)
data(diamonds, package = "ggplot2")
json_rows <- toJSON(diamonds)
json_columns <- toJSON(diamonds, dataframe = "columns")

microbenchmark(
  toJSON(diamonds),
  toJSON(diamonds, dataframe = "columns"),
  fromJSON(json_rows),
  fromJSON(json_columns),
  times = 10
)
# Unit: milliseconds
#                                     expr      min       lq   median       uq       max neval
#                         toJSON(diamonds) 587.6984 591.3231 619.1590 630.3588  661.5118    10
#  toJSON(diamonds, dataframe = "columns") 317.6793 325.3809 330.6444 339.9898  343.7466    10
#                      fromJSON(json_rows) 890.9832 899.3334 939.3230 979.6338 1059.9770    10
#                   fromJSON(json_columns) 188.5764 201.8463 238.1272 279.7607  293.1195    10
If we compare this to the previous blog post, we can see that generating row-based JSON from data frames (the default) is approximately 2x faster than the previous version. Parsing row-based JSON is about 2.5x faster, and parsing column-based JSON is almost 5x faster!
Streaming JSON
Version 0.9.12 introduces some cool streaming functionality. This is a topic in itself and I will blog about it later in the week. Until then, have a look at the examples on the stream_in and stream_out manual pages.
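As a rough preview (a minimal sketch under my own assumptions about connection handling, not taken from the manual pages), the idea is to write a data frame out as line-delimited JSON, one record per line, and then stream it back in:

# Write a data frame as line-delimited JSON to a temporary file
tmp <- tempfile(fileext = ".json")
con <- file(tmp, open = "w")
stream_out(iris, con)
close(con)

# Read it back in; records are parsed and combined into a data frame
iris2 <- stream_in(file(tmp))
head(iris2)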