Site icon R-bloggers

RcppSimdJson 0.1.0: Now on Windows, With Parsers and Faster Still!

[This article was first published on Thinking inside the box , and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

A smashing new RcppSimdJson release 0.1.0 containing several small updates to upstream simdjson (now at 0.4.6) in part triggered by very excisting work by Brendan who added actual parser from file and string—and together with Daniel upstream worked really hard to make Windows builds as well as complete upstream tests on our beloved (ahem) MinGW platform possible. So this version will, once the builders have caught up, give everybody on Windows a binary—with a JSON parser running circles around the (arguably more feature-rich and possibly easier-to-use) alternatives. Dave just tweeted a benchmark snippet by Brendan, the full set is at the bottom our issue ticket for this release.

RcppSimdJson wraps the fantastic and genuinely impressive simdjson library by Daniel Lemire and collaborators, which in its upstream release 0.4.0 improved once more (also see the following point releases). Via very clever algorithmic engineering to obtain largely branch-free code, coupled with modern C++ and newer compiler instructions, it results in parsing gigabytes of JSON parsed per second which is quite mindboggling. The best-case performance is ‘faster than CPU speed’ as use of parallel SIMD instructions and careful branch avoidance can lead to less than one cpu cycle use per byte parsed; see the video of the recent talk by Daniel Lemire at QCon (which was also voted best talk).

As mentioned, this release expands the reach of the package to Windows, and adds new user-facing functions. A big thanks for most of this is owed to Brendan, so buy him a drink if you run across him. The full NEWS entry follows.

Changes in version 0.1.0 (2020-07-07)

  • Upgraded to simdjson 0.4.1 which adds upstream Windows support (Dirk in #27 closing #26 and #14, plus extensive work by Brendan helping upstream with mingw tests).

  • Upgraded to simdjson 0.4.6 with further upstream improvements (Dirk in #30).

  • Change Travis CI to build matrix over g++ 7, 8, 9, and 10 (Dirk in #31; and also Brendan in #32).

  • New JSON functions fparse and fload (Brendan in #32) closing #18 and #10).

Courtesy of CRANberries, there is also a diffstat report for this release.

For questions, suggestions, or issues please use the issue tracker at the GitHub repo.

If you like this or other open-source work I do, you can now sponsor me at GitHub. For the first year, GitHub will match your contributions.

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. Please report excessive re-aggregation in third-party for-profit settings.

To leave a comment for the author, please follow the link and comment on their blog: Thinking inside the box .

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.