Site icon R-bloggers

Doh! I Could Have Had Just Used V8!

[This article was first published on rud.is » R, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

An R user recently had the need to split a “full, human name” into component parts to retrieve first & last names. The full names could be anything from something simple like “David Regan” to more complex & diverse such as “John Smith Jr.”, “Izaque Iuzuru Nagata” or “Christian Schmit de la Breli”. Despite the face that I’m pretty good at searching GitHub & CRAN for R stuff, my quest came up empty (though a teensy part of me swears I saw this type of thing in a package somewhere). I did manage to find Python & node.js modules that carved up human names but really didn’t have the time to re-implement their functionality from scratch in R (or, preferably, Rcpp).

Rather than rely on the Python bridge to R (yuck) I decided to use @opencpu’s V8 package to wrap a part of the node.js humanparser module. If you’re not familiar with V8, it provides the ability to run JavaScript code within R and makes it possible to pass variables into JavaScript functions and get data back in return. All the magic happens via a JSON data passing & Rcpp wrappers (and, of course, the super-awesome code Jeroen writes).

Working with JavaScript in R is as simple as creating an instance of the JavaScript V8 interpreter, loading up the JavaScript code that makes the functions work:

library(V8)
 
ct <- new_context()
ct$source(system.file("js/underscore.js", package="V8"))
ct$call("_.filter", mtcars, JS("function(x){return x.mpg < 15}"))
 
#>                      mpg cyl disp  hp drat    wt  qsec vs am gear carb
#> Duster 360          14.3   8  360 245 3.21 3.570 15.84  0  0    3    4
#> Cadillac Fleetwood  10.4   8  472 205 2.93 5.250 17.98  0  0    3    4
#> Lincoln Continental 10.4   8  460 215 3.00 5.424 17.82  0  0    3    4
#> Chrysler Imperial   14.7   8  440 230 3.23 5.345 17.42  0  0    3    4
#> Camaro Z28          13.3   8  350 245 3.73 3.840 15.41  0  0    3    4

There are many more examples in the V8 vignette.

For humanparser I needed to use Underscore.js (it comes with V8) and a function from humanparser that I carved out to work the way I wanted it to. You can look at the innards of the package on github—specifically, this file (it’s really small)— and, to use the two new functions the package exposes it’s as simple as doing:

devtools::install_github("hrbrmstr/humanparser")
 
library(humanparser)
 
parse_name("John Smith Jr.")
 
#> $firstName
#> [1] "John"
#> 
#> $suffix
#> [1] "Jr."
#> 
#> $lastName
#> [1] "Smith"
#> 
#> $fullName
#> [1] "John Smith Jr."

or the following to convert a bunch of ’em:

full_names <- c("David Regan", "Izaque Iuzuru Nagata", 
                "Christian Schmit de la Breli", "Peter Doyle", "Hans R.Bruetsch", 
                "Marcus Reichel", "Per-Axel Koch", "Louis Van der Walt", 
                "Mario Adamek", "Ugur Tozsekerli", "Judit Ludvai" )
 
parse_names(full_names)
 
#> Source: local data frame [11 x 4]
#> 
#>    firstName     lastName                     fullName middleName
#> 1      David        Regan                  David Regan         NA
#> 2     Izaque       Nagata         Izaque Iuzuru Nagata     Iuzuru
#> 3  Christian  de la Breli Christian Schmit de la Breli     Schmit
#> 4      Peter        Doyle                  Peter Doyle         NA
#> 5       Hans   R.Bruetsch              Hans R.Bruetsch         NA
#> 6     Marcus      Reichel               Marcus Reichel         NA
#> 7   Per-Axel         Koch                Per-Axel Koch         NA
#> 8      Louis Van der Walt           Louis Van der Walt         NA
#> 9      Mario       Adamek                 Mario Adamek         NA
#> 10      Ugur   Tozsekerli              Ugur Tozsekerli         NA
#> 11     Judit       Ludvai                 Judit Ludvai         NA

Now, the functions in this package won’t win any land-speed records since we’re going from R to C[++] to JavaScript and back, passing JSON-converted data back & forth, so I pwnd @quominus into making a full Rcpp-based human, full-name parser. And, he’s nearly done! So, keep an eye on humaniformat since it will no doubt be in CRAN soon.

The real point of this post is that there are tons of JavaScript modules that will work well with the V8 package and let you get immediate functionality for something that might not be in R yet. You can prototype quickly (it took almost no time to make that package and you don’t even need to go that far), then optimize later. So, next time—if you can’t find some functionality directly in R—see if you can get by with a JavaScript shim, then convert to full R/Rcpp when/if you need to go into production.

If you’ve done any creative V8 hacks, drop a note in the comments!

To leave a comment for the author, please follow the link and comment on their blog: rud.is » R.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.