Doh! I Could Have Had Just Used V8!
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
An R user recently needed to split a “full, human name” into its component parts to retrieve first & last names. The full names could be anything from something simple like “David Regan” to more complex & diverse ones such as “John Smith Jr.”, “Izaque Iuzuru Nagata” or “Christian Schmit de la Breli”. Despite the fact that I’m pretty good at searching GitHub & CRAN for R stuff, my quest came up empty (though a teensy part of me swears I saw this type of thing in a package somewhere). I did manage to find Python & node.js modules that carve up human names, but I really didn’t have the time to re-implement their functionality from scratch in R (or, preferably, Rcpp).
Rather than rely on the Python bridge to R (yuck), I decided to use @opencpu’s V8 package to wrap part of the node.js humanparser module. If you’re not familiar with V8, it lets you run JavaScript code within R, pass variables into JavaScript functions and get data back in return. All the magic happens via JSON data passing & Rcpp wrappers (and, of course, the super-awesome code Jeroen writes).
Working with JavaScript in R is as simple as creating an instance of the V8 JavaScript interpreter, loading the JavaScript code that defines the functions you need, and then calling them:
```r
library(V8)

# Create a JavaScript context (older V8 releases called this new_context())
ct <- v8()

# Load Underscore.js, which ships with the V8 package
ct$source(system.file("js/underscore.js", package = "V8"))

# mtcars goes in as JSON; the anonymous JavaScript function does the filtering
ct$call("_.filter", mtcars, JS("function(x){return x.mpg < 15}"))
#>                      mpg cyl disp  hp drat    wt  qsec vs am gear carb
#> Duster 360          14.3   8  360 245 3.21 3.570 15.84  0  0    3    4
#> Cadillac Fleetwood  10.4   8  472 205 2.93 5.250 17.98  0  0    3    4
#> Lincoln Continental 10.4   8  460 215 3.00 5.424 17.82  0  0    3    4
#> Chrysler Imperial   14.7   8  440 230 3.23 5.345 17.42  0  0    3    4
#> Camaro Z28          13.3   8  350 245 3.73 3.840 15.41  0  0    3    4
```
There are many more examples in the V8 vignette.
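Besides `call()`, a V8 context object has `assign()`, `eval()` and `get()` methods, which make the JSON round trip described above explicit. In this small sketch (the variable names and the token-counting JavaScript are illustrative, not from the original post), an R character vector goes in as a JavaScript array, plain JavaScript runs against it, and the result comes back as an R vector. It assumes a V8 release where the constructor is `v8()`.

```r
library(V8)

ct <- v8()

# R -> JavaScript: the character vector becomes a JSON array bound to `names`
ct$assign("names", c("David Regan", "Izaque Iuzuru Nagata"))

# Run ordinary JavaScript inside the context: count whitespace-separated tokens
ct$eval("var tokenCounts = names.map(function(s) { return s.split(/\\s+/).length; });")

# JavaScript -> R: the value is JSON-stringified and parsed back by jsonlite
token_counts <- ct$get("tokenCounts")
token_counts
#> [1] 2 3
```

Anything you can express in JSON (vectors, lists, data frames) can make this trip; functions and other JavaScript-only objects stay on the JavaScript side.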
For humanparser, I needed Underscore.js (it ships with V8) and a function from the node.js humanparser module that I carved out to work the way I wanted it to. You can look at the innards of the package on GitHub (specifically, this file; it’s really small). Using the two new functions the package exposes is as simple as doing:
```r
devtools::install_github("hrbrmstr/humanparser")
library(humanparser)

parse_name("John Smith Jr.")
#> $firstName
#> [1] "John"
#>
#> $suffix
#> [1] "Jr."
#>
#> $lastName
#> [1] "Smith"
#>
#> $fullName
#> [1] "John Smith Jr."
```
or the following to convert a bunch of ’em:
```r
full_names <- c("David Regan", "Izaque Iuzuru Nagata",
                "Christian Schmit de la Breli", "Peter Doyle",
                "Hans R.Bruetsch", "Marcus Reichel", "Per-Axel Koch",
                "Louis Van der Walt", "Mario Adamek", "Ugur Tozsekerli",
                "Judit Ludvai")

parse_names(full_names)
#> Source: local data frame [11 x 4]
#>
#>    firstName     lastName                     fullName middleName
#>  1     David        Regan                  David Regan         NA
#>  2    Izaque       Nagata         Izaque Iuzuru Nagata     Iuzuru
#>  3 Christian  de la Breli Christian Schmit de la Breli     Schmit
#>  4     Peter        Doyle                  Peter Doyle         NA
#>  5      Hans   R.Bruetsch              Hans R.Bruetsch         NA
#>  6    Marcus      Reichel               Marcus Reichel         NA
#>  7  Per-Axel         Koch                Per-Axel Koch         NA
#>  8     Louis Van der Walt           Louis Van der Walt         NA
#>  9     Mario       Adamek                 Mario Adamek         NA
#> 10      Ugur   Tozsekerli              Ugur Tozsekerli         NA
#> 11     Judit       Ludvai                 Judit Ludvai         NA
```
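The general pattern behind wrappers like these is small. The sketch below is hypothetical: `toyParseName()` is a deliberately naive JavaScript stand-in, not humanparser’s actual code, and `toy_parse_name()`/`toy_parse_names()` are illustrative names rather than the package’s functions. It loads the JavaScript into one V8 context, exposes a single-name R wrapper via `call()`, and builds a vectorized version that binds the per-name lists into a data frame, filling any field a given name lacks with `NA` (the same shape as the `parse_names()` output above).

```r
library(V8)

# Hypothetical toy parser, NOT humanparser's code:
# first token -> firstName, last token -> lastName, anything between -> middleName
toy_js <- "
function toyParseName(name) {
  var parts = name.trim().split(/\\s+/);
  var out = { firstName: parts[0] };
  if (parts.length > 1) out.lastName = parts[parts.length - 1];
  out.fullName = name;
  if (parts.length > 2) out.middleName = parts.slice(1, -1).join(' ');
  return out;
}
"

# Create the context and load the JavaScript once, then reuse it for every call
ct <- v8()
ct$eval(toy_js)

# Single name: the argument is sent as JSON, and the returned JavaScript
# object comes back as a named R list
toy_parse_name <- function(name) ct$call("toyParseName", name)

# Many names: parse each one, then bind the lists into a data frame
toy_parse_names <- function(full_names) {
  parsed <- lapply(full_names, toy_parse_name)
  # Union of all field names, in first-seen order
  fields <- unique(unlist(lapply(parsed, names)))
  rows <- lapply(parsed, function(p) {
    # Fields missing for this name become NA so every row has the same columns
    vals <- lapply(fields, function(f) if (is.null(p[[f]])) NA_character_ else p[[f]])
    names(vals) <- fields
    as.data.frame(vals, stringsAsFactors = FALSE)
  })
  do.call(rbind, rows)
}

toy_parse_names(c("David Regan", "Izaque Iuzuru Nagata"))
```

A toy rule like “the last token is the last name” turns “John Smith Jr.” into a last name of “Jr.” and “Christian Schmit de la Breli” into “Breli”, whereas humanparser returns a `suffix` of “Jr.” and a last name of “de la Breli”. Handling suffixes and surname particles is exactly the work worth borrowing from an existing module.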
Now, the functions in this package won’t win any land-speed records, since every call goes from R to C[++] to JavaScript and back, with the data converted to JSON in each direction. So I pwnd @quominus into making a full Rcpp-based human full-name parser. And, he’s nearly done! Keep an eye on humaniformat, since it will no doubt be on CRAN soon.
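Because each `call()` pays for the R to C++ to JavaScript trip plus JSON conversion in both directions, one cheap mitigation (my assumption here, not something humanparser is known to do) is to send a whole vector across in a single call rather than one name at a time. This minimal sketch uses hypothetical token-counting functions instead of a real name parser. Wrapping the vector in `I()` stops jsonlite from auto-unboxing a length-one vector into a scalar, so the JavaScript side always receives an array.

```r
library(V8)

ct <- v8()

# Hypothetical helpers: a per-name function and a batch version over an array
ct$eval("
function tokenCount(name) { return name.trim().split(/\\s+/).length; }
function tokenCounts(names) { return names.map(tokenCount); }
")

full_names <- c("David Regan", "Izaque Iuzuru Nagata", "Christian Schmit de la Breli")

# One R -> C++ -> JavaScript round trip (and JSON conversion) per name
one_at_a_time <- vapply(full_names, function(n) ct$call("tokenCount", n),
                        numeric(1), USE.NAMES = FALSE)

# A single round trip: the whole vector travels as one JSON array
batched <- ct$call("tokenCounts", I(full_names))
```

Both give the same answer; how much time batching saves depends on the vector length and the work done per name, so time it with `system.time()` on your own data before relying on it.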
The real point of this post is that there are tons of JavaScript modules that work well with the V8 package and give you immediate access to functionality that might not be in R yet. You can prototype quickly (it took almost no time to make that package, and you don’t even need to go that far), then optimize later. So, next time you can’t find some functionality directly in R, see if you can get by with a JavaScript shim, then convert to full R/Rcpp when (or if) you need to go into production.
If you’ve done any creative V8 hacks, drop a note in the comments!