Vectorizing IPv4 Address Conversions – Part 1
Our previous post showed how to speed up the conversion of IPv4 addresses to/from integer format by taking advantage of a simple Rcpp wrapper to “boosted” native functions. However, to convert more than one IP address, you need to wrap those functions in one of the R *apply functions, which does the job but is not an optimal solution. Ideally, you could pass in a vector (with more than one element) of character IP addresses or of integer-format IP addresses and know that the function will “just work”.
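As a rough illustration of the *apply pattern described above, the sketch below swaps in a hypothetical pure-R stand-in, ip_to_long_scalar(), for the Rcpp-backed function from the previous post, so it runs without compiling anything. The name and the implementation are assumptions made for illustration only.

```r
# Hypothetical pure-R stand-in for the single-address Rcpp function from the
# previous post (an assumption); it converts exactly one dotted-quad string.
ip_to_long_scalar <- function(ip) {
  octets <- as.numeric(strsplit(ip, ".", fixed = TRUE)[[1]])
  sum(octets * 256^(3:0))
}

ips <- c("10.0.0.0", "192.168.1.1")

# A scalar-only function has to be wrapped in an *apply call to cover a vector
apply_result <- sapply(ips, ip_to_long_scalar, USE.NAMES = FALSE)
print(apply_result)
```

This works, but every caller has to remember the sapply() wrapper, which is the inconvenience the rest of the post removes.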
In this post we’ll introduce a shortcut method of vectorization with the Vectorize() function. Then, in the second and final part of the series, we’ll look at implementing the necessary code at the Rcpp layer to perform the vectorization at the C++ level and show some benchmarks for each method.
The Vectorize() Shortcut
At the end of our previous exercise, we had two functions: rinet_pton() and rinet_ntop(). Each took a single argument (the former a single-element character vector and the latter a single-element numeric vector) and returned a single-element vector as a result. Let’s vectorize each one using the Vectorize() function:
# the following code assumes you've already done the "sourceCpp" in the prev article
ip_to_long <- Vectorize(rinet_pton)
long_to_ip <- Vectorize(rinet_ntop)
Yes, that’s all it takes. Now we can pass in a vector of one or more elements and each function will return a vector of the same size as a result. The proof is in the output, so let’s give them a go, first with the original single-element vector use case:
# try a single IP address first
ip_to_long("10.0.0.0")
##  10.0.0.0
## 167772160

long_to_ip(167772160)
## [1] "10.0.0.0"
So far, so good, except that the default behavior of Vectorize(), which produces a named vector when a character vector is passed in, is probably not what we really want. We’ll tweak the call to Vectorize() for each function:
ip_to_long <- Vectorize(rinet_pton, USE.NAMES=FALSE)
long_to_ip <- Vectorize(rinet_ntop, USE.NAMES=FALSE)

ip_to_long("10.0.0.0")
## [1] 167772160

long_to_ip(167772160)
## [1] "10.0.0.0"
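For readers who have not compiled the Rcpp code, the effect of USE.NAMES can be reproduced with a hypothetical pure-R stand-in. In this sketch ip_to_long_scalar(), with_names() and without_names() are illustrative names, not functions from the earlier post.

```r
# Hypothetical pure-R stand-in for the scalar Rcpp function (an assumption,
# used only so the example runs without compiling the earlier code)
ip_to_long_scalar <- function(ip) {
  octets <- as.numeric(strsplit(ip, ".", fixed = TRUE)[[1]])
  sum(octets * 256^(3:0))
}

with_names    <- Vectorize(ip_to_long_scalar)
without_names <- Vectorize(ip_to_long_scalar, USE.NAMES = FALSE)

ips <- c("10.0.0.0", "10.0.0.1")

# By default, character input is reused as the names of the result
print(names(with_names(ips)))

# With USE.NAMES = FALSE the result is a plain, unnamed numeric vector
print(names(without_names(ips)))
```

The values are the same in both cases; only the names attribute differs.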
Now, let’s test it with more than one element:
srcIp <- c("146.178.58.99", "174.5.172.152", "146.178.58.99",
           "213.186.42.8", "146.178.58.99", "170.138.152.142",
           "170.138.152.142", "174.5.172.152", "146.178.58.99",
           "213.186.42.8")

srcInt <- c(2461153891, 2919607448, 2461153891, 3585747464,
            2461153891, 2861209742, 2861209742, 2919607448,
            2461153891, 3585747464)

ip_to_long(srcIp)
## [1] 2461153891 2919607448 2461153891 3585747464 2461153891 2861209742
## [7] 2861209742 2919607448 2461153891 3585747464

long_to_ip(srcInt)
##  [1] "146.178.58.99"   "174.5.172.152"   "146.178.58.99"
##  [4] "213.186.42.8"    "146.178.58.99"   "170.138.152.142"
##  [7] "170.138.152.142" "174.5.172.152"   "146.178.58.99"
## [10] "213.186.42.8"
Everything works as expected and we can now use those conversion routines without resorting to *apply calls.
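One quick sanity check is a round trip: convert addresses to integers and back, then compare with the input. The sketch below does this with hypothetical pure-R stand-ins (ip_to_long_scalar() and long_to_ip_scalar() are assumptions, not the Rcpp functions from the previous post), so it can be run as-is.

```r
# Hypothetical pure-R stand-ins for the two scalar Rcpp functions (assumptions)
ip_to_long_scalar <- function(ip) {
  octets <- as.numeric(strsplit(ip, ".", fixed = TRUE)[[1]])
  sum(octets * 256^(3:0))
}
long_to_ip_scalar <- function(n) {
  # Peel off each octet with integer division and modulo
  paste((n %/% 256^(3:0)) %% 256, collapse = ".")
}

ip_to_long_v <- Vectorize(ip_to_long_scalar, USE.NAMES = FALSE)
long_to_ip_v <- Vectorize(long_to_ip_scalar, USE.NAMES = FALSE)

ips <- c("146.178.58.99", "174.5.172.152", "213.186.42.8")

# Character -> numeric -> character should reproduce the input
round_trip <- long_to_ip_v(ip_to_long_v(ips))
print(identical(round_trip, ips))
```

The same check can be applied to the Vectorize()-wrapped Rcpp functions by substituting ip_to_long() and long_to_ip().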
Exercise for the reader!
To see what Vectorize() does under the covers, just enter ip_to_long or long_to_ip at an R console prompt without the parentheses. This will show the source of the functions that Vectorize() built. Try to build your own vectorized versions by trimming down what’s in the generated source code.
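One possible trimmed-down answer is sketched below. The function that Vectorize() generates ultimately hands the work to mapply(), so a hand-rolled version can call mapply() directly. The scalar function here is a hypothetical pure-R stand-in for rinet_pton(), and my_ip_to_long() and my_ip_to_long_strict() are illustrative names.

```r
# Hypothetical pure-R stand-in for the scalar Rcpp function (an assumption)
ip_to_long_scalar <- function(ip) {
  octets <- as.numeric(strsplit(ip, ".", fixed = TRUE)[[1]])
  sum(octets * 256^(3:0))
}

# Hand-rolled vectorization: apply the scalar function to each element
my_ip_to_long <- function(ip) {
  mapply(ip_to_long_scalar, ip, USE.NAMES = FALSE)
}

# Alternative using vapply(), which also checks that each result is one number
my_ip_to_long_strict <- function(ip) {
  vapply(ip, ip_to_long_scalar, numeric(1), USE.NAMES = FALSE)
}

print(my_ip_to_long(c("10.0.0.0", "146.178.58.99")))
```

Both versions still loop at the R level, one call per element, just as the Vectorize() wrapper does.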
We’ll see how to perform the same vectorization task at the Rcpp level in the next post and put each version in a head-to-head benchmark test.
NOTE: Using Rcpp with R markdown takes some extra steps, and I’ve posted a gist that shows some of the options you need to set to ensure the Rcpp code compiles and links properly and also the wicked-cool way you can embed Rcpp code right in markdown documents.