Mapping IPv4 Addresses (with Hilbert curves) in R

[This article was first published on Data Driven Security, and kindly contributed to R-bloggers.]

While there’s an unholy affinity in the infosec community for slapping IPv4 addresses onto a world map, that isn’t the only way to spatially visualize IP addresses. A better approach (when tabulation with bar charts, tables or other standard visualization techniques won’t do) is to map IPv4 addresses onto a Hilbert space-filling curve. You can get a good feel for how these work over at The Measurement Factory, which is where this image comes from:

[Image: IPv4 address space laid out on a Hilbert curve, from The Measurement Factory]

This paper [PDF] is also a good primer.
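To make the mapping a bit more concrete, here is a minimal sketch in base R of the textbook distance-to-coordinate Hilbert conversion. It is not the ipv4heatmap code, and the orientation of the curve may well differ from what TMF's tool draws. The function names hilbert_d2xy and ip_to_slash24 are made up for illustration, and the sketch assumes one cell per /24 on an order-12 (4096x4096) curve.

```r
# Convert a distance d along a Hilbert curve of the given order
# into (x, y) cell coordinates on a 2^order x 2^order grid.
hilbert_d2xy <- function(order, d) {
  n <- 2^order
  x <- 0; y <- 0
  t <- d
  s <- 1
  while (s < n) {
    rx <- (t %/% 2) %% 2   # second-lowest bit of t
    ry <- (t + rx) %% 2    # lowest bit of t XOR rx
    if (ry == 0) {
      if (rx == 1) {       # flip the quadrant
        x <- s - 1 - x
        y <- s - 1 - y
      }
      tmp <- x; x <- y; y <- tmp   # transpose
    }
    x <- x + s * rx
    y <- y + s * ry
    t <- t %/% 4
    s <- s * 2
  }
  c(x = x, y = y)
}

# Index of the /24 that contains a dotted-quad address (0 .. 2^24 - 1).
# Assumption for this sketch: one grid cell per /24.
ip_to_slash24 <- function(ip) {
  o <- as.numeric(strsplit(ip, ".", fixed = TRUE)[[1]])
  o[1] * 65536 + o[2] * 256 + o[3]
}

# Example: which cell would 10.0.0.1 land in on an order-12 curve?
hilbert_d2xy(12, ip_to_slash24("10.0.0.1"))
```

The point of the curve is that neighbouring /24s land in neighbouring cells, so contiguous netblocks show up as compact squares or rectangles rather than long thin stripes.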

While TMF’s ipv4heatmap command-line software can crank out those visualizations really well, I wanted a way to generate them in R as we explore internet IP space at work. So, I adapted bits of their code to work in a ggplot context and took a stab at an ipv4heatmap package.

The functionality is currently pretty basic. Give ipv4heatmap a vector of IP addresses and you’ll get a heatmap of them. Feed in a CIDR block to boundingBoxFromCIDR and you’ll get a structure suitable for displaying with geom_rect. To get an idea of how it works, here’s a small example.

The following snippet of code reads in a cached copy of an IPv4 block list from blocklist.de and turns the IP addresses into a heatmap (which is mostly one color since there aren’t many blocks per class C). It then grabs the CIDR blocks for China and North Korea since, well, #CHINADPRKHACKSALLTHETHINGS according to “leading” IR firms and the US gov. Finally, it overlays alpha-filled rectangles on the map to see just how many points fall within those CIDRs.

devtools::install_github("vz-risk/ipv4heatmap")
library(ipv4heatmap)
library(ggplot2)    # for geom_rect() and aes() used below
library(data.table) # for rbindlist()

# read in cached copy of blocklist.de IPs - orig URL http://www.blocklist.de/en/export.html
hm <- ipv4heatmap(readLines("http://dds.ec/data/all.txt"))

# read in CIDRs for China and North Korea
cn <- read.table("http://www.iwik.org/ipcountry/CN.cidr", skip=1)
kp <- read.table("http://www.iwik.org/ipcountry/KP.cidr", skip=1)

# make bounding boxes for the CIDRs

cn_boxes <- rbindlist(lapply(boundingBoxFromCIDR(cn$V1), data.frame))
kp_box <- data.frame(boundingBoxFromCIDR(kp$V1))

# overlay the bounding boxes for China and North Korea onto the IPv4 addresses we read in and Hilbertized

gg <- hm$gg
gg <- gg + geom_rect(data=cn_boxes, 
                     aes(xmin=xmin, ymin=ymin, xmax=xmax, ymax=ymax), 
                     fill="white", alpha=0.2)
gg <- gg + geom_rect(data=kp_box, 
                     aes(xmin=xmin, ymin=ymin, xmax=xmax, ymax=ymax), 
                     fill="white", alpha=0.2)

gg

You’ll want to download that and open it up in a decent image program. The whole image is 4096x4096, so you can zoom in pretty well to see where evil hides itself.
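For a sense of scale: 4096x4096 is 2^24 pixels, which works out to one pixel per /24 if the whole IPv4 space is drawn. Under that assumption, a rough base R sketch of which run of pixels a CIDR block covers along the curve looks like this (cidr_to_pixels is a made-up helper for illustration, not part of ipv4heatmap, and it says nothing about how boundingBoxFromCIDR works internally):

```r
# For a CIDR string, return the first and last /24 "pixel" index it touches
# along the curve. Assumption: one pixel per /24, indices 0 .. 2^24 - 1.
cidr_to_pixels <- function(cidr) {
  parts  <- strsplit(cidr, "/", fixed = TRUE)[[1]]
  octets <- as.numeric(strsplit(parts[1], ".", fixed = TRUE)[[1]])
  prefix <- as.integer(parts[2])

  addr  <- sum(octets * 256^(3:0))   # address as a number (doubles are exact here)
  size  <- 2^(32 - prefix)           # number of addresses in the block
  first <- (addr %/% size) * size    # clear the host bits
  last  <- first + size - 1

  c(first_pixel = first %/% 256, last_pixel = last %/% 256)
}

# A /8 spans 65536 consecutive pixels, i.e. a 256x256 square on the map
px <- cidr_to_pixels("10.0.0.0/8")
px[["last_pixel"]] - px[["first_pixel"]] + 1
```

Anything smaller than a /24 collapses into a single pixel, which is why the heatmap color has to carry the count.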

If you find a cool use for ipv4heatmap, definitely drop a note in the comments or on GitHub. One thing we’ve noticed is that wrapping a series of individual images up in an animation to see changes over time can be really interesting/illuminating.

One caveat: it uses the Boost libraries, so Windows R folk may need to jump through some hoops to get it going.

Countries Of The Internet

Since I was playing around with IPv4 heatmaps, I thought it might be neat to show how country IP address allocations fit on the “map”. So, I took the top 12 countries (by # of IPv4 addresses assigned), used ipv4heatmap to color in their bounding boxes and then whipped up some JavaScript to let you see/explore the fragmented allocation landscape we live in.
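Ranking countries by assigned addresses comes down to simple arithmetic: a block with prefix length p holds 2^(32 - p) addresses. Here is a rough sketch in base R, assuming a character vector of non-overlapping CIDR strings per country. The helper cidr_address_count and the sample blocks are hypothetical, not the lists used for the graphic.

```r
# Total number of IPv4 addresses covered by a vector of CIDR strings.
# Assumes the blocks do not overlap (overlaps would be double counted).
cidr_address_count <- function(cidrs) {
  prefix <- as.integer(sub(".*/", "", cidrs))  # keep the part after the slash
  sum(2^(32 - prefix))
}

# Hypothetical per-country allocations, for illustration only
allocations <- list(
  AA = c("10.0.0.0/8", "192.168.0.0/16"),
  BB = c("172.16.0.0/12")
)

# Address totals per country, largest first
sort(sapply(allocations, cidr_address_count), decreasing = TRUE)
```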

There’s also a non-framed version of that available. The 2D canvas scaling may be off in some browsers, but not by much. Shift-click once in the image to compensate if it’s cut off at all.

The amount of “micro-allocation” (my term) really surprised me. While I “knew” it was this way, seeing it gives you a whole new perspective.

The more I’ve worked with routing, IP & DNS data over the years, the more I’m amazed that anything on the internet works at all.
