Making World Tile Grid-Grids
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
A colleague asked if I would blog about how I crafted the grid of world tile grids in this post and I accepted the challenge. The technique isn’t too hard as it just builds on the initial work by Jon Schwabish and a handy file made by Maarten Lambrechts.
The Premise
For this particular use-case, I sifted through our internet scan data and classified a series of device families from their telnet banners then paired that with our country-level attribution data for each IPv4 address. I’m not generally “a fan” of rolling things up at a country level, but since many (most) of these devices are residential or small/medium-business routers, country-level attribution has some merit.
But, I’m also not a fan of country-level choropleths when it comes to “cyber” nor am I wont to area-skewed cartograms since most folks still cannot interpret them. Both of those take up a ton of screen real estate, too, espeically if you have more than one of them. Yet, I wanted to show a map-like structure without resorting to Hilbert IPv4 heatmaps since they are neither very readable by a general audience and become skewed when you have to move up from a 1 pixel == 1 Class C network block.
I think the tile grid is a great compromise since it avoids the “area”and projection skewness confusion that regular global choropleths cause while still preserving geographic & positional proximity. Sure, they’ll take some getting used to by casual readers, but I felt it was the best of all the tradeoffs.
The Setup
Here’s the data:
library(here) library(hrbrthemes) library(tidyverse) wtg <- read_csv("https://gist.githubusercontent.com/maartenzam/787498bbc07ae06b637447dbd430ea0a/raw/9a9dafafb44d8990f85243a9c7ca349acd3a0d07/worldtilegrid.csv") glimpse(wtg) ## Observations: 192 ## Variables: 11 ## $ name <chr> "Afghanistan", "Albania", "Algeria", "Angola",... ## $ alpha.2 <chr> "AF", "AL", "DZ", "AO", "AQ", "AG", "AR", "AM"... ## $ alpha.3 <chr> "AFG", "ALB", "DZA", "AGO", "ATA", "ATG", "ARG... ## $ country.code <chr> "004", "008", "012", "024", "010", "028", "032... ## $ iso_3166.2 <chr> "ISO 3166-2:AF", "ISO 3166-2:AL", "ISO 3166-2:... ## $ region <chr> "Asia", "Europe", "Africa", "Africa", "Antarct... ## $ sub.region <chr> "Southern Asia", "Southern Europe", "Northern ... ## $ region.code <chr> "142", "150", "002", "002", NA, "019", "019", ... ## $ sub.region.code <chr> "034", "039", "015", "017", NA, "029", "005", ... ## $ x <int> 22, 15, 13, 13, 15, 7, 6, 20, 24, 15, 21, 4, 2... ## $ y <int> 8, 9, 11, 17, 23, 4, 14, 6, 19, 6, 7, 2, 9, 8,... routers <- read_csv(here::here("data", "routers.csv")) routers ## # A tibble: 453,027 x 3 ## type country_name country_code ## <chr> <chr> <chr> ## 1 mikrotik Slovak Republic SK ## 2 mikrotik Czechia CZ ## 3 mikrotik Colombia CO ## 4 mikrotik Bosnia and Herzegovina BA ## 5 mikrotik Czechia CZ ## 6 mikrotik Brazil BR ## 7 mikrotik Vietnam VN ## 8 mikrotik Brazil BR ## 9 mikrotik India IN ## 10 mikrotik Brazil BR ## # ... with 453,017 more rows distinct(routers, type) %>% arrange(type) %>% print(n=11) ## # A tibble: 11 x 1 ## type ## <chr> ## 1 asus ## 2 dlink ## 3 huawei ## 4 linksys ## 5 mikrotik ## 6 netgear ## 7 qnap ## 8 tplink ## 9 ubiquiti ## 10 upvel ## 11 zte
So, we have 11 different device families under assault by “VPNFilter”
and I wanted to show the global distribution of them. Knowing the
compact world tile grid would facet well, I set off to make it happen.
Let’s get some decent names for facet labels:
real_names <- read_csv(here::here("data", "real_names.csv")) real_names ## # A tibble: 11 x 2 ## type lab ## <chr> <chr> ## 1 asus Asus Device ## 2 dlink D-Link Devices ## 3 huawei Huawei Devices ## 4 linksys Linksys Devices ## 5 mikrotik Mikrotik Devices ## 6 netgear Netgear Devices ## 7 qnap QNAP Devices ## 8 tplink TP-Link Devices ## 9 ubiquiti Ubiquiti Devices ## 10 upvel Upvel Devices ## 11 zte ZTE Devices
Next, we need to summarise our scan results and pair it up the world
tile grid data and our real names:
count(routers, country_code, type) %>% # summarise the data into # of device familes per country left_join(wtg, by = c("country_code" = "alpha.2")) %>% # join them up on the common field filter(!is.na(alpha.3)) %>% # we only want countries on the grid and maxmind attributes some things to meta-regions and anonymous proxies left_join(real_names) -> wtg_routers glimpse(wtg_routers) ## Observations: 629 ## Variables: 14 ## $ country_code <chr> "AE", "AE", "AE", "AF", "AF", "AF", "AG", "AL"... ## $ type <chr> "asus", "huawei", "mikrotik", "huawei", "mikro... ## $ n <int> 1, 12, 70, 12, 264, 27, 1, 941, 2081, 7, 2, 1,... ## $ name <chr> "United Arab Emirates", "United Arab Emirates"... ## $ alpha.3 <chr> "ARE", "ARE", "ARE", "AFG", "AFG", "AFG", "ATG... ## $ country.code <chr> "784", "784", "784", "004", "004", "004", "028... ## $ iso_3166.2 <chr> "ISO 3166-2:AE", "ISO 3166-2:AE", "ISO 3166-2:... ## $ region <chr> "Asia", "Asia", "Asia", "Asia", "Asia", "Asia"... ## $ sub.region <chr> "Western Asia", "Western Asia", "Western Asia"... ## $ region.code <chr> "142", "142", "142", "142", "142", "142", "019... ## $ sub.region.code <chr> "145", "145", "145", "034", "034", "034", "029... ## $ x <int> 20, 20, 20, 22, 22, 22, 7, 15, 15, 15, 20, 20,... ## $ y <int> 10, 10, 10, 8, 8, 8, 4, 9, 9, 9, 6, 6, 6, 6, 1... ## $ lab <chr> "Asus Device", "Huawei Devices", "Mikrotik Dev...
Then, plot it:
ggplot(wtg_routers, aes(x, y, fill=n, group=lab)) + geom_tile(color="#b2b2b2", size=0.125) + scale_y_reverse() + viridis::scale_fill_viridis(name="# Devices", trans="log10", na.value="white", label=scales::comma) + facet_wrap(~lab, ncol=3) + coord_equal() + labs( x=NULL, y=NULL, title = "World Tile Grid Per-country Concentration of\nSeriously Poorly Configured Network Devices", subtitle = "Device discovery based on in-scope 'VPNFilter' vendor device banner strings", caption = "Source: Rapid7 Project Sonar & Censys" ) + theme_ipsum_rc(grid="") + theme(panel.background = element_rect(fill="#969696", color="#969696")) + theme(axis.text=element_blank()) + theme(legend.direction="horizontal") + theme(legend.key.width = unit(2, "lines")) + theme(legend.position=c(0.85, 0.1))
Doh! We forgot to ensure we had data for every country. Let’s try that
again:
count(routers, country_code, type) %>% complete(country_code, type) %>% filter(!is.na(country_code)) %>% left_join(wtg, c("country_code" = "alpha.2")) %>% filter(!is.na(alpha.3)) %>% left_join(real_names) %>% complete(country_code, type, x=unique(wtg$x), y=unique(wtg$y)) %>% filter(!is.na(lab)) %>% ggplot(aes(x, y, fill=n, group=lab)) + geom_tile(color="#b2b2b2", size=0.125) + scale_y_reverse() + viridis::scale_fill_viridis(name="# Devices", trans="log10", na.value="white", label=scales::comma) + facet_wrap(~lab, ncol=3) + coord_equal() + labs( x=NULL, y=NULL, title = "World Tile Grid Per-country Concentration of\nSeriously Poorly Configured Network Devices", subtitle = "Device discovery based on in-scope 'VPNFilter' vendor device banner strings", caption = "Source: Rapid7 Project Sonar & Censys" ) + theme_ipsum_rc(grid="") + theme(panel.background = element_rect(fill="#969696", color="#969696")) + theme(axis.text=element_blank()) + theme(legend.direction="horizontal") + theme(legend.key.width = unit(2, "lines")) + theme(legend.position=c(0.85, 0.1))
That’s better.
We take advantage of ggplot2’s ability to facet and just ensure we have
complete (even if NA
) tiles for each panel.
Once consumers start seeing these used more they’ll be able to pick up
key markers (or one of us will come up with a notation that makes key
markers more visible) and be able to get specific information from the
chart. I just wanted to show regional and global differences between
vendors (and really give MikroTik users a swift kick in the patootie for
being so bad with their kit).
FIN
You can find the RStudio project (code + data) here: (http://rud.is/dl/tile-grid-grid.zip)
R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.