A long time ago, in a GitHub repo far, far away, there lived a tiny package dubbed statebins that made it possible to create equal-area, square U.S. state cartograms in R.
Previously, on statebins
…
There were three different functions in the old-style package:
- one for discrete scales (it automated ‘cuts’)
- one for continuous scales
- one for manual scales
It also did some hack-y stuff with grobs to try to get things to look good without putting too much burden on the user.
All that “mostly” worked, but I always ended up doing some painful workaround when I needed more custom charts (not that I have to use this package much given the line of work I’m in).
Downsizing statebins
Now, there’s just one function for making the cartograms, statebins(), and another for applying a base theme, theme_statebins(). The minimalisation has some advantages that we’ll take a look at now, starting with the most basic example (the one on the manual page):
library(statebins)

data(USArrests)
USArrests$state <- rownames(USArrests)

statebins(USArrests, value_col="Assault", name = "Assault") +
  theme_statebins(legend_position="right")
Two things should stand out there:
- you got scale_fill_distiller() for free!
- labels are dark/light depending on the tile color
Before we dig into those two points, it may be helpful to show the new function interface:
statebins(state_data, state_col = "state", value_col = "value",
          dark_label = "black", light_label = "white", font_size = 3,
          state_border_col = "white", state_border_size = 2,
          ggplot2_scale_function = ggplot2::scale_fill_distiller, ...)
You pass in the state name/abbreviation & value columns like the old interface, but you also specify colors for the dark & light labels (use a hex color code ending in a 00 alpha value if you don’t want labels, but Muricans are pretty daft and generally need the abbreviations on the squares). You can set the font size, too (we’ll do that in a bit) and customize the border color (usually to match the background of the target medium). BUT, you also pass in the ggplot2 scale function you want to use and the named parameters for it (that’s what the ... is for).
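As a rough illustration of the ... pass-through, here is a minimal sketch that hands extra arguments to the default scale. The palette and direction values are arbitrary choices for the example; they are arguments of ggplot2::scale_fill_distiller(), not of statebins() itself.

```r
library(statebins)
library(ggplot2)

data(USArrests)
USArrests$state <- rownames(USArrests)

# `palette`, `direction` and `name` are not statebins() arguments; they travel
# through `...` to the function given in `ggplot2_scale_function`.
p <- statebins(
  USArrests,
  value_col = "Assault",
  ggplot2_scale_function = ggplot2::scale_fill_distiller,
  palette = "Reds",   # arbitrary ColorBrewer palette for this example
  direction = 1,      # light-to-dark ordering
  name = "Assault"
) +
  theme_statebins(legend_position = "right")

p
```

Swapping in a different scale function means swapping in that function's own named parameters as well.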
So, yes I’ve placed more of a burden on you if you want discrete cuts, but I’ve also made the package way more flexible and made it possible to keep the labels readable without you having to lift an extra coding finger.
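To give a feel for how dark/light label switching can work, here is a small base R sketch. The helper name label_colour(), the brightness weights and the threshold of 128 are assumptions made for illustration; this is not necessarily how statebins decides internally.

```r
# Hypothetical helper (not part of statebins): choose a label colour that
# contrasts with a fill colour, based on a weighted brightness of its channels.
label_colour <- function(fill, dark = "black", light = "white", threshold = 128) {
  channels <- grDevices::col2rgb(fill)  # 3 x n matrix of 0-255 values
  brightness <- 0.299 * channels["red", ] +
    0.587 * channels["green", ] +
    0.114 * channels["blue", ]
  # Bright tiles get the dark label, dark tiles get the light label
  unname(ifelse(brightness > threshold, dark, light))
}

label_colour(c("#FFFFCC", "#800026"))
```

The first colour is a pale yellow and the second a deep red, so the helper returns the dark label for the first and the light label for the second.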
The theme()-ing has also moved out to a separate theme function, which makes it easier for you to further customize the final output.
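Because the theme is now an ordinary ggplot2 layer, further tweaks can be stacked on top of it. A minimal sketch, where the title text and the theme() settings are just example choices:

```r
library(statebins)
library(ggplot2)

data(USArrests)
USArrests$state <- rownames(USArrests)

p <- statebins(USArrests, value_col = "Assault", name = "Assault") +
  labs(title = "Assault arrests per 100,000 (1973)") +
  theme_statebins(legend_position = "bottom") +
  # theme() settings added after theme_statebins() override its defaults
  theme(plot.title = element_text(hjust = 0.5, face = "bold"))

p
```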
But that’s not all!
There are now squares for Puerto Rico, the Virgin Islands and New York City (the latter two were primarily for new features/data in cdcfluview, but they are good to have available). Let’s build out a larger example with some of these customizations (we’ll make up some data to do that):
library(statebins)
library(tidyverse)
library(viridis)

data(USArrests)

# make up some data for the example:
# the three extra squares get the maximum observed value for each crime column
rownames_to_column(USArrests, "state") %>%
  bind_rows(
    tibble(
      state = c("Virgin Islands", "Puerto Rico", "New York City"),
      Murder = rep(max(USArrests$Murder), 3),
      Assault = rep(max(USArrests$Assault), 3),
      Rape = rep(max(USArrests$Rape), 3),
      UrbanPop = c(93, 95, 100)
    )
  ) -> us_arrests

statebins(us_arrests, value_col="Assault",
          ggplot2_scale_function = viridis::scale_fill_viridis) +
  labs(title="USArrests + made up data") +
  theme_statebins("right")
Cutting to the chase
I still think it makes more sense to use binned data in these cartograms, and while you no longer get that for “free”, it’s not difficult to do:
adat <- suppressMessages(read_csv("http://www.washingtonpost.com/wp-srv/special/business/states-most-threatened-by-trade/states.csv?cache=1"))

mutate(
  adat,
  share = cut(avgshare94_00, breaks = 4, labels = c("0-1", "1-2", "2-3", "3-4"))
) %>%
  statebins(
    value_col = "share",
    ggplot2_scale_function = scale_fill_brewer,
    name = "Share of workforce with jobs lost or threatened by trade"
  ) +
  labs(title = "1994-2000") +
  theme_statebins()
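One thing worth knowing when binning this way: cut() with a single number for breaks splits the observed range into that many equal-width intervals, so hand-written labels such as "0-1" only line up if the data happens to span 0 to 4. A small base R sketch with made-up values (not the trade data) shows the difference between that and explicit break points:

```r
# Made-up values standing in for a share column (assumption, not the real data)
share <- c(0.4, 1.2, 2.7, 3.9, 1.9, 0.8)

# breaks = 4 slices the observed range (0.4 to 3.9) into four equal-width bins
auto_bins <- cut(share, breaks = 4)
levels(auto_bins)

# Explicit break points pin the bins to the values the labels claim
fixed_bins <- cut(
  share,
  breaks = c(0, 1, 2, 3, 4),
  labels = c("0-1", "1-2", "2-3", "3-4"),
  include.lowest = TRUE
)
table(fixed_bins)
```

Either factor can be handed to statebins() through value_col together with a discrete scale function.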
More manual labor
You can also still use hardcoded colors, but it’s a little more work on your end (but not much!):
election_2012 <- suppressMessages(read_csv("https://raw.githubusercontent.com/hrbrmstr/statebins/master/tmp/election2012.csv"))

mutate(election_2012, value = ifelse(is.na(Obama), "Romney", "Obama")) %>%
  statebins(
    font_size=4, dark_label = "white", light_label = "white",
    ggplot2_scale_function = scale_fill_manual,
    name = "Winner",
    values = c(Romney = "#2166ac", Obama = "#b2182b")
  ) +
  theme_statebins()
BREAKING NEWS: Rounded corners
A Twitter request ended up turning into a new feature this afternoon (after I made this post) => rounded corners:
data(USArrests)
USArrests$state <- rownames(USArrests)

statebins(USArrests, value_col="Assault", name = "Assault", round=TRUE) +
  theme_statebins(legend_position="right")
FIN
It’ll be a while before this hits CRAN, and I’m not really planning on keeping the old interface when the submission happens. So, it’ll be on GitHub for a bit to let folks chime in on what additional features they want and whether the deprecated functions really need to stay in the package.
So, kick the tyres and don’t hesitate to shoot over some feedback!