Power Outage Impact Choropleths In 5 Steps in R (featuring rvest & RStudio “Projects”)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
I and @awpiii were trading news about the power outages in Maine & New Hampshire last night and he tweeted the link to the @PSNH Outage Map. As if the Bing Maps tiles weren’t bad enough, the use of a categorical color scale instead of a sequential one[1] caused sufficient angst that I whipped up an alternate version in R between making pies and bread for Thanksgiving (even with power being out for us).
PSNH provides a text version of outages (by town) that ends up being a pretty clean HTML table, and a quick Google search led me to a fairly efficient town-level shapefile for New Hampshire. With these data files at the ready, it was time to make a better map.
Step 0 – Environment Setup
So, I lied. There are six steps. “5” just works way better in attention-grabbing list headlines. The first one is setting up the project and loading all the libraries we’ll need. I use RStudio for most of my R coding and their IDE has the concept of a “project” which has it’s own working directory, workspace, history, and source documents separate from any other RStudio windows you have open. They are a great way to organize your analyses and experiments. I have my own “new project” script that sets up additional directory structures, configures the Rproj
file with my preferences and initializes a git repository for the project.
I also use the setup step to load up a ggplot2 map theme I keep in a gist.
library(sp) library(rgdal) library(dplyr) library(rvest) library(stringi) library(scales) library(RColorBrewer) library(ggplot2) # for theme_map devtools::source_gist("https://gist.github.com/hrbrmstr/33baa3a79c5cfef0f6df") |
Step 1 – Read in the map
This is literally a one-liner:
nh <- readOGR("data/nhtowns/NHTOWNS_POLY.shp", "NHTOWNS_POLY") |
My projects all have a data
directory and thats where I normally store shapefiles. I used ogrinfo -al NHTOWNS_POLY.shp
at the command line to determine the layer name.
Step 2 – Read in the outage data
The rvest
package is nothing short of amazing. It makes very quick work of web scraping and—despite some newlines in the mix—this qualifies as a one-liner in my book:
outage <- html("http://www.psnh.com/outagelist/") %>% html_nodes("table") %>% html_table() %>% .[[1]] |
That bit of code grabs the whole page, extracts all the HTML tables (there is just one in this example), turns it into a list of data frames and then returns the first one.
Step 3 – Data wrangling
While this step is definitely not as succinct as the two previous ones, it’s pretty straightforward:
outage <- outage[complete.cases(outage),] colnames(outage) <- c("id", "total_customers", "without_power", "percentage_out") outage$id <- stri_trans_totitle(outage$id) outage$out <- cut(outage$without_power, breaks=c(0, 25, 100, 500, 1000, 5000, 10000, 20000, 40000), labels=c("1 - 25", "26 - 100", "101 - 500", "501 - 1,000", "1,001 - 5,000", "5,001- 10,000", "10,001 - 20,000", "20,001 - 40,000")) |
We filter out the NA
‘s (this expunges the “total” row), rename the columns, convert the town name to the same case used in the shapefile (NOTE: I could have just toupper
ed all the town names, but I really like this function from the stringi
package) and then use cut
to make an 8-level factor out of the customer outage count (to match the PSNH map legend).
Step 4 – Preparing the map for plotting with ggplot
This is another one-liner:
nh_map <- fortify(nh, region="NAME") |
and makes it possible to use the town names when specifying the polygon regions we want to fill with our spiffy color scheme.
Step 5 – Plotting the map
It is totally possible to do this in one line, but many kittens will lose their lives if you do. I like this way of structuring the creation of a ggplot
graphic since it makes it very easy to comment out or add various layers or customizations without worrying about stray +
signs.
gg <- ggplot(data=nh_map, aes(map_id=id)) gg <- gg + geom_map(map=nh_map, aes(x=long, y=lat), color="#0e0e0e", fill="white", size=0.2) gg <- gg + geom_map(data=outage, map=nh_map, aes(fill=out), color="#0e0e0e", size=0.2) gg <- gg + scale_fill_brewer(type="seq", palette="RdPu", name="Number ofncustomer outagesnin each town") gg <- gg + coord_equal() gg <- gg + labs(title=sprintf("%s Total PSNH Customers Without Power", comma(sum(outage$without_power)))) gg <- gg + theme_map() gg <- gg + theme(legend.position="right") gg |
That sequence starts the base ggplot
object creation, sets up the base map colors and then overlays the town outage colors. We use the RdPu
Color Brewer sequential palette and give the legend the same title as the PSNH counterpart.
The shapefile is already projected (Lambert Conformal Conic—take a look at it with ogrinfo -al
), so we can get away with using coord_equal
vs re-projecting it, and we do a tally of outages to stick in the title. My base theme_map
is designed for Maine, hence the extra theme
call to move the legend.
The Finished Product
Crisp SVG polygons, no cluttered Bing Maps tiles and a proper color palette.
All the code is up on github.
R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.