I was reviewing RSS feeds when I came across this story about “ObamaCare Employer Mandate: A List Of Cuts To Work Hours, Jobs” over on Investors.com. Efficacy of the law notwithstanding, I thought it might be interesting to visualize the data since the folks over at Investors.com provided a handy spreadsheet that they seem to maintain pretty well (link is in the article).
The spreadsheet is organized by date and lists each state where the jobs were impacted along with the employer, employer type (public/private), reason and number of jobs impacted (if available). They also have links to news stories related to each entry.
My first thought was to compare impact across states by date, so I threw together a quick R script to build a faceted bar chart:
library(ggplot2)
library(plyr)

# Source for job impact data:
# http://news.investors.com/politics-obamacare/092513-669013-obamacare-employer-mandate-a-list-of-cuts-to-work-hours-jobs.htm
emp.f <- read.csv("~/employers.csv", stringsAsFactors=FALSE)
colnames(emp.f) <- c("State","Employer","Type","Action","Jobs.Cut","Action.Date")

# fill in missing job counts with the median of the reported counts
emp.f$Jobs.Cut[is.na(emp.f$Jobs.Cut)] <- median(emp.f$Jobs.Cut, na.rm=TRUE)

# normalize state names (vector indexing is a no-op when nothing matches)
emp.f$State[emp.f$State == "Virgina"] <- "Virginia"
emp.f$State[emp.f$State == "Washington DC"] <- "District of Columbia"
Yes, they really spelled “Virginia” wrong, at least in the article text that I initially scraped the data from before I saw there was a spreadsheet available. Along with fixing “Virginia”, I also changed “Washington DC” to “District of Columbia” for reasons you’ll see later on in this post. I’m finding it very helpful to do as much of the data cleanup as possible in code (R or Python), since it makes the process far more repeatable than performing the same tasks by hand in a text editor, and it is essential if you know the data is going to change or expand.
After reading in the data, it was trivial to get a ggplot of the job impacts by state (click image for larger version):
p <- ggplot(emp.f, aes(x=Action.Date, y=Jobs.Cut))
p <- p + geom_bar(aes(fill=State), stat="identity")
p <- p + facet_wrap(~State)
p <- p + theme_bw()
# "none" hides the legend; the facet strips already name each state
p <- p + theme(legend.position="none", axis.text.x = element_text(angle = 90))
p <- p + labs(x="Action Date", y="# Jobs Cut")
p
That visualization provided some details, but I decided to expand the scope a bit and wanted to make an interactive “bubble chart” (since folks seem to love bubbles) with circle size relative to the total job cuts per state and circle color reflecting the conservative/liberal leaning of each state (i.e. ‘red’ vs ‘blue’) to see if there was any visual correlation by that attribute. I found the political data over at Gallup and went to work prepping the data with some additional R code. (NOTE: The Gallup data was the reason for the “DC” name change since Gallup uses “District of Columbia” in their data set.)
# aggregate state data
emp.state.sum.df <- count(emp.f,c("State"),c("Jobs.Cut"))
colnames(emp.state.sum.df) <- c("State","Total.Jobs.Cut")

# get total (estimated) jobs impacted
total.jobs <- sum(emp.state.sum.df$Total.Jobs.Cut)

# Source for the red v blue state data:
# http://www.gallup.com/poll/125066/State-States.aspx
# read political leanings
red.blue.df <- read.csv("~/red-blue.csv", stringsAsFactors=FALSE)

# join the jobs and leaning data together
s <- join(emp.state.sum.df, red.blue.df, by="State")

# cheat and get leaning range for manual input into the datavis
leaning.range <- range(s$Conservative.Advantage)

# build the JSON data file. store state summary data for the bubbles, but also include
# the detail level for extra data for the viz
# need to clean up this file post-write and definitely run it through http://jsonlint.com/
jsfile = file("states.tmp","w")
by(s, 1:nrow(s), function(row) {
  writeLines(sprintf(' {"name": "%s", "size":%d, "leaning":%2.1f, "detail":[',
                     row$State, row$Total.Jobs.Cut, row$Conservative.Advantage), jsfile)
  employers = emp.f[emp.f$State == row$State,]
  by(employers, 1:nrow(employers), function(emp.row) {
    writeLines(sprintf(' { "employer":"%s", "emptype":"%s", "actiondetail":"%s", "jobsimpacted":%d, "when":"%s"},',
                       emp.row$Employer, emp.row$Type, gsub('"',"'",emp.row$Action),
                       emp.row$Jobs.Cut, emp.row$Action.Date), jsfile)
  })
  writeLines("]},\n",jsfile)
})
close(jsfile)
I know the comments point out the need to tweak the resulting JSON a bit (mostly to remove “errant” commas, which is one of the annoying bits about JSON), but I wanted to re-emphasize the huge utility of JSONlint as it can save you a great deal of time debugging large amounts of gnarly JSON data.
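For a quick local check before (or alongside) JSONlint, the comma cleanup could be scripted along the lines of the JavaScript sketch below. It is not part of the original workflow: `fragmentsToJson` is a hypothetical helper, the sample record is made up to match the shape the R script writes, and wrapping the output in an array is an assumption about what the visualization loads. The regular expressions are naive and would also alter a string value that happens to contain a comma directly before `]` or `}`.

```javascript
// Hypothetical helper: tidy the comma-terminated fragments written by the
// R script, then let JSON.parse act as a local lint check.
function fragmentsToJson(raw) {
  const cleaned = raw
    .replace(/,(\s*[\]}])/g, "$1") // drop a comma sitting right before ] or }
    .replace(/,\s*$/, "");         // drop the comma after the last state object
  // The generated file has no enclosing array, so add one here
  // (an assumption about the format the visualization expects).
  return JSON.parse("[" + cleaned + "]"); // throws SyntaxError if still invalid
}

// Made-up sample in the same shape as the generated file.
const sample = [
  ' {"name": "Virginia", "size":120, "leaning":4.5, "detail":[',
  ' { "employer":"Example County", "emptype":"Public", "actiondetail":"Cut hours", "jobsimpacted":120, "when":"2013-08-01"},',
  "]},",
  ""
].join("\n");

const states = fragmentsToJson(sample);
console.log(states.length, states[0].detail.length);
```

If `JSON.parse` throws, the error message usually points at the offending position, which narrows down where to look in a large file.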
With the data prepped, I threw together a D3 visualization that shows the bubbles on the left and details by date and employer on the right.
Since it’s D3, there’s no need to put the source code in the blog post. Just do a “view-source” on the resulting visualization or poke around the github repository. I will, however, point out a couple of useful/interesting bits from the code.
First, coloring circles by political leaning took exactly one line of code since D3 provides a means to map a range of values to colors:
var ramp = d3.scale.linear().domain([-21,36]).range(["#253494","#B30000"]);
I chose the colors with Color Brewer but cheated (as I indicated in the R code) by pre-computing the range of the values for the palette. You can see the tiny District of Columbia’s very blue circle in the lower-left of the field of circles. Hopefully Investors.com will maintain the data set and we can look at changes over a larger period of time.
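To make the value-to-color mapping concrete, here is a dependency-free JavaScript sketch of a two-stop linear color ramp. It is only an approximation of the idea, not D3's implementation: `makeRamp`, `hexToRgb` and `rgbToHex` are hypothetical helpers, the `leanings` array is made up to span the range quoted above, and clamping out-of-range input is a choice made for this example. It also shows how the domain could be derived from the data instead of being typed in by hand.

```javascript
// Parse "#RRGGBB" into [r, g, b].
function hexToRgb(hex) {
  const n = parseInt(hex.slice(1), 16);
  return [(n >> 16) & 255, (n >> 8) & 255, n & 255];
}

// Turn [r, g, b] back into a lowercase "#rrggbb" string.
function rgbToHex(rgb) {
  return "#" + rgb.map(c => c.toString(16).padStart(2, "0")).join("");
}

// Hypothetical stand-in for a two-stop linear color scale:
// maps a number in `domain` to a color between the two `range` stops.
function makeRamp(domain, range) {
  const [d0, d1] = domain;
  const [c0, c1] = range.map(hexToRgb);
  return function (value) {
    // Position of value within the domain, clamped to [0, 1].
    const t = Math.min(1, Math.max(0, (value - d0) / (d1 - d0)));
    return rgbToHex(c0.map((c, i) => Math.round(c + t * (c1[i] - c))));
  };
}

// Made-up leaning values; only the min and max matter for the domain.
const leanings = [-21, 4.5, 36];
const domain = [Math.min(...leanings), Math.max(...leanings)];

const ramp = makeRamp(domain, ["#253494", "#B30000"]);
console.log(ramp(-21), ramp(36));
```

The most liberal-leaning value maps to the blue stop and the most conservative-leaning value to the red stop, with everything in between blended channel by channel.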
Second, you get rudimentary “popups” for free via “title” elements on the SVG circles, so there is no need for custom tooltip code:
node.append("title")
    .text(function(d) { return d.stateName + ": " + format(d.value) + " jobs impacted"; });
I could have tweaked the display a bit more, added links to the stories and provided a means to sort the “# Jobs” column by count or date, but I took enough time away from the book to scratch this visualization itch and it came out pretty much the way I wanted it to.
If you do hack at it and build something better (which should not be terribly difficult), drop a note in the comments or over at github.