Generating d3js Motion Charts from rCharts
Remember Gapminder, the animated motion chart popularised by Hans Rosling in his TED Talks and Joy of Stats TV programme? Well it’s back on TV this week in Don’t Panic – The Truth About Population, a compelling piece of OU/BBC co-produced stats theatre featuring Hans Rosling, and a Pepper’s Ghost illusion brought into the digital age courtesy of the Musion projection system:
Whilst considering what materials we could use to support the programme, we started looking for ways to make use of the Gapminder visualisation tool that makes several appearances in the show. Unfortunately, neither Gapminder (requires Java?), nor the Google motion chart equivalent of it (requires Flash?), appear to work with a certain popular brand of tablet that is widely used as a second screen device…
Looking around the web, I noticed that Mike Bostock had produced a version of the motion chart using d3.js: The Wealth & Health of Nations. Hmmm…
Playing with that rendering on a tablet, I had a few problems when trying to highlight individual countries – the interaction interfered with an invisible date slider control – but a quick shout out to my OU colleague Pete Mitton resulted in a tweaked version of the UI with the date control moved to the side. I also added a tweak to allow specified countries to be highlighted. You can find an example here (source).
Looking at how the data was pulled into the chart, it seems to be quite a convoluted form of JSON. After banging my head against a wall for a bit, a question on Stack Overflow about how to wrangle the data from something that looked like this:
Country  Region  Year  V1  V2
AAAA     XXXX    2001  12  13
BBBB     YYYY    2001  14  15
AAAA     XXXX    2002  36  56
AAAA     XXXX    1999  45  67
to something that looked like this:
[
  {"Country": "AAAA",
   "Region": "XXXX",
   "V1": [ [1999,45], [2001,12], [2002,36] ],
   "V2": [ [1999,67], [2001,13], [2002,56] ]
  },
  {"Country": "BBBB",
   "Region": "YYYY",
   "V1": [ [2001,14] ],
   "V2": [ [2001,15] ]
  }
]
resulted in a handy function from Ramnath Vaidyanathan that fitted the bill.
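For anyone who wants to see the shape of that transformation without any extra packages, here is a minimal base R sketch. It is not Ramnath's function (his plyr-based version appears further down); `year_pairs` and `to_nested` are hypothetical names of my own, and the data frame is just the toy table above.

```r
# Toy data in the long format shown above
long <- data.frame(
  Country = c("AAAA", "BBBB", "AAAA", "AAAA"),
  Region  = c("XXXX", "YYYY", "XXXX", "XXXX"),
  Year    = c(2001, 2001, 2002, 1999),
  V1      = c(12, 14, 36, 45),
  V2      = c(13, 15, 56, 67),
  stringsAsFactors = FALSE
)

# Hypothetical helper: turn two parallel vectors into a
# year-ordered list of (year, value) pairs
year_pairs <- function(year, value) {
  ord <- order(year)
  Map(function(y, v) list(y, v), year[ord], value[ord])
}

# Hypothetical helper: one record per country, with each
# measure stored as a list of (year, value) pairs
to_nested <- function(df, measures) {
  groups <- split(df, df$Country)
  records <- lapply(groups, function(d) {
    rec <- list(Country = d$Country[1], Region = d$Region[1])
    for (m in measures) rec[[m]] <- year_pairs(d$Year, d[[m]])
    rec
  })
  # Drop the group names so a JSON encoder emits an array, not an object
  unname(records)
}

nested <- to_nested(long, c("V1", "V2"))
str(nested[[1]]$V1)
```

The resulting list can then be handed to a JSON encoder such as rjson::toJSON, which is what the code later in this post uses.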
One of the reasons that I wanted to use R for the data transformation step, rather than something like Python, was that I was keen to try to get a version of the motion charts working with the rCharts library. Such is the way of the world, Ramnath is the maintainer of rCharts, and with his encouragement I had a go at getting the motion chart to work with that library, heavily cribbing from @timelyportfolio’s rCharts Extra – d3 Horizon Conversion tutorial on getting things to work with rCharts along the way.
For what it’s worth, my version of the code is posted here: rCharts_motionchart.
I put together a couple of demos that seem to work, including the one shown below that pulls data from the World Bank indicators API and then chucks it onto a motion chart…
To start with, here are a couple of helper functions:
require('WDI')

#A handy helper function for getting country data - this doesn't appear in the WDI package?
#---- https://code.google.com/p/google-motion-charts-with-r/source/browse/trunk/demo/WorldBank.R?r=286
getWorldBankCountries <- function(){
  require(RJSONIO)
  wbCountries <- fromJSON("http://api.worldbank.org/country?per_page=300&format=json")
  #Flatten each nested country record into one row of a data frame
  wbCountries <- data.frame(t(sapply(wbCountries[[2]], unlist)))
  #Go via as.character so that factor columns convert to their values rather than their level codes
  wbCountries$longitude <- as.numeric(as.character(wbCountries$longitude))
  wbCountries$latitude <- as.numeric(as.character(wbCountries$latitude))
  #Strip the "(all income levels)" suffix whether the column is a factor or a character vector
  wbCountries$region.value <- gsub("\\(all income levels\\)", "", as.character(wbCountries$region.value))
  return(wbCountries)
}

#----http://stackoverflow.com/a/19729235/454773
#Return a function that extracts the element at a given position
pluck_ = function (element){
  function(x) x[[element]]
}

#' Zip two vectors
zip_ <- function(..., names = FALSE){
  x = list(...)
  y = lapply(seq_along(x[[1]]), function(i) lapply(x, pluck_(i)))
  if (names) names(y) = seq_along(y)
  return(y)
}

#' Sort a vector based on elements at a given position
sort_ <- function(v, i = 1){
  v[sort(sapply(v, '[[', i), index.return = TRUE)$ix]
}

library(plyr)
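The `t(sapply(..., unlist))` step in getWorldBankCountries is what produces column names such as `region.value`. Here is a small offline sketch of that flattening, using two made-up records. The record layout is my assumption about a simplified API response, and the coordinates are purely illustrative.

```r
# Two toy records; the structure is an assumed simplification of the
# API response and the coordinate values are made up for illustration
records <- list(
  list(iso2Code = "BD", name = "Bangladesh",
       region = list(id = "SAS", value = "South Asia"),
       longitude = "90.4", latitude = "23.7"),
  list(iso2Code = "GB", name = "United Kingdom",
       region = list(id = "ECS",
                     value = "Europe & Central Asia (all income levels)"),
       longitude = "-0.1", latitude = "51.5")
)

# unlist() flattens each record into a named character vector, joining
# nested names with a dot (region$value becomes "region.value");
# sapply() binds the vectors as columns, so t() turns them into rows
flat <- data.frame(t(sapply(records, unlist)), stringsAsFactors = FALSE)

# Everything arrives as text, so convert the coordinates explicitly
flat$longitude <- as.numeric(flat$longitude)
flat$latitude  <- as.numeric(flat$latitude)

# Remove the "(all income levels)" suffix from region names
flat$region.value <- gsub("\\(all income levels\\)", "", flat$region.value)

names(flat)
```

Setting `stringsAsFactors = FALSE` keeps the columns as plain character vectors, which sidesteps the factor handling altogether.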
This next bit still needs some refactoring, and a bit of work to get it into a general form:
#I chose to have a go at putting all the motion chart parameters into a list
params=list(
  start=1950,
  end=2010,
  x='Fertility',
  y='GDP',
  radius='Population',
  color='Region',
  key='Country',
  yscale='log',
  xscale='linear',
  rmin=0,
  xmin=0
)

##This bit needs refactoring - grab some data; the year range is pulled from the motion chart config;
##It would probably make sense to pull countries and indicators etc into the params list too?
##That way, we can start to make this block a more general function?
tmp=getWorldBankCountries()[,c('iso2Code','region.value')]
names(tmp)=c('iso2Code','Region')

data <- WDI(indicator=c('SP.DYN.TFRT.IN','SP.POP.TOTL','NY.GDP.PCAP.CD'),start = params$start, end = params$end,country=c("BD",'GB'))
names(data)=c('iso2Code','Country','Year','Fertility','Population','GDP')

data=merge(data,tmp,by='iso2Code')

#Another bit of Ramnath's magic - http://stackoverflow.com/a/19729235/454773
dat2 <- dlply(data, .(Country, Region), function(d){
  list(
    Country = d$Country[1],
    Region = d$Region[1],
    Fertility = sort_(zip_(d$Year, d$Fertility)),
    GDP = sort_(zip_(d$Year, d$GDP)),
    Population=sort_(zip_(d$Year, d$Population))
  )
})

#cat(rjson::toJSON(setNames(dat2, NULL)))
To minimise the amount of motion chart configuration, can we start to set limits based on the data values?
#This really needs refactoring/simplifying/tidying/generalising
#I'm not sure how good the range finding heuristics I'm using are, either?!
paramsTidy=function(params){
  if (!('ymin' %in% names(params))) params$ymin= signif(min(0.9*data[[params$y]]),3)
  if (!('ymax' %in% names(params))) params$ymax= signif(max(1.1*data[[params$y]]),3)
  if (!('xmin' %in% names(params))) params$xmin= signif(min(0.9*data[[params$x]]),3)
  if (!('xmax' %in% names(params))) params$xmax= signif(max(1.1*data[[params$x]]),3)
  if (!('rmin' %in% names(params))) params$rmin= signif(min(0.9*data[[params$radius]]),3)
  if (!('rmax' %in% names(params))) params$rmax= signif(max(1.1*data[[params$radius]]),3)
  params
}

params=paramsTidy(params)
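One possible tidy-up, offered as a sketch rather than a finished answer: pass the data frame in as an argument instead of relying on the global `data`, and drop missing values before taking the range (indicator data often has gaps, and a single NA would otherwise turn every limit into NA). `padded_range` and `fill_limits` are hypothetical names, and the toy data frame is made up. The 10% padding is the same heuristic as above, so it still assumes positive values.

```r
# Hypothetical helper: padded range for one column, ignoring NA/Inf.
# Like the heuristic above, the padding only widens the range when
# the values are positive.
padded_range <- function(values, pad = 0.1, digits = 3) {
  values <- values[is.finite(values)]
  stopifnot(length(values) > 0)
  c(min = signif((1 - pad) * min(values), digits),
    max = signif((1 + pad) * max(values), digits))
}

# Hypothetical helper: fill in any limits the caller has not set
fill_limits <- function(params, df) {
  # Map each limit prefix (x, y, r) to the column it describes
  axes <- c(x = params[["x"]], y = params[["y"]], r = params[["radius"]])
  for (axis in names(axes)) {
    rng <- padded_range(df[[axes[[axis]]]])
    for (end in c("min", "max")) {
      key <- paste0(axis, end)
      # Explicit settings such as xmin = 0 are left untouched
      if (is.null(params[[key]])) params[[key]] <- rng[[end]]
    }
  }
  params
}

# Made-up values, including gaps, purely for illustration
toy <- data.frame(
  Fertility  = c(2, NA, 6),
  GDP        = c(100, 1000, NA),
  Population = c(1e6, 5e6, 2e6)
)
toy_params <- list(x = "Fertility", y = "GDP", radius = "Population",
                   rmin = 0, xmin = 0)
str(fill_limits(toy_params, toy))
```

One thing this does not address: with `yscale='log'` a lower limit of zero or below is not valid, so any automatically chosen `ymin` would need to stay positive.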
This is the function that generates the rChart:
require(rCharts)

#We can probably tidy the way that the parameters are mapped...
#I wasn't sure whether to try to maintain the separation between params and rChart$params?
rChart.generator=function(params, h=400,w=800){
  rChart <- rCharts$new()
  rChart$setLib('../motionchart')
  rChart$setTemplate(script = "../motionchart/layouts/motionchart_Demo.html")

  rChart$set(
    countryHighlights='',
    yearMin= params$start,
    yearMax=params$end,

    x=params$x,
    y=params$y,
    radius=params$radius,
    color=params$color,
    key=params$key,

    ymin=params$ymin,
    ymax=params$ymax,
    xmin=params$xmin,
    xmax=params$xmax,
    rmin=params$rmin,
    rmax=params$rmax,

    xlabel=params$x,
    ylabel=params$y,

    yscale=params$yscale,
    xscale=params$xscale,

    width=w,
    height=h
  )

  rChart$set( data= rjson::toJSON(setNames(dat2, NULL)) )

  rChart
}

rChart.generator(params,w=1000,h=600)
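On tidying the parameter mapping: most of the options are copied across under the same name, so one option might be to build the whole option list in one place and only spell out the exceptions. This is a base R sketch under that assumption; `chart_options` is a hypothetical helper, not part of rCharts.

```r
# Hypothetical helper: translate the params list into the option
# names used in the rChart$set() call above
chart_options <- function(params, width = 800, height = 400) {
  # The only params that the chart knows under a different name
  renamed <- c(start = "yearMin", end = "yearMax")
  opts <- params
  hit <- names(opts) %in% names(renamed)
  names(opts)[hit] <- unname(renamed[names(opts)[hit]])

  # Options that are derived or fixed rather than copied across
  extras <- list(
    xlabel = params[["x"]],
    ylabel = params[["y"]],
    countryHighlights = "",
    width = width,
    height = height
  )
  c(opts, extras)
}

params <- list(start = 1950, end = 2010, x = "Fertility", y = "GDP",
               radius = "Population", color = "Region", key = "Country",
               yscale = "log", xscale = "linear", rmin = 0, xmin = 0)
opts <- chart_options(params, width = 1000, height = 600)
str(opts[c("yearMin", "yearMax", "xlabel", "width")])
```

Whether the resulting list can be handed to the chart in one go (for example with do.call) is something I have not checked against rCharts, so treat that step as an open question.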
Aside from tidying – and documenting/commenting – the code, the next thing on my to do list is to see whether I can bundle this up in a Shiny app. I made a start sketching a possible UI, but I’ve run out of time to do much more for a day or two… (I was also thinking of country checkboxes for either pulling in just that country data, or highlighting those countries.)
items=c("Fertility","GDP","Population")
names(items)=items

shinyUI(pageWithSidebar(
  headerPanel("Motion Chart demo"),

  sidebarPanel(
    selectInput(inputId = 'x',
                label = "X",
                choices = items,
                selected = 'Fertility'),
    selectInput(inputId = 'y',
                label = "Y",
                choices = items,
                selected = 'GDP'),
    selectInput(inputId = 'r',
                label = "Radius",
                choices = items,
                selected = 'Population')
  ),
  mainPanel(
    #The next line throws an error (a library is expected? But I don't want to use one?)
    showOutput("motionChart",'')
  )
))
As ever, we’ve quite possibly run out of time on getting much up on the OpenLearn website by Thursday to support the programme as it airs, which is partly why I’m putting this code out now… If you manage to do anything with it that would allow folk to dynamically explore a range of development indicators over the next day or two (especially GDP, fertility, mortality, average income, income distributions (this would require different visualisations?)), we may be able to give it a plug from OpenLearn, and maybe via any tweetalong campaign that’s running as the programme airs…
If you do come up with anything, please let me know via the comments, or twitter (@psychemedia)…