
Showing a distribution over time: how many summary stats?

[This article was first published on Robert Grant's stats blog » R, and kindly contributed to R-bloggers.]

I saw this nice graph today on Twitter, by Thomas Forth:

but the more I looked at it, the more I felt it was hard to understand the changes over time across the income distribution from the Gini coefficient and the median. People started asking online for other percentiles, so I thought I would smooth each of them from the source data and plot them side by side:

Now, this has the advantage of showing exactly where in society the growth or contraction is, but it loses the engaging element of the nation wandering across economic space (cf. Booze Space; where do we end up? washed up on the banks of the Walbrook?), which should not be sneezed at. Something engaging matters in dataviz.

Code (as you know, I’m a nuts ‘n’ bolts guy, so don’t go recommending ggplot2 to me):

library(foreign)
library(splines)
bluecol<-"#467db4"
redcol<-"#b44f46"
uk<-read.csv("uk_income.csv")[1:53,1:22]
uk$Year <- as.numeric(substr(uk$Year,1,4))
sm<-apply(uk,2,function(z){smooth.spline(x=uk$Year,y=z)$y})
png("uk_income.png")
par(yaxs="i")
plot(uk$Year[1:3],sm[1:3,4],type="l",
     ylim=c(min(sm[,4:22]-1),max(sm[,4:22]+60)),
     xlim=c(1960,2015),
     col=bluecol,
     main="Percentiles of UK income over time",
     sub="(Colour indicates governing political party)",
     ylab="2013 GBP",
     xlab="Year")
lines(uk$Year[4:10],sm[4:10,4],col=redcol) # Wilson I
lines(uk$Year[11:14],sm[11:14,4],col=bluecol) # Heath
lines(uk$Year[15:19],sm[15:19,4],col=redcol) # Wilson II, Callaghan
lines(uk$Year[20:37],sm[20:37,4],col=bluecol) # Thatcher, Major
lines(uk$Year[38:50],sm[38:50,4],col=redcol) # Blair, Brown
lines(uk$Year[51:53],sm[51:53,4],col=bluecol) # Cameron
for(i in 5:22) {
  lines(uk$Year[1:3],sm[1:3,i],col=bluecol) # Macmillan, Douglas-Home
  lines(uk$Year[4:10],sm[4:10,i],col=redcol) # Wilson I
  lines(uk$Year[11:14],sm[11:14,i],col=bluecol) # Heath
  lines(uk$Year[15:19],sm[15:19,i],col=redcol) # Wilson II, Callaghan
  lines(uk$Year[20:37],sm[20:37,i],col=bluecol) # Thatcher, Major
  lines(uk$Year[38:50],sm[38:50,i],col=redcol) # Blair, Brown
  lines(uk$Year[51:53],sm[51:53,i],col=bluecol) # Cam'ron
}
dev.off()

(uk_income.csv is just the trimmed-down source data spreadsheet)
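
If you don't have the CSV to hand, here is a rough sketch of the same two ideas (smooth each column against year, then draw each spell in office in its party's colour) on made-up data. Everything in it is an assumption for illustration: the numbers are synthetic, the three "percentile" columns and the names `fake`, `terms` and `draw_by_term` are hypothetical, and only the row ranges and colours are taken from the code above. Putting the spells in a small table is just one way to avoid repeating the lines() calls.

```r
# Synthetic stand-in for the income data: invented numbers, NOT the real series.
set.seed(1)
years <- 1961:2013                 # 53 rows, like the [1:53, ] slice above (years assumed)
pcts  <- c(10, 50, 90)             # hypothetical percentile columns
fake  <- sapply(pcts, function(p) {
  100 + 2 * p + (years - 1961) * (1 + p / 50) + rnorm(length(years), sd = 8)
})
colnames(fake) <- paste0("p", pcts)

# Smooth each column against year; $y holds the fitted value for each year.
sm <- apply(fake, 2, function(z) smooth.spline(x = years, y = z)$y)

# One row per spell in office: first row, last row, line colour.
# Row ranges and colours are copied from the original code.
terms <- data.frame(
  from = c(1, 4, 11, 15, 20, 38, 51),
  to   = c(3, 10, 14, 19, 37, 50, 53),
  col  = c("#467db4", "#b44f46", "#467db4", "#b44f46",
           "#467db4", "#b44f46", "#467db4"),
  stringsAsFactors = FALSE
)

# Draw every smoothed column, one coloured segment per spell,
# on whatever graphics device is currently open.
draw_by_term <- function(years, sm, terms) {
  plot(range(years), range(sm), type = "n",
       xlab = "Year", ylab = "Synthetic income",
       main = "Smoothed synthetic percentiles")
  for (j in seq_len(ncol(sm))) {
    for (k in seq_len(nrow(terms))) {
      idx <- terms$from[k]:terms$to[k]
      lines(years[idx], sm[idx, j], col = terms$col[k])
    }
  }
  invisible(NULL)
}
```

Call draw_by_term(years, sm, terms) between png(...) and dev.off() to get a file. As in the original, consecutive segments share no points, so there is a small gap wherever the colour changes.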


