Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
Encouraged by this post I had another look at quandl for collecting datasets from different agencies. Right now I need to get data for four countries on a couple of dozen indicators.
This graphic is just a quick example with only two indicators of what I am aiming to be able to do.
The process on Quandl at the moment is a bit fiddly:
- there is no search function in the API
- the country codes used are different from agency to agency
So my workflow is this. It isn’t as complicated as it sounds. I have used spreadsheets to store country codes and queries to make it all as re-useable as possible. You can download the spreadsheets here and here.
- edit the csv spreadsheet of the 2-and 3-digit ISO country codes, plus the actual names. Also, WHO for some reasons uses some other codes which I had to paste in by hand. If you find your sources are also using yet other codes, you can add them to the spreadsheet. Put an x in the “enabled” column to mark the countries you want to use.
- search manually at quandl for interesting queries and add them to the other csv spreadsheet, replacing the country code with %s, again putting an x in the “enabled” column for the queries you want, adding a human-readable title in the “title” column if you want and putting “alpha2″ or “alpha3″ etc in the country_sign column to mark which kind of country code is being used.
- run the script below.
authcode=”yourAuthCodeFromQuandl”
library(Quandl)
cou=list()
queries=read.csv(“queries.csv”)
queries=queries[queries$enabled!=””,]
codes=read.csv(“countryCodes.csv”)
codesE=codes[codes$enabled!=””,]
for(qq in 1:nrow(queries)){
q=queries$query[qq]
for(cc in 1:nrow(codesE)){
co=codesE[cc,queries[qq,”country_sign”]]
tex=paste(q,co,sep=”.”)
cou[[tex]]=try(Quandl(sprintf(q,co),authcode=authcode),T)
if(attributes(cou[[tex]])$class!=”try-error”)cou[[tex]]$Indicator=ifelse(!is.na(queries$title[qq]),queries$title[qq],q)
if(attributes(cou[[tex]])$class!=”try-error”)cou[[tex]]$Country=codesE[cc,”name”]
}}
rr=rbind.fill(cou[sapply(cou,function(x)length(x)>1)])
rr$Date=as.character(rr$Date)rr$Year=as.character(rr$Year)
rr$Year=as.Date(ifelse(!is.na(rr$Year),rr$Year,rr$Date))
rr$Value=ifelse(!is.na(rr$Value),rr$Value,rr$Percent) #you might have to do something like this if your queries are returning data in columns with some other label than Value
#then try a graphic for demonstration purposes
ggplot(data=rr,aes(x=Year,y=Value,group=Country,colour=Country))+geom_point(size=3)+geom_line()+facet_grid(Indicator~.,scales=”free”)+ theme(strip.text.y = theme_text(size = 13, hjust=0,angle = 0))+theme(axis.text.x=element_text(angle=90))
And voila.
I wanted to put the spreadsheets as a google spreadsheet but it seems RGoogleDocs is not working for R 3.0.
R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.