R script to manipulate health data
[This article was first published on John Marquess » R, and kindly contributed to R-bloggers.]
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
Here is the code that fixed up the World Bank data export for use in Tableau. The DataBank export comes in an untidy, wide format (one column per year), which is awkward for grouping and aggregating. The reshape2 and plyr packages make it easy to reshape the whole set in a couple of seconds.
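To make the wide-versus-long distinction concrete, here is a minimal sketch on a toy data frame. The indicator names, country names, and values are made up for illustration and are not taken from the World Bank export; only the column layout mirrors the script.

```r
library(reshape2)

# Made-up sample shaped like the export: one column per year
wide <- data.frame(
  Indicator = c("Indicator A", "Indicator A"),
  Country   = c("Country X", "Country Y"),
  `2000`    = c(1.5, NA),
  `2001`    = c(2.5, 3.5),
  check.names = FALSE,        # keep '2000' and '2001' as column names
  stringsAsFactors = FALSE
)

# Reshape to one row per Indicator/Country/Year; na.rm drops the missing cell
long <- melt(wide, id.vars = c("Indicator", "Country"),
             variable.name = "Year", value.name = "Value", na.rm = TRUE)
print(long)
```

The long result has three rows, because the one missing cell is dropped. Year comes back as a factor, so convert it with as.integer(as.character(long$Year)) if a numeric year is needed. This sketch melts the whole frame in one call, whereas the script in this post splits by indicator first; for data shaped like this the two routes should give the same columns.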
library(reshape2)
library(plyr)
options(stringsAsFactors=FALSE)
data <- read.csv('data.csv', header=TRUE)
# Fix up the column names in the World Bank export
names(data) <- c('Indicator', 'Country', '2002','2003','2004','2005','2006','2007', '2008','2009','2010','2011','2000','2001')
### FUNCTIONS
# A generic function to remove the first column of each data frame in a list
dropFirst <- function(x){
  x[,1] <- NULL
  return(x)
}
# Melt one data frame to long format, dropping rows with no data for that year
meltItem <- function(x){
  melt(data=x, id.vars='Country', na.rm=TRUE)
}
# Keep a lookup of the distinct indicator names
indicatorsNames <- unique(data$Indicator)
indicatorsNames <- as.data.frame(indicatorsNames)
# Split the data into a list of data frames, one per indicator (column 1)
indicatorsList <- dlply(.data=data, .variables=1, .fun=dropFirst, .progress='text')
# Melt each data frame in the list
indicatorsList.m <- llply(.data=indicatorsList, .fun=meltItem, .progress='text')
# Combine the melted pieces back into a single data frame
indicatorsList.m <- ldply(indicatorsList.m)
# Give the columns sensible names
names(indicatorsList.m) <- c('Indicator', 'Country', 'Year', 'Value')
# Write the data frame to a tab-separated file for future use
write.table(indicatorsList.m, 'tmp.txt', append=FALSE, quote=FALSE, sep='\t', row.names=FALSE)
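Before pointing Tableau at the output, it can help to read the file back and confirm the columns and types. The sketch below is only an illustration: it uses a made-up stand-in for indicatorsList.m and a temporary file rather than tmp.txt, but the write.table arguments match the script.

```r
# Made-up stand-in for indicatorsList.m (values are illustrative only)
long <- data.frame(
  Indicator = c("Indicator A", "Indicator A"),
  Country   = c("Country X", "Country Y"),
  Year      = c("2000", "2001"),
  Value     = c(1.5, 3.5),
  stringsAsFactors = FALSE
)

# Write with the same arguments as the script, but to a temporary file
out <- tempfile(fileext = ".txt")
write.table(long, out, append = FALSE, quote = FALSE, sep = "\t", row.names = FALSE)

# Read it back; quote = "" mirrors quote = FALSE on the way out
check <- read.delim(out, quote = "", stringsAsFactors = FALSE)
str(check)
```

Because quote=FALSE writes fields without quoting, a tab inside an indicator or country name would break the layout. Commas are common in indicator names, so a tab separator is a reasonable choice here, but it is worth a quick check on the real file.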
To leave a comment for the author, please follow the link and comment on their blog: John Marquess » R.