My Goodness. What a Fat Dataset!

[This article was first published on Data and Analysis with R, at Work, and kindly contributed to R-bloggers].

Recently at work we were sent a data file containing information on donations to a specific charitable organization, ranging all the way back to the 1980s.  Usually, when we receive a dataset with a donation history in it, each row represents a specific gift from a specific person at a specific time, and each column holds some piece of information about that gift.  The result is usually a fairly long dataset (thousands or hundreds of thousands of rows, in my recent experience) with about 15 columns or more.

In this case, each row represented one person, but there were 1,551 columns!!  As it turned out, after the first column, which was the ID of the person donating the money, there were supposed to be just 31 more columns describing the gift in each row.  However, the person who put the data together decided that we should get 31*50 = 1,550 gift columns, so that each row represented a person rather than a gift, with every subsequent gift from that person taking up another 31 columns to the right of the previous 31.  Ridiculous!!

Anyway, I knew that I could reshape this using R by stacking all 50 copies of each variable together and giving each of the 31 resulting vectors the name of its first copy.  Following is a gist that shows what eventually worked for me:
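
The gist itself is not reproduced in this copy of the post, so what follows is only a rough sketch of the stacking idea described above, not the code that was actually used. It assumes a data frame (hypothetically named `wide`) whose first column is the donor ID, followed by repeated blocks of gift columns. To keep it readable, the toy data has 3 blocks of 2 columns instead of 50 blocks of 31; the column names, the `gift_no` counter, and the step that drops all-`NA` padding rows are my own assumptions.

```r
# Toy wide dataset: one row per donor, an ID column, then n_blocks repeated
# blocks of n_vars gift columns (the real file had 50 blocks of 31).
n_vars   <- 2   # columns describing one gift (31 in the post)
n_blocks <- 3   # gift slots per donor row (50 in the post)

wide <- data.frame(
  id      = c("A", "B"),
  amount  = c(10, 20),
  date    = c("1985-01-01", "1990-06-15"),
  amount2 = c(15, NA_real_),
  date2   = c("1986-02-01", NA_character_),
  amount3 = c(NA_real_, NA_real_),
  date3   = c(NA_character_, NA_character_),
  stringsAsFactors = FALSE
)

# Sanity check: one ID column plus n_vars * n_blocks gift columns.
stopifnot(ncol(wide) == 1 + n_vars * n_blocks)

# The names of the first block become the names of the stacked columns.
gift_names <- names(wide)[2:(1 + n_vars)]

# Pull out each block, rename its columns to match the first block,
# and carry the donor ID (plus a gift counter) along with it.
blocks <- lapply(seq_len(n_blocks), function(b) {
  cols  <- (2 + (b - 1) * n_vars):(1 + b * n_vars)
  block <- wide[, cols, drop = FALSE]
  names(block) <- gift_names
  data.frame(id = wide$id, gift_no = b, block, stringsAsFactors = FALSE)
})

# Stack the blocks so that each row is one gift slot.
long <- do.call(rbind, blocks)

# Drop padding rows, i.e. slots where every gift column is NA because the
# donor gave fewer than n_blocks gifts.
empty <- rowSums(!is.na(long[, gift_names, drop = FALSE])) == 0
long  <- long[!empty, ]

# Order by donor, then by gift number, and reset the row names.
long <- long[order(long$id, long$gift_no), ]
rownames(long) <- NULL
```

With the real file, setting `n_vars <- 31` and `n_blocks <- 50` should leave the rest unchanged, provided the blocks really are contiguous and in the same column order. Base R's `reshape()` or `tidyr::pivot_longer()` could do the same job, but the explicit loop keeps the block arithmetic visible.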

In conclusion, if you need your dataset to get in shape, you need only remember one letter: R!