Spring Cleaning Data: 5 of 6- 2 ifelse vs Merge
[This article was first published on OutLie..R, and kindly contributed to R-bloggers].
This post in the data cleaning series looks at separating out the Federal Reserve Districts. What I wanted was two additional columns: one with the city name and one with the number of each district. Since I was on a separation kick, I thought it would be fun to do this using the ifelse() function.
Well, what started out as a fun romp in the fields turned into an exercise in precision and frustration. It did end well, but it took too much time and too many lines of code to do what I wanted.
While I was banging my head against the keyboard in frustration, a thought occurred to me: instead of using the ifelse() function, create a table with the new columns of data, then merge the original data with that table. Two lines of code for both columns of data. Definitely one of those eureka moments.
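To make the table idea concrete, here is a minimal sketch. It assumes the original data frame is called dw and has a 'district' column holding labels such as 'Boston (1)'. The sample rows, the value column, and the inline lookup table are made up for illustration, and only three of the twelve districts are shown. The column names dist.no and dist.city are borrowed from the ifelse() version; the real District Data file may name its columns differently.

```r
# Hypothetical sample of the original data; only the 'district' column
# name and its label format come from the post.
dw <- data.frame(
  district = c("Chicago (7)", "Boston (1)", "Dallas (11)", "Boston (1)"),
  value    = c(10, 20, 30, 40),
  stringsAsFactors = FALSE
)

# Lookup table: one row per district (three shown here for brevity).
dist <- data.frame(
  district  = c("Boston (1)", "Chicago (7)", "Dallas (11)"),
  dist.no   = c(1, 7, 11),
  dist.city = c("Boston", "Chicago", "Dallas"),
  stringsAsFactors = FALSE
)

# One merge adds both new columns, matching rows on 'district'.
dw2 <- merge(dw, dist, by = "district")
```

Note that merge() sorts the result by the key column by default, so the row order of dw2 may differ from dw.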
The lesson in all of this: nested ifelse() calls are fine in small doses, I would say five or fewer. Unless you really like writing them, in which case have fun. When there is a limited number of distinct values, like the 12 districts in this example, the table works very well. What took me 2 hours of work using ifelse() took me 15 minutes using the table method. The code is simpler and easier to understand. Sure, there is an extra table to import, but it is small and very manageable.
I have placed the code below, with the merge code first, followed by the ifelse() code. The table I used can be downloaded from here (District Data). Read the district data in using read.csv(), then merge the two files using 'district', the column they have in common. The ifelse() function takes the form ifelse(test, yes, no): the test checks whether the district column matches one of the district labels. If it does, the result is the matching value (1 or 'Boston' for 'Boston (1)'); if not, the next nested ifelse() is tried. At the very end sits 'Error', just in case nothing matches.
#Merging the data
dist<-read.csv(file.choose(), header=TRUE)
dw<-merge(dw, dist, by='district')

#re-coding the district data to numerical
tmp1<-ifelse(dw$district=='Boston (1)', 1,
      ifelse(dw$district=='New York (2)', 2,
      ifelse(dw$district=='Philadelphia (3)', 3,
      ifelse(dw$district=='Cleveland (4)', 4,
      ifelse(dw$district=='Richmond (5)', 5,
      ifelse(dw$district=='Atlanta (6)', 6,
      ifelse(dw$district=='Chicago (7)', 7,
      ifelse(dw$district=='St. Louis (8)', 8,
      ifelse(dw$district=='Minneapolis (9)', 9,
      ifelse(dw$district=='Kansas City (10)', 10,
      ifelse(dw$district=='Dallas (11)', 11,
      ifelse(dw$district=='San Francisco (12)', 12, 'Error'))))))))))))
dw$dist.no<-as.numeric(tmp1)

#Isolating the names, making to factor
tmp2<-ifelse(dw$district=='Boston (1)', 'Boston',
      ifelse(dw$district=='New York (2)', 'New York',
      ifelse(dw$district=='Philadelphia (3)', 'Philadelphia',
      ifelse(dw$district=='Cleveland (4)', 'Cleveland',
      ifelse(dw$district=='Richmond (5)', 'Richmond',
      ifelse(dw$district=='Atlanta (6)', 'Atlanta',
      ifelse(dw$district=='Chicago (7)', 'Chicago',
      ifelse(dw$district=='St. Louis (8)', 'St. Louis',
      ifelse(dw$district=='Minneapolis (9)', 'Minneapolis',
      ifelse(dw$district=='Kansas City (10)', 'Kansas City',
      ifelse(dw$district=='Dallas (11)', 'Dallas',
      ifelse(dw$district=='San Francisco (12)', 'San Francisco', 'Error'))))))))))))
dw$dist.city<-as.factor(tmp2)
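One difference between the two approaches is worth a quick check. The ifelse() chain flags an unexpected label with 'Error', whereas merge() by default keeps only rows that match in both tables, so a misspelled district would silently disappear. The sketch below is my own addition, not part of the original workflow: it uses made-up data (including a deliberately misspelled 'Bostn (1)') and the all.x argument of merge() to keep unmatched rows so they can be inspected.

```r
# Hypothetical data with one deliberately misspelled district label.
dw <- data.frame(
  district = c("Boston (1)", "Bostn (1)", "Dallas (11)"),
  value    = c(1, 2, 3),
  stringsAsFactors = FALSE
)

# Hypothetical lookup table (two districts shown for brevity).
dist <- data.frame(
  district  = c("Boston (1)", "Dallas (11)"),
  dist.no   = c(1, 11),
  dist.city = c("Boston", "Dallas"),
  stringsAsFactors = FALSE
)

# Default merge: the row with the misspelled label is dropped.
inner <- merge(dw, dist, by = "district")

# all.x = TRUE keeps every row of dw; unmatched rows get NA
# in the columns that come from the lookup table.
kept <- merge(dw, dist, by = "district", all.x = TRUE)

# Labels that found no match in the lookup table.
unmatched <- kept$district[is.na(kept$dist.no)]
```

Comparing nrow(dw) with nrow(inner), or looking at unmatched, plays the same role as the 'Error' fallback in the ifelse() version.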
Previous Posts (Part 1, Part 2, Part 3, Part 4)