Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
I was recently confronted to a situation that required going from a long dataset to a wide dataset, but with a small twist: there were two datasets, which I had to merge into one. You might wonder what kinda crappy twist that is, right? Well, let’s take a look at the data:
data1; data2 ## # A tibble: 20 x 4 ## country date variable_1 value ## <chr> <chr> <chr> <int> ## 1 lu 01/01/2005 maybe 22 ## 2 lu 01/07/2005 maybe 13 ## 3 lu 01/01/2006 maybe 40 ## 4 lu 01/07/2006 maybe 25 ## 5 lu 01/01/2005 totally_agree 42 ## 6 lu 01/07/2005 totally_agree 17 ## 7 lu 01/01/2006 totally_agree 25 ## 8 lu 01/07/2006 totally_agree 16 ## 9 lu 01/01/2005 totally_disagree 39 ## 10 lu 01/07/2005 totally_disagree 17 ## 11 lu 01/01/2006 totally_disagree 23 ## 12 lu 01/07/2006 totally_disagree 21 ## 13 lu 01/01/2005 kinda_disagree 69 ## 14 lu 01/07/2005 kinda_disagree 12 ## 15 lu 01/01/2006 kinda_disagree 10 ## 16 lu 01/07/2006 kinda_disagree 9 ## 17 lu 01/01/2005 kinda_agree 38 ## 18 lu 01/07/2005 kinda_agree 31 ## 19 lu 01/01/2006 kinda_agree 19 ## 20 lu 01/07/2006 kinda_agree 12 ## # A tibble: 20 x 4 ## country date variable_2 value ## <chr> <chr> <chr> <int> ## 1 lu 01/01/2005 kinda_agree 22 ## 2 lu 01/07/2005 kinda_agree 13 ## 3 lu 01/01/2006 kinda_agree 40 ## 4 lu 01/07/2006 kinda_agree 25 ## 5 lu 01/01/2005 totally_agree 42 ## 6 lu 01/07/2005 totally_agree 17 ## 7 lu 01/01/2006 totally_agree 25 ## 8 lu 01/07/2006 totally_agree 16 ## 9 lu 01/01/2005 totally_disagree 39 ## 10 lu 01/07/2005 totally_disagree 17 ## 11 lu 01/01/2006 totally_disagree 23 ## 12 lu 01/07/2006 totally_disagree 21 ## 13 lu 01/01/2005 maybe 69 ## 14 lu 01/07/2005 maybe 12 ## 15 lu 01/01/2006 maybe 10 ## 16 lu 01/07/2006 maybe 9 ## 17 lu 01/01/2005 kinda_disagree 38 ## 18 lu 01/07/2005 kinda_disagree 31 ## 19 lu 01/01/2006 kinda_disagree 19 ## 20 lu 01/07/2006 kinda_disagree 12
As explained in Hadley (2014), this is how you should keep your data… But for a particular purpose, I had to transform these datasets. What I was asked to do was to merge these into a single wide data frame. Doing this for one dataset is easy:
data1 %>% spread(variable_1, value) ## # A tibble: 4 x 7 ## country date kinda_agree kinda_disagree maybe totally_agree ## * <chr> <chr> <int> <int> <int> <int> ## 1 lu 01/01/2005 38 69 22 42 ## 2 lu 01/01/2006 19 10 40 25 ## 3 lu 01/07/2005 31 12 13 17 ## 4 lu 01/07/2006 12 9 25 16 ## # ... with 1 more variables: totally_disagree <int>
But because data1
and data2
have the same levels for variable_1
and variable_2
, this would not work. So the solution I found online, in this SO thread was to use tidyr::spread()
with dplyr::rename_at()
like this:
data1 <- data1 %>% spread(variable_1, value) %>% rename_at(vars(-country, -date), funs(paste0("variable1:", .))) glimpse(data1) ## Observations: 4 ## Variables: 7 ## $ country <chr> "lu", "lu", "lu", "lu" ## $ date <chr> "01/01/2005", "01/01/2006", "01/07/... ## $ variable1:kinda_agree <int> 38, 19, 31, 12 ## $ variable1:kinda_disagree <int> 69, 10, 12, 9 ## $ variable1:maybe <int> 22, 40, 13, 25 ## $ variable1:totally_agree <int> 42, 25, 17, 16 ## $ variable1:totally_disagree <int> 39, 23, 17, 21 data2 <- data2 %>% spread(variable_2, value) %>% rename_at(vars(-country, -date), funs(paste0("variable2:", .))) glimpse(data2) ## Observations: 4 ## Variables: 7 ## $ country <chr> "lu", "lu", "lu", "lu" ## $ date <chr> "01/01/2005", "01/01/2006", "01/07/... ## $ variable2:kinda_agree <int> 22, 40, 13, 25 ## $ variable2:kinda_disagree <int> 38, 19, 31, 12 ## $ variable2:maybe <int> 69, 10, 12, 9 ## $ variable2:totally_agree <int> 42, 25, 17, 16 ## $ variable2:totally_disagree <int> 39, 23, 17, 21
rename_at()
needs variables which you pass to vars()
, a helper function to select variables, and a function that will do the renaming, passed to funs()
. The function I use is simply paste0()
, which pastes a string, for example “variable1:” with the name of the columns, given by the single ‘.’, a dummy argument. Now these datasets can be merged:
data1 %>% full_join(data2) %>% glimpse() ## Joining, by = c("country", "date") ## Observations: 4 ## Variables: 12 ## $ country <chr> "lu", "lu", "lu", "lu" ## $ date <chr> "01/01/2005", "01/01/2006", "01/07/... ## $ variable1:kinda_agree <int> 38, 19, 31, 12 ## $ variable1:kinda_disagree <int> 69, 10, 12, 9 ## $ variable1:maybe <int> 22, 40, 13, 25 ## $ variable1:totally_agree <int> 42, 25, 17, 16 ## $ variable1:totally_disagree <int> 39, 23, 17, 21 ## $ variable2:kinda_agree <int> 22, 40, 13, 25 ## $ variable2:kinda_disagree <int> 38, 19, 31, 12 ## $ variable2:maybe <int> 69, 10, 12, 9 ## $ variable2:totally_agree <int> 42, 25, 17, 16 ## $ variable2:totally_disagree <int> 39, 23, 17, 21
Hope this post helps you understand the difference between long and wide datasets better, as well as dplyr::rename_at()
!
R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.