Learn Tidyverse: Pivot Functions
TL;DR:
We will use the `pivot_longer` and `pivot_wider` functions to change the shape of a data frame between long and wide formats. In a wide format, each row holds several observations spread across columns; in a long format, each observation gets its own row. You may be used to `melt` and `dcast` from the `reshape2` package, or `gather` and `spread` from `tidyr`. `reshape2` is retired and `gather` and `spread` are superseded; `pivot_longer` and `pivot_wider` are the recommended replacements.

I'm going to start off by creating a fake dataset to work with that has 3 categories (lake, beach, and park) with counts of the number of visitors to each location in each year from 2011 to 2020.
```r
library(tidyverse)

# Fake data: 10 years x 3 locations = 30 rows.
# The counts are drawn at random, so your numbers will differ from the output below.
dat. <- data.frame(year = rep(seq(2011, 2020), each = 3),
                   location = rep(c("beach", "park", "lake"), 10),
                   N = round(runif(30, 10, 100)))
head(dat.)
##   year location  N
## 1 2011    beach 61
## 2 2011     park 46
## 3 2011     lake 60
## 4 2012    beach 38
## 5 2012     park 41
## 6 2012     lake 31
```
Currently, the data is in the long format: for each year there are 3 separate rows, one with the visitor count for each location. So we are first going to use the `pivot_wider` function to turn it into a wide format.
```r
dat.wider <- dat. %>%
  pivot_wider(names_from = location, values_from = N)
dat.wider
## # A tibble: 10 x 4
##     year beach  park  lake
##    <int> <dbl> <dbl> <dbl>
##  1  2011    61    46    60
##  2  2012    38    41    31
##  3  2013    23    64    69
##  4  2014    67    98    70
##  5  2015    97    48    39
##  6  2016    44    59    52
##  7  2017    88    19    26
##  8  2018    22    43    94
##  9  2019    17    61    62
## 10  2020    84    56    84
```
The new dataset looks more like a square because it has more columns and fewer rows than our original data frame did: 10 rows and 4 columns instead of 30 rows and 3 columns. Now each row holds 3 observations, the number of visitors to each location in that year.
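To make the change in shape concrete, here is a minimal sketch on a fixed toy table. The name `visits_long` is hypothetical, and the six counts are simply the first six rows printed above, so the result does not depend on random numbers.

```r
library(tidyr)

# Hypothetical toy data: 2 years x 3 locations, values copied from head(dat.) above
visits_long <- data.frame(
  year     = rep(c(2011L, 2012L), each = 3),
  location = rep(c("beach", "park", "lake"), 2),
  N        = c(61, 46, 60, 38, 41, 31)
)

# One new column per distinct value of location, filled with the values of N
visits_wide <- pivot_wider(visits_long, names_from = location, values_from = N)

dim(visits_long)    # 6 rows, 3 columns
dim(visits_wide)    # 2 rows, 4 columns
names(visits_wide)  # year, then the locations in order of first appearance
```

The number of rows drops by a factor of 3 because the three location rows for each year collapse into a single row.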
To turn it back to long format, we will use `pivot_longer` and give the `cols` argument the columns we want to combine into one.
```r
dat.wider %>%
  pivot_longer(cols = c("beach", "park", "lake"))
## # A tibble: 30 x 3
##     year name  value
##    <int> <chr> <dbl>
##  1  2011 beach    61
##  2  2011 park     46
##  3  2011 lake     60
##  4  2012 beach    38
##  5  2012 park     41
##  6  2012 lake     31
##  7  2013 beach    23
##  8  2013 park     64
##  9  2013 lake     69
## 10  2014 beach    67
## # ... with 20 more rows
```

```r
dat.wider %>%
  pivot_longer(cols = c("beach", "park", "lake"),
               names_to = "location",
               values_to = "N_visitors")
## # A tibble: 30 x 3
##     year location N_visitors
##    <int> <chr>         <dbl>
##  1  2011 beach            61
##  2  2011 park             46
##  3  2011 lake             60
##  4  2012 beach            38
##  5  2012 park             41
##  6  2012 lake             31
##  7  2013 beach            23
##  8  2013 park             64
##  9  2013 lake             69
## 10  2014 beach            67
## # ... with 20 more rows
```
In the first call I only told the function which columns to combine, so the new columns got the default names `name` and `value`. In the second call, I specified the names I wanted. The `names_to` argument gives the name of the column that will hold the old column names, and the `values_to` argument gives the name of the column that will hold the data from the combined columns.
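As a quick check that the two functions undo each other, the sketch below pivots a small fixed table longer and then wider again. The table `visits_wide` is hypothetical toy data with values taken from the output above, and selecting columns with `cols = -year` (everything except `year`) is just one alternative to naming each location.

```r
library(tidyr)

# Hypothetical toy data in wide format
visits_wide <- data.frame(
  year  = c(2011L, 2012L),
  beach = c(61, 38),
  park  = c(46, 41),
  lake  = c(60, 31)
)

# Combine every column except year into a name/value pair of columns
visits_long <- pivot_longer(visits_wide,
                            cols = -year,
                            names_to = "location",
                            values_to = "N_visitors")

# Pivoting wider again should recover the original table
round_trip <- pivot_wider(visits_long,
                          names_from = location,
                          values_from = N_visitors)
all.equal(as.data.frame(round_trip), visits_wide)
```

Excluding the identifier column is convenient when there are many columns to combine, since the call does not need updating if a new location is added.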
You may be wondering why long or wide format even matters. One reason is that if you use ggplot2, plotting is much easier when your data is in long format instead of wide.
```r
dat. %>%
  ggplot(aes(x = year, y = N, color = location)) +
  geom_line()
```
```r
dat.wider %>%
  ggplot(aes(x = year, y = beach)) +
  geom_line(color = 1) +
  geom_line(aes(x = year, y = park), color = 2) +
  geom_line(aes(x = year, y = lake), color = 3)
```
In the above examples you can see that with only 3 lines of code I create a graph with 3 lines, one for each location, colored according to location. With the wide dataset it takes 5 lines of code, and I have to add each location separately. If you are going to do it this way, you might as well use base R plotting. Also, if you want to use more advanced ggplot2 functions such as `facet_wrap`, having your data in the long format makes it much easier.
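For instance, a faceted version of the plot might look like the sketch below. It assumes a small hypothetical long table, `visits_long`, with the same three columns as the fake dataset and counts copied from the first three years shown earlier.

```r
library(ggplot2)

# Hypothetical toy data in long format: 3 years x 3 locations
visits_long <- data.frame(
  year     = rep(2011:2013, each = 3),
  location = rep(c("beach", "park", "lake"), 3),
  N        = c(61, 46, 60, 38, 41, 31, 23, 64, 69)
)

# One panel per location; the location column drives the faceting
p <- ggplot(visits_long, aes(x = year, y = N)) +
  geom_line() +
  facet_wrap(~ location)
p
```

With the wide table there is no single column to facet on, so you would have to pivot to long format first anyway.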