Take your data frames to the next level.
While finishing up R-rockstar Hadley Wickham’s book (Free Book – R for Data Science), I found that the section on model building elaborates on something pretty cool that I had no idea about: list columns.
Most of us have probably seen the following data frame column format:
df <- data.frame("col_uno" = c(1,2,3),"col_dos" = c('a','b','c'), "col_tres" = factor(c("google", "apple", "amazon")))
And the output:
df
##   col_uno col_dos col_tres
## 1       1       a   google
## 2       2       b    apple
## 3       3       c   amazon
This is an awesome way to organize data and one of R’s strong points. However, we can use list functionality to go deeper. Check this out:
library(tidyverse)
library(datasets)

head(iris)
##   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1          5.1         3.5          1.4         0.2  setosa
## 2          4.9         3.0          1.4         0.2  setosa
## 3          4.7         3.2          1.3         0.2  setosa
## 4          4.6         3.1          1.5         0.2  setosa
## 5          5.0         3.6          1.4         0.2  setosa
## 6          5.4         3.9          1.7         0.4  setosa

# Group by species, then collapse each group's rows into one list-column entry
nested <- iris %>% group_by(Species) %>% nest()
nested
## # A tibble: 3 × 2
##      Species              data
##       <fctr>            <list>
## 1     setosa <tibble [50 × 4]>
## 2 versicolor <tibble [50 × 4]>
## 3  virginica <tibble [50 × 4]>
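To see what the list column actually holds, it can help to pull out a single element. The following is a minimal sketch, assuming the same iris data and that dplyr and tidyr are installed; `first_group` is just an illustrative name, and the `ungroup()` call is my addition so the result behaves the same across tidyr versions.

```r
library(dplyr)
library(tidyr)

nested <- iris %>%
  group_by(Species) %>%
  nest() %>%
  ungroup()  # assumption: drop grouping for consistent behaviour across versions

# The data column is an ordinary list with one element per species
is.list(nested$data)
length(nested$data)

# Each element is itself a data frame holding that species' rows
first_group <- nested$data[[1]]  # illustrative name, not from the original post
dim(first_group)
names(first_group)  # Species is not repeated inside the nested data
```

Each row of `nested` therefore carries a whole 50-row table for its species, which is what makes per-group work so convenient.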
Using nest, we can compartmentalize our data frame for readability and more efficient iteration. Here we can use map from the purrr package to compute the mean of each column in our nested data.
# Apply colMeans to each nested tibble; the result is a list with one entry per species
means <- map(nested$data, colMeans)
means
## [[1]]
## Sepal.Length  Sepal.Width Petal.Length  Petal.Width
##        5.006        3.428        1.462        0.246
##
## [[2]]
## Sepal.Length  Sepal.Width Petal.Length  Petal.Width
##        5.936        2.770        4.260        1.326
##
## [[3]]
## Sepal.Length  Sepal.Width Petal.Length  Petal.Width
##        6.588        2.974        5.552        2.026
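The results of map do not have to live in a separate object; they can be stored next to the data as further list columns. Here is a rough sketch of that idea, including a per-species linear model in the spirit of the book's model-building section. The column names `means`, `model` and `slope`, and the choice of formula, are my own illustrations rather than anything from the book or this post.

```r
library(dplyr)
library(tidyr)
library(purrr)

nested <- iris %>%
  group_by(Species) %>%
  nest() %>%
  ungroup()

# Keep the results next to the data they came from, as new list columns.
# `means` and `model` are illustrative column names.
results <- nested %>%
  mutate(
    means = map(data, colMeans),
    model = map(data, ~ lm(Sepal.Length ~ Sepal.Width, data = .x))
  )

# map_dbl() returns a plain numeric vector instead of a list
results %>%
  mutate(slope = map_dbl(model, ~ coef(.x)[["Sepal.Width"]])) %>%
  select(Species, slope)
```

Keeping data, summaries and models in one row per species means they cannot drift out of sync when the table is filtered or reordered.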
Once you’re done messing around with data-ception, use unnest to revert your data back to its original state.
head(unnest(nested))
## # A tibble: 6 × 5
##   Species Sepal.Length Sepal.Width Petal.Length Petal.Width
##    <fctr>        <dbl>       <dbl>        <dbl>       <dbl>
## 1  setosa          5.1         3.5          1.4         0.2
## 2  setosa          4.9         3.0          1.4         0.2
## 3  setosa          4.7         3.2          1.3         0.2
## 4  setosa          4.6         3.1          1.5         0.2
## 5  setosa          5.0         3.6          1.4         0.2
## 6  setosa          5.4         3.9          1.7         0.4
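One caveat worth checking on your own setup: newer releases of tidyr (1.0.0 and later, as far as I know) expect the columns to unnest to be named explicitly through the `cols` argument. A small sketch under that assumption, which also checks how close the round trip gets to the original iris data (`flat` is an illustrative name):

```r
library(dplyr)
library(tidyr)

nested <- iris %>%
  group_by(Species) %>%
  nest() %>%
  ungroup()

# Assumes tidyr >= 1.0.0, where the columns to unnest are named via `cols`
flat <- nested %>% unnest(cols = data)

# Same rows and columns as iris, but returned as a tibble with Species first
dim(flat)
setdiff(names(iris), names(flat))
names(flat)[1]
```

So the data comes back complete, though as a tibble with the grouping column moved to the front rather than as an identical copy of the original data frame.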
I was pretty excited to learn about this property of data.frames and will definitely make use of it in the future. If you have any neat examples of nested dataset usage, please feel free to share in the comments. As always, I’m happy to answer questions or talk data!
Kiefer Smith