Mapping a list of functions to a list of datasets with a list of columns as arguments
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
This week I had the opportunity to teach R at my workplace, again. This course was the “advanced R” course, and unlike the one I taught at the end of last year, I had one more day (so 3 days in total) where I could show my colleagues the joys of the `tidyverse` and R.
To finish the section on programming with R, which was the very last section of the whole 3-day course, I wanted to blow their minds. I had already shown them packages from the `tidyverse` in the previous days, such as `dplyr`, `purrr` and `stringr`, among others. I taught them how to use `ggplot2`, `broom` and `modelr`. They also liked `janitor` and `rio` very much. I noticed that it took them a bit more time and effort to digest `purrr::map()` and `purrr::reduce()`, but they all seemed to see how powerful these functions were. To finish on a very high note, I showed them the ultimate `purrr::map()` use case.
Consider the following: imagine you are working on a list of datasets. These datasets might be the same, but for different years or for different countries, or they might be completely different. If you used `rio::import_list()` to read them into R, you will already have them in a nice list. Let’s consider the following list as an example:
```r
library(tidyverse)

data(mtcars)
data(iris)

data_list = list(mtcars, iris)
```
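A small aside that is not part of the original workflow: if you give the list names (for instance `list(mtcars = mtcars, iris = iris)`, a choice made here purely for illustration), `purrr`'s mapping functions carry those names through to their output, which makes the results easier to tell apart. A minimal sketch, assuming `purrr` is installed:

```r
library(purrr)

# A named version of the list above; the names are our own choice
data_list_named = list(mtcars = mtcars, iris = iris)

# map() keeps the names of its input, so each result is labelled
n_rows = map(data_list_named, nrow)
n_rows
```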
I deliberately chose two completely different datasets. Now, I would like to map some functions to the columns of these datasets. If I only worked on one, for example `mtcars`, I would do something like:
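```r
# Apply every function in `funcs` to every column in `cols`.
# Both arguments are lists of quosures, spliced in with !!!
# (note: funs() has been deprecated in later dplyr releases)
my_summarise_f = function(dataset, cols, funcs){
  dataset %>%
    summarise_at(vars(!!!cols), funs(!!!funcs))
}
```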
And then I would use my function like so:
```r
mtcars %>%
  my_summarise_f(quos(mpg, drat, hp), quos(mean, sd, max))
##   mpg_mean drat_mean  hp_mean   mpg_sd   drat_sd    hp_sd mpg_max drat_max
## 1 20.09062  3.596563 146.6875 6.026948 0.5346787 68.56287    33.9     4.93
##   hp_max
## 1    335
```
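`funs()` has since been deprecated in `dplyr`, and `summarise_at()` is superseded by `across()` (available from `dplyr` 1.0.0). The following is a sketch of an equivalent helper under those newer APIs, not the code used in the course. `my_summarise_across` is a hypothetical name, and unlike the original it assumes the columns are given as a character vector and the functions as a named list, rather than as quosures:

```r
library(dplyr)

# Hypothetical across()-based variant of my_summarise_f():
# `cols` is a character vector of column names,
# `funcs` is a named list of functions (the names become suffixes).
my_summarise_across = function(dataset, cols, funcs) {
  dataset %>%
    summarise(across(all_of(cols), funcs, .names = "{.col}_{.fn}"))
}

res = my_summarise_across(
  mtcars,
  c("mpg", "drat", "hp"),
  list(mean = mean, sd = sd, max = max)
)
res
```

One design difference worth knowing: `across()` groups its output by column (`mpg_mean`, `mpg_sd`, `mpg_max`, `drat_mean`, …), whereas the `summarise_at()` version above groups it by function.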
`my_summarise_f()` takes a dataset, a list of columns and a list of functions as arguments, and uses tidy evaluation (the `!!!` operator splices the lists of quosures into `vars()` and `funs()`) to apply `mean()`, `sd()` and `max()` to the columns `mpg`, `drat` and `hp` of `mtcars`. That’s pretty useful, but not useful enough! Now I want to apply this to the list of datasets I defined above. For this, let’s define the list of columns I want to work on in each dataset:
```r
cols_mtcars = quos(mpg, drat, hp)
cols_iris = quos(Sepal.Length, Sepal.Width)

cols_list = list(cols_mtcars, cols_iris)
```
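To see what `!!!` does with these lists, it can help to splice one into a simpler verb such as `select()`. The sketch below is our own illustration, not part of the original course material. It relies on `dplyr` re-exporting `quos()` from `rlang`, and uses `rlang::is_quosure()` only to inspect the elements:

```r
library(dplyr)

cols_mtcars = quos(mpg, drat, hp)

# Each element is a quosure: an expression plus the environment it came from
all(vapply(cols_mtcars, rlang::is_quosure, logical(1)))

# `!!!` splices the list, so select() sees three separate arguments,
# exactly as if we had written select(mpg, drat, hp)
selected = mtcars %>% select(!!!cols_mtcars)
names(selected)
```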
Now, let’s use some `purrr` magic to apply the functions I want to the columns I have defined in `cols_list`:
```r
map2(data_list, cols_list, my_summarise_f, funcs = quos(mean, sd, max))
## [[1]]
##   mpg_mean drat_mean  hp_mean   mpg_sd   drat_sd    hp_sd mpg_max drat_max
## 1 20.09062  3.596563 146.6875 6.026948 0.5346787 68.56287    33.9     4.93
##   hp_max
## 1    335
##
## [[2]]
##   Sepal.Length_mean Sepal.Width_mean Sepal.Length_sd Sepal.Width_sd
## 1          5.843333         3.057333       0.8280661      0.4358663
##   Sepal.Length_max Sepal.Width_max
## 1              7.9             4.4
```
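Conceptually, `map2()` walks along `data_list` and `cols_list` in parallel and passes `funcs` through `...`, unchanged, to every call. The sketch below spells that out with an explicit loop and checks that both approaches give the same result. To stay self-contained and avoid the deprecated `funs()`, it uses a hypothetical helper `summarise_cols()` built on `across()`, and redefines the lists with character column names instead of quosures:

```r
library(dplyr)
library(purrr)

# Hypothetical helper: summarise the given columns with the given functions
summarise_cols = function(dataset, cols, funcs) {
  dataset %>% summarise(across(all_of(cols), funcs))
}

data_list = list(mtcars, iris)
cols_list = list(c("mpg", "drat", "hp"), c("Sepal.Length", "Sepal.Width"))
funcs = list(mean = mean, sd = sd, max = max)

# map2(): element i of each list goes to call i; `funcs` is the same every time
via_map2 = map2(data_list, cols_list, summarise_cols, funcs = funcs)

# The same computation written as an explicit loop
via_loop = vector("list", length(data_list))
for (i in seq_along(data_list)) {
  via_loop[[i]] = summarise_cols(data_list[[i]], cols_list[[i]], funcs = funcs)
}

identical(via_map2, via_loop)
```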
That’s pretty useful, but not useful enough! I also want to apply different functions to different datasets!
Well, let’s define a list of functions then:
```r
funcs_mtcars = quos(mean, sd, max)
funcs_iris = quos(median, min)

funcs_list = list(funcs_mtcars, funcs_iris)
```
Because there is no `map3()`, we need to use `pmap()`:
```r
pmap(
  list(
    dataset = data_list,
    cols = cols_list,
    funcs = funcs_list
  ),
  my_summarise_f)
## [[1]]
##   mpg_mean drat_mean  hp_mean   mpg_sd   drat_sd    hp_sd mpg_max drat_max
## 1 20.09062  3.596563 146.6875 6.026948 0.5346787 68.56287    33.9     4.93
##   hp_max
## 1    335
##
## [[2]]
##   Sepal.Length_median Sepal.Width_median Sepal.Length_min Sepal.Width_min
## 1                 5.8                  3              4.3               2
```
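`pmap()` matches the names of the list it receives to the argument names of the function, which is why the list above uses `dataset`, `cols` and `funcs`. The sketch below first checks this with a toy function of our own (`difference()`, hypothetical), where the list elements are deliberately given in the "wrong" order, and then repeats the per-dataset functions idea with the same hypothetical `across()`-based helper as before (character column names and named lists of functions, rather than quosures):

```r
library(dplyr)
library(purrr)

# Toy function: pmap() matches list names to argument names
difference = function(a, b) a - b
res = pmap(list(b = c(1, 2), a = c(10, 20)), difference)
res

# Hypothetical helper, as in the earlier sketch
summarise_cols = function(dataset, cols, funcs) {
  dataset %>% summarise(across(all_of(cols), funcs))
}

# Different columns and different functions for each dataset
out = pmap(
  list(
    dataset = list(mtcars, iris),
    cols = list(c("mpg", "drat", "hp"), c("Sepal.Length", "Sepal.Width")),
    funcs = list(list(mean = mean, sd = sd, max = max),
                 list(median = median, min = min))
  ),
  summarise_cols
)
out[[2]]
```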
Now I’m satisfied! Let me tell you, this blew their minds!
To be able to use things like that, I told them to always solve a problem for a single example first, and from there, try to generalize their solution using the functional programming tools found in `purrr`.
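Here is that advice applied to a deliberately tiny problem (our own example, not one from the course): first compute the mean of one column of one dataset, then turn that solution into a function, and finally let `map()` apply it to every dataset in a list. `numeric_means()` is a hypothetical helper introduced only for this illustration:

```r
library(purrr)

# Step 1: solve it once, for a single column of a single dataset
mean(mtcars$mpg)

# Step 2: generalise to all numeric columns of one data frame
# (hypothetical helper; keep() drops non-numeric columns such as iris$Species)
numeric_means = function(dataset) {
  map_dbl(keep(dataset, is.numeric), mean)
}

# Step 3: generalise again with map() over a whole list of datasets
all_means = map(list(mtcars = mtcars, iris = iris), numeric_means)
all_means$iris
```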
If you found this blog post useful, you might want to follow me on Twitter for blog post updates.