One of the most useful aspects of using a programming language instead of… well, not using a programming language, is that you can write code in a way that minimizes, and ideally, eliminates the need to repeat yourself.
For instance, you can write a function to show you a frequency table, like so:
suppressMessages(library(dplyr))

create_table <- function(dataset, var){
  var <- enquo(var)
  dataset %>%
    count(!!var) %>%
    knitr::kable()
}
You can now get some fancy-looking tables by simply writing:
create_table(mtcars, cyl)
cyl | n |
---|---|
4 | 11 |
6 | 7 |
8 | 14 |
If I want such tables for hundreds of columns, I can use this function and loop over the columns and not have to write the code inside the body of the function over and over again. You’ll notice that the function create_table() makes use of some advanced programming techniques I have discussed here. There’s also an alternative way of programming with {dplyr}, using the {{}} construct I discussed here, but I couldn’t get what I’m going to show you here to work with {{}}.
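For reference, here is a minimal sketch of what the {{}} construct looks like for the simple, interactive case. The name create_table_curly() is hypothetical and not part of the original code, and this sketch only covers calling the function directly with a bare column name; it does not solve the templating problem described further down.

```r
library(dplyr)

# Hypothetical variant of create_table() using the curly-curly operator:
# {{ var }} captures the bare column name and unquotes it in one step,
# so no explicit enquo() / !! pair is needed.
create_table_curly <- function(dataset, var) {
  dataset %>%
    count({{ var }}) %>%
    knitr::kable()
}

out <- create_table_curly(mtcars, cyl)
out
```

Called this way, the result should be the same frequency table as the one produced by create_table(mtcars, cyl).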
Recently, I had to create an Rmarkdown document with many sections, where each section title was a question from a survey and the content was a frequency table. I wanted to write a function that would create a section with the right question title, and then show the table, and I wanted to then call this function over all the questions from the survey and have my document automatically generated.
The result should look like this, but it would be a PDF instead of HTML.
Let’s first load the data and see what it looks like:
library(dplyr)
library(purrr)
library(readr)

suppressMessages(
  survey_data <- read_csv(
    "https://gist.githubusercontent.com/b-rodrigues/0c2249dec5a9c9477e0d1ad9964a1340/raw/873bcc7532b8bad613235f029884df1d0b947c90/survey_example.csv"
  )
)

glimpse(survey_data)
## Rows: 100
## Columns: 4
## $ `Random question?`                         <chr> "no", "yes", "yes", "yes", …
## $ `Copy of Random question?`                 <chr> "yes", "yes", "no", "yes", …
## $ `Copy of Copy of Random question?`         <chr> "yes", "no", "no", "yes", "…
## $ `Copy of Copy of Copy of Random question?` <chr> "yes", "yes", "no", "yes", …
Each column name is the question, and each row is one answer to the survey question. To create the document I showed above, you’d probably write something like this:
## Random question?

```{r}
create_table(survey_data, `Random question?`)
```

## Copy of Random question?

```{r}
create_table(survey_data, `Copy of Random question?`)
```

## Copy of Copy of Random question?

```{r}
create_table(survey_data, `Copy of Copy of Random question?`)
```

## Copy of Copy of Copy of Random question?

```{r}
create_table(survey_data, `Copy of Copy of Copy of Random question?`)
```
As you can see, this gets tedious very quickly, especially if you have hundreds of variables. So how do you avoid repeating yourself? The solution has two steps. First, you should try to automate what you have as much as possible. Ideally, you don’t want to have to write the complete question every time. So let’s start by replacing the questions with simpler variable names:
questions <- colnames(survey_data)

codes <- paste0("var_", seq(1, length(questions)))

lookup <- bind_cols("codes" = codes, "questions" = questions)

colnames(survey_data) <- codes
lookup is a data frame with the questions and their respective codes:
lookup
## tibble [4, 2]
## codes     chr var_1 var_2 var_3 var_4
## questions chr Random question? Copy of Random question? Cop~
and our data now has simpler variable names:
glimpse(survey_data)
## Rows: 100
## Columns: 4
## $ var_1 <chr> "no", "yes", "yes", "yes", "no", NA, "no", NA, "no", "no", "no",…
## $ var_2 <chr> "yes", "yes", "no", "yes", "no", "yes", "yes", NA, "yes", NA, "n…
## $ var_3 <chr> "yes", "no", "no", "yes", "yes", "no", "no", "yes", "no", "yes",…
## $ var_4 <chr> "yes", "yes", "no", "yes", "yes", "no", "no", "yes", "no", "no",…
Doing this allows us to replace the source code of our Rmarkdown like so:
## `r lookup$questions[grepl("var_1", lookup$codes)]`

```{r}
create_table(survey_data, var_1)
```
This already makes things easier, as now you only have to change var_1 to var_2 to var_3…; the inline code gets executed and the right title (the question text) appears. But how to go further? I don’t want to have to copy and paste this and change var_1 to var_2 etc… So the second step of the two-step solution is to use a function called knit_expand() described here. The idea of knitr::knit_expand() is that it uses some Rmd source as a template, and also allows the user to define some variables that will be replaced at compile time. Simple examples are available here. I want to build upon that, because I need to pass my variable (in this case var_1 for instance) to my function create_table().
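To make the template idea concrete, here is a minimal sketch of knit_expand() on its own, outside of any Rmarkdown document. The placeholder name title and the object names template and expanded are made up for this illustration.

```r
library(knitr)

# A one-line template: {{title}} is a placeholder (hypothetical name)
template <- "## {{title}}"

# knit_expand() replaces the placeholder with the value passed by name
expanded <- knit_expand(text = template, title = "Random question?")

expanded
```

The result is an ordinary character string containing a Markdown header, which is exactly the kind of text that can later be written into the document.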
The solution is to write another function that uses knitr::knit_expand(). This is what it could look like:
create_table <- function(dataset, var){
  dataset %>%
    count(!!var) %>%
    knitr::kable()
}

return_section <- function(var){
  a <- knitr::knit_expand(
    text = c("## {{question}}", create_table(survey_data, var)),
    question = lookup$questions[grepl(quo_name(var), lookup$codes)]
  )
  cat(a, sep = "\n")
}
I needed to edit create_table() a little bit, and remove the line var <- enquo(var). This is because now, I won’t be passing a variable down to the function, but a quosure, and there is a very good reason for it, as you’ll see. return_section() makes use of knit_expand(), and the text = argument is the template that will get expanded. {{question}} will get replaced by the variable I defined, which is the code I wrote above to automatically get the question text. Finally, var will get replaced by the variable I pass to the function.
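One thing to keep in mind with the grepl() lookup: it does pattern matching, so with more than nine variables the pattern "var_1" would also match var_10, var_11 and so on. The sketch below illustrates this on a made-up lookup table (lookup_demo and its contents are hypothetical) and shows an exact comparison as one possible alternative.

```r
# Hypothetical lookup table with twelve questions
lookup_demo <- data.frame(
  codes = paste0("var_", 1:12),
  questions = paste("Question", 1:12),
  stringsAsFactors = FALSE
)

# Pattern matching: "var_1" also matches var_10, var_11 and var_12
partial <- lookup_demo$questions[grepl("var_1", lookup_demo$codes)]

# Exact comparison: only the row whose code is exactly "var_1"
exact <- lookup_demo$questions[lookup_demo$codes == "var_1"]

length(partial)
exact
```

With only four questions, as in this post, both approaches return the same single title.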
First, let’s get it running on one single variable:
return_section(quo(var_1))
## ## Random question?
## |var_1 |  n|
## |:-----|--:|
## |no    | 40|
## |yes   | 44|
## |NA    | 16|
As you see, I had to use quo(var_1) and not only var_1. But apart from this, the function seems to work well. Putting this in an Rmarkdown document would create a section with the question as the text of the section and a frequency table as the body. I could now copy and paste this and only have to change var_1. But I don’t want to have to copy and paste! So the idea would be to loop the function over a list of variables.
I have such a list already:
codes
## [1] "var_1" "var_2" "var_3" "var_4"
But it’s not a list of quosures; it’s a character vector of strings, and this is not going to work (it will return an error):
walk(codes, return_section)
(I’m using walk() instead of map() because return_section() doesn’t return an object, but only shows something on screen. This is called a side effect, and walk() allows you to loop properly over functions that are called only for their side effects.)
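As a small illustration of the difference, here is a sketch with a toy function that only prints. The function shout() and the object names are invented for this example.

```r
library(purrr)

# A toy function called only for its side effect (printing); cat() returns NULL
shout <- function(x) cat("value:", x, "\n")

# walk() runs the function for each element and invisibly returns its input
res_walk <- walk(c("a", "b"), shout)

# map() runs the same function but collects the (useless) NULL return values
res_map <- map(c("a", "b"), shout)
```

Both calls print the same lines, but map() leaves you with a list of NULLs that would also show up in a knitted document, while walk() hands back its input unchanged.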
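The problem I have now is to convert strings into something I can unquote with !!. This is possible using rlang::sym(), which turns each string into a symbol (not strictly a quosure, but !! and quo_name() handle symbols the same way):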
sym_codes <- map(codes, sym)
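To see what this conversion does for a single string, here is a minimal sketch on toy data. The tibble toy and its values are made up and simply stand in for the survey data.

```r
library(dplyr)

# Toy data standing in for the survey (hypothetical values)
toy <- tibble(var_1 = c("yes", "no", "yes"))

# sym() turns the string "var_1" into the symbol var_1
v <- rlang::sym("var_1")

# !! unquotes the symbol, so count() sees the column var_1
freq <- toy %>% count(!!v)

# quo_name() recovers the original string, which is what the lookup needs
code <- rlang::quo_name(v)

freq
code
```

This is why the same object can be used both to build the table and to find the question text.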
And now I’m done:
walk(sym_codes, return_section)
## ## Random question?
## |var_1 |  n|
## |:-----|--:|
## |no    | 40|
## |yes   | 44|
## |NA    | 16|
## ## Copy of Random question?
## |var_2 |  n|
## |:-----|--:|
## |no    | 52|
## |yes   | 32|
## |NA    | 16|
## ## Copy of Copy of Random question?
## |var_3 |  n|
## |:-----|--:|
## |no    | 46|
## |yes   | 47|
## |NA    |  7|
## ## Copy of Copy of Copy of Random question?
## |var_4 |  n|
## |:-----|--:|
## |no    | 48|
## |yes   | 42|
## |NA    | 10|
Putting this in an Rmarkdown source creates a PDF (or Word, or HTML) document with one section per question, without having to copy and paste, which is quite error-prone. Here is the final Rmarkdown file. You’ll notice that the last chunk has the option results = 'asis', which is needed for this trick to work.
Hope you enjoyed! If you found this blog post useful, you might want to follow me on twitter for blog post updates and buy me an espresso or paypal.me, or buy my ebook on Leanpub. You can also watch my videos on youtube. So much content for you to consoom!