How to use lists in R
[This article was first published on R for Public Health, and kindly contributed to R-bloggers].
In the last post, I went over the basics of lists, including constructing, manipulating, and converting lists to other classes.
Now that we know the basics, in this post we’ll use the apply() functions to see just how powerful working with lists can be. I’ve done two earlier posts on apply() for data frames and matrices, so give those a read if you need a refresher.
Intro to apply-based functions for lists
There are a variety of apply()-based functions that can be used depending on what you want to do. The table below shows each function, what it takes as input, and what it returns:

Function | Input | Output |
---|---|---|
apply | matrix | vector or matrix |
sapply | vector or list | vector or matrix |
lapply | vector or list | list |
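As a quick illustration of the table, here is a minimal sketch that calls each of the three functions once and inspects what comes back. The objects `m` and `l` are made-up examples for this sketch, not data used elsewhere in the post.

```r
# A small matrix and a small list to feed the three functions (hypothetical data)
m <- matrix(1:6, nrow = 2)        # 2 rows, 3 columns
l <- list(a = 1:3, b = 4:6)

row_sums      <- apply(m, 1, sum) # MARGIN = 1 applies sum() to each row
list_sums_vec <- sapply(l, sum)   # result is simplified to a named vector
list_sums_lst <- lapply(l, sum)   # result is always a list

row_sums
class(list_sums_vec)
class(list_sums_lst)
```

Note that sapply() only simplifies when it can; if the function returns results of different lengths, it falls back to returning a list.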
To see these in action, let's create a list of three numeric vectors:

    mylist<-list(x=c(1,5,7), y=c(4,2,6), z=c(0,3,4))
    mylist
    ## $x
    ## [1] 1 5 7
    ##
    ## $y
    ## [1] 4 2 6
    ##
    ## $z
    ## [1] 0 3 4

Now we can use lapply() to find the mean of each element of the list (the mean of each of the vectors x, y, and z) and output the results to a new list:
    lapply(mylist, function(x) mean(x))
    ## $x
    ## [1] 4.333333
    ##
    ## $y
    ## [1] 4
    ##
    ## $z
    ## [1] 2.333333

But let’s say we wanted the result in a vector, not in a list, for whatever reason. Instead of doing the above and then converting the list afterwards (using unlist(), or ldply() from the plyr package if a data frame will do), we can do this directly using sapply() instead of lapply(). That’s because, as you can see in the table, sapply() can take in a list as the input, and it will return a vector (or matrix). Let’s try it:
    sapply(mylist, function(x) mean(x))
    ##        x        y        z
    ## 4.333333 4.000000 2.333333

This is really great! Anytime you want to do the same thing over and over again, put all those things in a list and then use one of the apply functions. This replaces an explicit loop, which takes more code and can be much slower if the result is grown one element at a time. Let’s do another example where we write our own function this time:
    #write function to find the span of numbers in a vector and check if it's at least 5
    span.fun<-function(x) {(max(x)-min(x))>=5}
    #apply that function to the list
    sapply(mylist, span.fun)
    ##     x     y     z
    ##  TRUE FALSE FALSE
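To make the comparison with a loop concrete, here is a small sketch that computes the same means three ways: with an explicit for() loop, with lapply() followed by unlist(), and with vapply(). vapply() is a base R relative of sapply() that was not covered above; it asks you to declare the type and length of each result, so it fails loudly instead of silently returning a list. The names `means_loop`, `means_unlist`, and `means_vapply` are just illustrative.

```r
mylist <- list(x = c(1, 5, 7), y = c(4, 2, 6), z = c(0, 3, 4))

# 1. Explicit loop: pre-allocate the result, then fill it in
means_loop <- numeric(length(mylist))
names(means_loop) <- names(mylist)
for (i in seq_along(mylist)) {
  means_loop[i] <- mean(mylist[[i]])
}

# 2. lapply() followed by unlist()
means_unlist <- unlist(lapply(mylist, mean))

# 3. vapply(): each result must be a single numeric value
means_vapply <- vapply(mylist, mean, FUN.VALUE = numeric(1))

# All three agree with sapply(mylist, mean)
all.equal(means_loop, sapply(mylist, mean))
all.equal(means_unlist, means_vapply)
```

For a list this small the three versions are equally fast; the practical difference is how much code you write and how strictly the output type is checked.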
Creating a list using lapply()
You don’t need to have a list already created to use lapply(). In fact, lapply() can be used to make a list, because it always returns a list of the same length as its input. For example, let’s initialize a list to hold 2 empty matrices of size 2x3. We’ll use lapply(): our input is just a vector containing 1 and 2, and the function we specify uses matrix() to construct a 2x3 matrix filled with NA for each element of this vector, so it returns a list of two such matrices. If instead of empty matrices we wanted to fill these matrices with random numbers, we could do that too. Check out both possibilities below.

    #initialize list to 2 empty matrices of 2 by 3
    list2<-lapply(1:2, function(x) matrix(NA, nrow=2, ncol=3))
    list2
    ## [[1]]
    ##      [,1] [,2] [,3]
    ## [1,]   NA   NA   NA
    ## [2,]   NA   NA   NA
    ##
    ## [[2]]
    ##      [,1] [,2] [,3]
    ## [1,]   NA   NA   NA
    ## [2,]   NA   NA   NA

    #initialize list to 2 matrices with random numbers from normal distribution
    list2<-lapply(1:2, function(x) matrix(rnorm(6, 10, 1), nrow=2, ncol=3))
    list2
    ## [[1]]
    ##           [,1]      [,2]     [,3]
    ## [1,]  9.467982  9.794397 10.52168
    ## [2,] 10.022561 10.179758 10.47954
    ##
    ## [[2]]
    ##          [,1]     [,2]     [,3]
    ## [1,] 7.990455 10.95596 11.94031
    ## [2,] 8.952418 10.97080 11.24791

Again, we can use lapply() or sapply() on this newly created list to get the sum of each column of each matrix:
    #input list, output column sums of each matrix into a new list
    lapply(list2, colSums)
    ## [[1]]
    ## [1] 19.49054 19.97416 21.00121
    ##
    ## [[2]]
    ## [1] 16.94287 21.92676 23.18822

    #input list, output column sums with sapply(), which simplifies them into a matrix (one column per list element)
    sapply(list2, colSums)
    ##          [,1]     [,2]
    ## [1,] 19.49054 16.94287
    ## [2,] 19.97416 21.92676
    ## [3,] 21.00121 23.18822

    #to get one row per list element instead, use the transpose function t():
    t(sapply(list2, colSums))
    ##          [,1]     [,2]     [,3]
    ## [1,] 19.49054 19.97416 21.00121
    ## [2,] 16.94287 21.92676 23.18822
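As a side note, the transposed sapply() result can also be built by row-binding the list that lapply() returns. The sketch below assumes an arbitrary seed (so its numbers will not match the output printed above) and uses the illustrative names `mats`, `sums_t`, and `sums_rbind`.

```r
set.seed(1)  # arbitrary seed so the random matrices are reproducible
mats <- lapply(1:2, function(i) matrix(rnorm(6, 10, 1), nrow = 2, ncol = 3))

# One row of column sums per matrix, via sapply() plus transpose
sums_t <- t(sapply(mats, colSums))

# The same result by row-binding the list that lapply() returns
sums_rbind <- do.call(rbind, lapply(mats, colSums))

dim(sums_rbind)                # 2 rows (one per matrix) by 3 columns
all.equal(sums_t, sums_rbind)
```

The do.call(rbind, ...) form is handy when you already have the list from lapply() and want to keep it around as well.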
Practical uses of lists using lapply()
Finally, what are lists good for? I often find lists are great when I want to store several multi-dimensional objects in one object, for example to group a bunch of data.frames into a list, or to store all my model results in one list. Here’s an example where I run four linear models for four different outcomes and want to store all of the models in one object. There are two ways to do this:

- Use a for() loop and insert the results of each iteration into the list
- Use lapply()! Less code, and no list to initialize first
    #create some data
    set.seed(2000)
    x=rbinom(1000,1,.6)
    mydata<-data.frame(trt=x,
                       out1=x*3+rnorm(1000,0,3),
                       out2=x*5+rnorm(1000,0,3),
                       out3=rnorm(1000,5,3),
                       out4=x*1+rnorm(1000,0,8))
    head(mydata)
    ##   trt      out1      out2      out3       out4
    ## 1   1  1.496148 5.2140842 7.8220283 12.7108382
    ## 2   0 -1.243485 0.5332667 2.8407921  4.6709677
    ## 3   1 11.070722 4.6477594 4.6725192  0.4216170
    ## 4   1  2.681000 1.8717883 0.3333281  0.4401036
    ## 5   0 -3.459300 0.8945582 3.1010555 -0.2620342
    ## 6   1 -2.266221 9.1754452 6.4914437  3.0443185

Now I want to regress each of the four outcomes on the trt variable using linear regression and save the results. I’ll do this first as a loop, then using lapply():
    #1. Use a loop
    #first, initialize the results list
    results<-vector("list", 4)
    #now use a loop for each outcome
    for(i in 1:4){
      results[[i]]<-lm(mydata[,i+1]~trt, data = mydata)
    }

    #2. Or, use lapply in one statement!
    results<-lapply(2:5, function(x) lm(mydata[,x]~trt, data = mydata))

In the second case, we are taking the vector c(2,3,4,5) and for each component of this vector, we’re running the model that we describe in the function. We can always name the components of the list as below, and I’ll print out the first two elements:
    names(results)<-names(mydata)[2:5]
    print(results, max=2)
    ## $out1
    ##
    ## Call:
    ## lm(formula = mydata[, x] ~ trt, data = mydata)
    ##
    ## Coefficients:
    ## (Intercept)          trt
    ##      0.1905       2.7707
    ##
    ##
    ## $out2
    ##
    ## Call:
    ## lm(formula = mydata[, x] ~ trt, data = mydata)
    ##
    ## Coefficients:
    ## (Intercept)          trt
    ##    -0.01892      4.73405
    ##
    ##
    ## [ reached getOption("max.print") -- omitted 2 entries ]

Why is this a great way to store data? Well, we can keep using the apply() functions, for example to put together all of the treatment effects for each outcome into one matrix:
    #extract coefficient and std error for each outcome and store in a matrix
    sapply(results, function(x) summary(x)$coefficients[2,1:2])
    ##                 out1      out2       out3      out4
    ## Estimate   2.7707490 4.7340543 -0.1344969 1.3293520
    ## Std. Error 0.1915748 0.1876549  0.1912755 0.5324664

You can also easily use other functions like stargazer() (covered in a previous post) to create a quick table of results like so (as LaTeX code):
    library(stargazer)
    stargazer(results, column.labels=names(results), keep.stat=c("rsq","n"), dep.var.labels="")

Or easily create a graph of the model estimates and 95% confidence intervals:
    #extract coefficients from the list
    coefs<-as.data.frame(t(sapply(results, function(x) summary(x)$coefficients[2,1:2])))
    coefs
    ##        Estimate Std. Error
    ## out1  2.7707490  0.1915748
    ## out2  4.7340543  0.1876549
    ## out3 -0.1344969  0.1912755
    ## out4  1.3293520  0.5324664

    #add outcome column and change name of SE column
    coefs$Outcome<-rownames(coefs)
    names(coefs)[2]<-"SE"

    #use ggplot to plot all the estimates
    library(ggplot2)
    ggplot(coefs, aes(Outcome,Estimate)) +
      geom_point(size=4) +
      theme(legend.position="none")+
      labs(title="Treatment effect on outcomes", x="", y="Estimate and 95% CI")+
      geom_errorbar(aes(ymin=Estimate-1.96*SE,ymax=Estimate+1.96*SE),width=0.1)+
      geom_hline(yintercept = 0, color="red")+
      coord_flip()

I hope that was useful! There are many great ways to use lists and the apply() functions to make your programming more efficient and less prone to errors. For another great resource on using the apply() functions with lists, definitely check out this StackOverflow page.
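One possible refinement of the model-fitting step, offered as a sketch rather than the method used in this post: iterating over outcome names and building each formula with reformulate() means every fitted model carries its own outcome in its formula, instead of the generic mydata[, x]. The data below is a smaller simulated stand-in, and the names `dat`, `outcomes`, `fits`, and `effects` are hypothetical. The intervals come from confint(), which uses the t distribution rather than the 1.96 normal approximation.

```r
set.seed(2000)
trt <- rbinom(200, 1, 0.6)
dat <- data.frame(trt  = trt,
                  out1 = trt * 3 + rnorm(200, 0, 3),
                  out2 = trt * 5 + rnorm(200, 0, 3))

outcomes <- c("out1", "out2")

# Build out1 ~ trt and out2 ~ trt as real formulas, then fit each model
fits <- lapply(outcomes, function(y) {
  lm(reformulate("trt", response = y), data = dat)
})
names(fits) <- outcomes

# Each model now records its own outcome in the formula
formula(fits$out2)

# Estimate plus 95% confidence limits for trt, one row per outcome
effects <- t(sapply(fits, function(f) {
  c(Estimate = unname(coef(f)["trt"]), confint(f)["trt", ])
}))
effects
```

Because the list is named by outcome, the rows of `effects` are labeled automatically, which saves the separate names() step.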