Use of the apply family of functions

[This article was first published on Steve's Data Tips and Tricks, and kindly contributed to R-bloggers.]

Introduction

In this post I will talk about the use of the R functions apply(), lapply(), sapply(), tapply(), and vapply() with examples.

These functions all apply a function to a set of data in R, but they differ in the kind of input they accept and the kind of output they return. Missing values are handled by the function you apply (for example, mean() with na.rm = TRUE), not by the apply functions themselves. By using the right function for your particular problem, you can make your code more concise and easier to read.

Let’s start with the basics.

The Basics

Before we dive into the details of each function, let’s define some terms:

  • A vector is a one-dimensional sequence of values of the same type, like a series of numbers or strings.
  • A matrix is a two-dimensional array of data, like a table of numbers.
  • A data frame is a two-dimensional object whose columns can hold different types of data, like a spreadsheet.
  • A list is a collection of objects, which can be of different types, like a shopping bag full of different items.
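
As a quick illustration of these four terms, here is a minimal sketch that builds one object of each kind. The names v, m, df, and l are hypothetical and are not used elsewhere in this post.

```r
# Hypothetical objects, one for each term defined above
v  <- c(1.5, 2, 3)                               # atomic vector: all values share one type
m  <- matrix(1:6, nrow = 2)                      # matrix with 2 rows and 3 columns
df <- data.frame(x = 1:3, y = c("a", "b", "c"))  # data frame: columns of different types
l  <- list(1, "two", TRUE)                       # list: elements of different types

is.vector(v)   # TRUE
dim(m)         # 2 3
is.list(df)    # TRUE, because a data frame is stored as a list of columns
length(l)      # 3
```

Note that a data frame is itself a list of columns, which is why the list-oriented functions below treat each column as one element.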

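The five functions we’ll discuss here do not all take the same kind of input: lapply(), sapply(), and vapply() work on lists (or vectors), apply() works on a matrix or array, and tapply() works on a vector plus one or more grouping factors. Let’s create a list object that holds an example of each kind of data: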

my_list <- list(
  a = c(1, 2, 3),
  b = matrix(1:6, nrow = 2),
  c = data.frame(x = 1:3, y = c("a", "b", "c")),
  d = c(4, NA, 6),
  e = list("foo", "bar", "baz")
)

This list contains five elements:

  • A vector of numbers (a)
  • A matrix of numbers (b)
  • A data frame with two columns (c)
  • A vector of numbers with a missing value (d)
  • A list of character strings (e)

Now that we have our data, let’s look at each of the functions in turn.

The Functions

apply()

The apply() function applies a function to the rows or columns of a matrix or array. It is most commonly used with matrices, but can also be used with higher-dimensional arrays. The function takes three arguments:

  • The matrix or array to apply the function to
  • The margin (1 for rows, 2 for columns, or a vector of dimensions)
  • The function to apply

Let’s apply the mean() function to the columns of our matrix in my_list$b:

apply(my_list$b, 2, mean)
[1] 1.5 3.5 5.5

This returns a vector containing the mean of each column of the matrix.
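
The margin argument and extra arguments can be sketched with a small example. The matrix m below is hypothetical (it is not part of my_list) and contains one missing value, so you can see how arguments after the function name are passed through to the applied function.

```r
# Hypothetical 2 x 3 matrix with one missing value:
#      [,1] [,2] [,3]
# [1,]    1    3    5
# [2,]   NA    4    6
m <- matrix(c(1, NA, 3, 4, 5, 6), nrow = 2)

# Margin 1 applies the function to each row
row_sums <- apply(m, 1, sum)                       # 9 NA

# Arguments after the function are passed on to it, here na.rm = TRUE
row_sums_clean <- apply(m, 1, sum, na.rm = TRUE)   # 9 10
```

For plain sums and means, base R also provides rowSums(), colSums(), rowMeans(), and colMeans(), which are usually simpler to read for these specific cases.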

lapply()

The lapply() function applies a function to each element of a list and returns a list of the results. It takes two arguments:

  • The list to apply the function to
  • The function to apply

Let’s apply the class() function to each element of our list:

lapply(my_list, class)
$a
[1] "numeric"

$b
[1] "matrix" "array" 

$c
[1] "data.frame"

$d
[1] "numeric"

$e
[1] "list"

This will return a list of the classes of each element.
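
lapply() also accepts an anonymous function, which is handy when the function you apply needs extra options. The sketch below uses a hypothetical list called scores (not part of my_list) and computes the mean of each element while skipping missing values.

```r
# Hypothetical list of numeric vectors, some with missing values
scores <- list(
  math  = c(90, 85, NA),
  art   = c(70, 80),
  music = c(NA, 60, 100)
)

# Apply an anonymous function to each element; the result is always a list
means <- lapply(scores, function(x) mean(x, na.rm = TRUE))

means$math    # 87.5
means$art     # 75
means$music   # 80
```

The result keeps the names of the input list, so each mean can be looked up by name.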

sapply()

The sapply() function is similar to lapply(), but it simplifies the output to a vector or matrix if possible. It takes the same two arguments as lapply():

  • The list to apply the function to
  • The function to apply

Let’s apply the length() function to each element of our list using sapply():

sapply(my_list, length)
a b c d e 
3 6 2 3 3 

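This returns a named vector holding the length of each element. Note that the length of a data frame is its number of columns, which is why c has length 2.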
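
What "simplifies if possible" means depends on the shape of the individual results. The following sketch, using hypothetical lists that are not part of my_list, shows the three common outcomes: a vector, a matrix, and an unsimplified list.

```r
x <- list(a = 1:3, b = 4:6)

# Every result has length 1, so the output is a named vector
totals <- sapply(x, sum)                 # a = 6, b = 15

# Every result has length 2, so the output is a 2 x 2 matrix (one column per element)
ranges <- sapply(x, range)

# Results have different lengths, so sapply() falls back to returning a list
ragged <- sapply(list(a = 1:3, b = 1:5), seq_along)

is.matrix(ranges)   # TRUE
is.list(ragged)     # TRUE
```

Because the output type depends on the data, sapply() is convenient interactively but can surprise you inside functions or scripts.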

tapply()

The tapply() function applies a function to subsets of a vector or data frame, grouped by one or more factors. It takes three arguments:

  • The vector or data frame to apply the function to
  • The factor(s) to group the data by
  • The function to apply

Let’s apply the mean() function to the elements of our vector my_list$d, grouped by whether they are missing or not:

tapply(my_list$d, !is.na(my_list$d), mean)
FALSE  TRUE 
   NA     5 

This returns the mean of each group. The FALSE group contains only the missing value, so its mean is NA, while the TRUE group contains 4 and 6, so its mean is 5.
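
A more typical use of tapply() groups values by a categorical variable. The sketch below uses hypothetical vectors heights and group (not part of my_list) and shows how na.rm = TRUE is passed through to mean().

```r
# Hypothetical measurements and the group each one belongs to
heights <- c(150, 160, 170, 180, NA)
group   <- c("a", "a", "b", "b", "b")

# Mean per group; group "b" contains an NA, so its mean is NA
by_group <- tapply(heights, group, mean)                        # a = 155, b = NA

# Extra arguments are passed to mean(), so the NA is dropped first
by_group_clean <- tapply(heights, group, mean, na.rm = TRUE)    # a = 155, b = 175
```

The grouping vector is converted to a factor internally, so a character vector works as well as an explicit factor.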

vapply()

The vapply() function is similar to sapply(), but requires the user to specify the type and length of each result, making it safer and less prone to surprises. It takes three arguments:

  • The list to apply the function to
  • The function to apply
  • A template value (FUN.VALUE), such as integer(1) or character(2), that gives both the type and the length that each result must have

Let’s apply the length() function to each element of our list, specifying that the output type is an integer and the length is 1:

vapply(my_list, length, integer(1))
a b c d e 
3 6 2 3 3 

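This returns a named integer vector of lengths, the same result that sapply() gave above. If any call returned something other than a single integer, vapply() would raise an error instead of silently changing the output type.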
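
The strictness of vapply() can be sketched as follows. The list x is hypothetical (not part of my_list), and tryCatch() is used only so the example keeps running after the expected error.

```r
x <- list(a = 1:3, b = letters)

# Each call to length() returns a single integer, so the template matches
lens <- vapply(x, length, integer(1))    # a = 3, b = 26

# range() returns two values per element, so the integer(1) template is violated
res <- tryCatch(
  vapply(x, range, integer(1)),
  error = function(e) "error"
)
res                                      # "error"

# On empty input, sapply() returns an empty list, vapply() an empty integer vector
empty_s <- sapply(list(), length)
empty_v <- vapply(list(), length, integer(1))
```

The empty-input case is one reason vapply() is often preferred inside functions: the result type is the same no matter what data comes in.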

Conclusion

In this blog post, we have covered the basics of the apply(), lapply(), sapply(), tapply(), and vapply() functions in R. These functions all apply a function to a set of data, but they differ in the kind of input they accept and the kind of output they return. By using the right function for your particular problem, you can make your code more concise and easier to read.

Voila!
