
Using the plyr package

[This article was first published on Dan Kelley Blog/R, and kindly contributed to R-bloggers.]

Introduction

The base R system provides lapply() and related functions, and the package plyr provides alternatives that are worth considering. It will be assumed that readers are familiar with lapply() and are willing to spend a few moments reading the plyr documentation, to see why the illustration here will use the ldply() function.

The test task will be extraction of latitude (and then both latitude and longitude) from the section dataset in the oce package. (Users of that package may be aware that there is a built-in accessor for doing this, so results can easily be checked.)
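As a sketch of the check mentioned above, the following compares a station-by-station extraction with the built-in accessor. It assumes that the accessor takes the form section[["latitude", "byStation"]], as described in the oce documentation for section objects; the variable names latCheck and lat are chosen here for illustration.

```r
library(oce)
data(section)

# Built-in accessor (assumed form): one latitude per station
latCheck <- section[["latitude", "byStation"]]

# The same quantity, extracted station by station
lat <- unlist(lapply(section[["station"]], function(x) x[["latitude"]]))

# Compare the two vectors
all.equal(lat, latCheck)
```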

Methods

First, load the data

library(oce)
data(section)

Next, find latitudes using lapply

lat <- unlist(lapply(section[["station"]], function(x) x[["latitude"]]))

Next, find latitudes with ldply

library(plyr)
lat <- ldply(section[["station"]], function(x) x[["latitude"]])
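To confirm that the two methods agree, the values can be compared directly. The sketch below uses a small made-up list named stations as a hypothetical stand-in for section[["station"]], so that it runs without the oce data; the coordinates are invented for illustration only.

```r
library(plyr)

# Hypothetical stand-in for section[["station"]]: each element holds a
# latitude and a longitude (values are made up for illustration)
stations <- list(
    list(latitude = 36.4, longitude = -70.0),
    list(latitude = 37.1, longitude = -69.5),
    list(latitude = 38.0, longitude = -68.9)
)

latBase <- unlist(lapply(stations, function(x) x[["latitude"]]))
latPlyr <- ldply(stations, function(x) x[["latitude"]])

# ldply() returns a data frame, so compare its single column (V1)
# against the vector from lapply()
all.equal(latBase, latPlyr$V1)
```

The same comparison should apply to the real data by substituting section[["station"]] for stations.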

Results

The reader can check that the results match, although ldply() returns a data frame, not a vector as in the first method.

Tests of speed

library(microbenchmark)
microbenchmark(ldply(section[["station"]], function(x) x[["latitude"]])$V1, 
    unlist(lapply(section[["station"]], function(x) x[["latitude"]])))

yield the following

## Unit: milliseconds
##                                                               expr   min    lq median    uq   max neval
##        ldply(section[["station"]], function(x) x[["latitude"]])$V1 18.99 20.26  20.56 21.02 36.05   100
##  unlist(lapply(section[["station"]], function(x) x[["latitude"]])) 18.36 19.71  19.93 20.64 63.18   100

suggesting a difference (about 3% in the median time, 20.56 ms versus 19.93 ms) too small to be of much practical interest.

Discussion

Since ldply() returns a data frame, it is more flexible than unlist(), which returns a vector. For example, the following creates a data frame with one column for latitude (named V1 by default) and one for longitude (named V2):

latlon <- ldply(section[["station"]], function(x) c(x[["latitude"]], x[["longitude"]]))

A station plot is produced as follows.

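data(coastlineWorld)  # coastline dataset supplied with oce
mapPlot(coastlineWorld, projection = "orthographic", orientation = c(20, -40, 0))
# Note: newer versions of oce are likely to need a PROJ-style string instead,
# e.g. projection = "+proj=ortho +lat_0=20 +lon_0=-40"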
mapPoints(latlon$V2, latlon$V1, pch = "+", cex = 1/2, col = "red")
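The default column names V1 and V2 used above are easy to mix up. One possible refinement, sketched here, is to have the function return a named vector, in which case ldply() uses those names for the columns. The list stations is again a hypothetical stand-in for section[["station"]], with invented coordinates, and latlonNamed is a name chosen for illustration.

```r
library(plyr)

# Hypothetical stand-in for section[["station"]] (made-up coordinates)
stations <- list(
    list(latitude = 36.4, longitude = -70.0),
    list(latitude = 37.1, longitude = -69.5),
    list(latitude = 38.0, longitude = -68.9)
)

# Returning a named vector gives named columns in the data frame
latlonNamed <- ldply(stations, function(x)
    c(latitude = x[["latitude"]], longitude = x[["longitude"]]))

names(latlonNamed)
```

With columns named in this way, a call such as mapPoints(latlonNamed$longitude, latlonNamed$latitude) makes the argument order explicit.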

Conclusions

The effort of learning how to use the plyr package is likely to pay off in more flexible code, particularly because of the use of data frames in that package. On this theme, note that the author of plyr is developing a similar package called dplyr, which centres more closely on data frames and offers many new features; see http://blog.rstudio.org/2014/01/17/introducing-dplyr/ for a blog item introducing dplyr.
