Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
Introduction
The base R system provides lapply()
and related functions, and the package plyr
provides alternatives that are worth considering. It will be assumed that readers are familiar with lapply()
and are willing to spend a few moments reading the plyr
documentation, to see why the illustration here will use the ldply()
function.
The test task will be extraction of latitude (and then both latitude and longitude) from the section
dataset in the oce
package. (Users of that package may be aware that there is a built-in accessor for doing this, so results can easily be checked.)
Methods
First, load the data
1 2 | library(oce) data(section) |
Next, find latitudes using lapply
1 | lat <- unlist(lapply(section[["station"]], function(x) x[["latitude"]])) |
Next, find latitudes with ldply
1 2 | library(plyr) lat <- ldply(section[["station"]], function(x) x[["latitude"]]) |
Results
The reader can check that the results match, although ldply()
returns a data frame, not a vector as in the first method. Tests of speed
1 2 3 | library(microbenchmark) microbenchmark(ldply(section[["station"]], function(x) x[["latitude"]])$V1, unlist(lapply(section[["station"]], function(x) x[["latitude"]]))) |
yield the following
1 2 3 4 5 6 7 | ## Unit: milliseconds ## expr min ## ldply(section[["station"]], function(x) x[["latitude"]])$V1 18.99 ## unlist(lapply(section[["station"]], function(x) x[["latitude"]])) 18.36 ## lq median uq max neval ## 20.26 20.56 21.02 36.05 100 ## 19.71 19.93 20.64 63.18 100 |
suggesting a difference too small to be of much practical interest.
Discussion
Since ldply()
returns a data frame, it is more flexible than unlist()
, which returns a vector. For example, the following creates a data frame with columns for lat and lon:
1 | latlon <- ldply(section[["station"]], function(x) c(x[["latitude"]], x[["longitude"]])) |
A station plot is produced as follows.
1 2 3 | mapPlot(coastlineWorld, projection = "orthographic", orientation = c(20, -40, 0)) mapPoints(latlon$V2, latlon$V1, pch = "+", cex = 1/2, col = "red") |
Conclusions
The effort of learning how to use the plyr
package is likely to pay off in more flexible code, particularly because of the use of data frames in that package. On this theme, note that the author of plyr
is developing a similar package called dplry
, which centres more closely on data frames and offers many new features; see http://blog.rstudio.org/2014/01/17/introducing-dplyr/ for a blog item introducing dplyr
.
R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.