Book Review – ggplot2: Elegant Graphics for Data Analysis by Hadley Wickham (Springer 2009)

[This article was first published on Software for Exploratory Data Analysis and Statistical Modelling, and kindly contributed to R-bloggers.]

This book is written by the author of the ggplot2 package for R. The package's design is inspired by the grammar of graphics, and it can remove some of the effort required to put together impressive graphs. The book is just under 200 pages and covers a decent range of material to introduce new and experienced R users to ggplot2.

The first chapter is a short introduction to the ggplot2 package and discusses how it fits in with the other approaches to creating graphics in R.

The second chapter covers usage of the qplot function and is intended to allow people to hit the ground running. As the author mentions, a large amount of functionality is available through this function, and it shields inexperienced users from the full details of the grammar of graphics. The running example, using data on diamond quality, covers various common components of a graphical display that most users would be interested in reading about. The chapter is a good introductory tour of the facilities available with ggplot2.

Chapter three moves from qplot to the ggplot function for creating graphics and introduces the concept of adding components to a graph with the + operator. A nice feature is that a graph can be saved as an object and added to at a later date, piece by piece. The automatic creation of legends is one area where ggplot2 scores highly compared to other graphics systems, although it will not be clear to novice users how to fine tune various aspects of the display, and it is questionable whether they should be tweaking every last detail.
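The layer-by-layer approach described above can be sketched in a few lines. This is a minimal illustration written against a current ggplot2 release rather than code from the book; the choice of columns from the diamonds data and the object names (p, p_points, p_smooth) are assumptions made for the example.

```r
library(ggplot2)

# Start from a base plot object; the diamonds data ships with ggplot2.
# Mapping colour to cut is an illustrative choice, not taken from the book.
p <- ggplot(diamonds, aes(x = carat, y = price, colour = cut))

# Add a layer with the + operator; the result is still a plot object.
p_points <- p + geom_point(alpha = 0.3)

# Later on, add another layer to the saved object.
p_smooth <- p_points + geom_smooth(method = "lm", se = FALSE)

# Printing the object draws it; the colour legend is created automatically.
print(p_smooth)
```

Because each step returns a new object, the intermediate plots stay available and can be extended in different directions.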

The next chapter (four) condenses a large amount of information about the layers that make up different display types into a short space via tables. The concepts of geoms and stats are important for the system, and the chapter might possibly be best read after working through other examples in the book.

Chapter five, titled "Toolbox", discusses a wide range of graphics that a user might be interested in creating, from distributions to surface plots, and provides a good description and reference point for working out how to create a particular type of graph. The chapter continues the process of demonstrating how to build plots up layer by layer using the grammar of graphics approach.

Chapter six gives long and detailed coverage of setting up the axis scales and customising them, both to avoid crowding the axes with too many numbers and to apply transformations such as logarithms, which are commonly used in various applications. The chapter ends with a discussion of the automatic creation of legends by ggplot2 and highlights the fact that not having too much control over the appearance simplifies creation of a legend rather than introducing undesirable restrictions on the user. In most cases the defaults will be sufficient, and compared to other graphics approaches in R this is something to be applauded.
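As a rough sketch of the kind of scale customisation discussed here, the example below puts both axes on a log scale and limits the number of labelled tick marks. It uses the current ggplot2 functions scale_x_log10 and scale_y_log10, which may differ in detail from the 2009 version covered by the book; the break values are arbitrary choices for illustration.

```r
library(ggplot2)

# Scatterplot of price against carat for the diamonds data.
p <- ggplot(diamonds, aes(carat, price)) + geom_point(alpha = 0.2)

# Log-transform both axes and label only a handful of breaks,
# so the axes are not crowded with numbers.
p_log <- p +
  scale_x_log10(breaks = c(0.25, 0.5, 1, 2, 4)) +
  scale_y_log10(breaks = c(500, 2000, 8000))

print(p_log)
```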

Chapter seven covers the use of facets, which are the equivalent of trellising in lattice graphics. There are some good examples showing how to create facets with one or two variables, along with coverage of working with scales and axes over multiple facets.
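A minimal sketch of faceting with one and two variables, again assuming a current ggplot2 release and using the diamonds data; the faceting variables and object names are illustrative choices, not examples taken from the book.

```r
library(ggplot2)

base <- ggplot(diamonds, aes(carat, price)) + geom_point(alpha = 0.2)

# One faceting variable: one panel per level of cut, wrapped into rows.
by_cut <- base + facet_wrap(~ cut)

# Two faceting variables: rows by cut, columns by clarity.
by_cut_clarity <- base + facet_grid(cut ~ clarity)

# Let each row of panels use its own y-axis range instead of a shared one.
free_y <- base + facet_grid(cut ~ ., scales = "free_y")

print(by_cut)
```

Shared scales make panels directly comparable, while free scales show more detail within each panel, so the choice depends on what the plot is meant to emphasise.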

Chapter eight provides a brief summary of the use of themes to determine the visual style of ggplot2 graphs and also discusses exporting graphs to be included in documents outside of R. This chapter might benefit from more examples of customising themes, as this is an area where people are likely to want to do more. Chapter nine gives a short introduction to another package created by the author of ggplot2, the plyr package, which makes manipulating data easier than with standard R approaches. There is a good example of plotting fitted models on top of data, including confidence limits on the graph as well.

The book ends with a short chapter ten, which provides a few tips on reducing duplication when coding in R and feels slightly out of place.

Overall Comment: This is a well-presented book that provides a good introduction to the ggplot2 package for R and is a good complement to the online help provided by the author.
