Site icon R-bloggers

Faceted Graphs with cdata and ggplot2

[This article was first published on R – Win-Vector Blog, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

In between client work, John and I have been busy working on our book, Practical Data Science with R, 2nd Edition. To demonstrate a toy example for the section I’m working on, I needed scatter plots of the petal and sepal dimensions of the iris data, like so:

I wanted a plot for petal dimensions and sepal dimensions, but I also felt that two plots took up too much space. So, I thought, why not make a faceted graph that shows both:

Except — which columns do I plot and what do I facet on?

head(iris)
##   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1          5.1         3.5          1.4         0.2  setosa
## 2          4.9         3.0          1.4         0.2  setosa
## 3          4.7         3.2          1.3         0.2  setosa
## 4          4.6         3.1          1.5         0.2  setosa
## 5          5.0         3.6          1.4         0.2  setosa
## 6          5.4         3.9          1.7         0.4  setosa

Here’s one way to create the plot I want, using the cdata package along with ggplot2.

First, load the packages and data:

library("ggplot2")
library("cdata")

iris <- data.frame(iris)

Now define the data-shaping transform, or control table. The control table is basically a picture that sketches out the final data shape that I want. I want to specify the x and y columns of the plot (call these the value columns of the data frame) and the column that I am faceting by (call this the key column of the data frame). And I also need to specify how the key and value columns relate to the existing columns of the original data frame.

Here’s what the control table looks like:

The control table specifies that the new data frame will have the columns flower_part, Length and Width. Every row of iris will produce two rows in the new data frame: one with a flower_part value of Petal, and another with a flower_part value of Sepal. The Petal row will take the Petal.Length and Petal.Width values in the Length and Width columns respectively. Similarly for the Sepal row.

Here I create the control table in R, using the convenience function wrapr::build_frame() to create the controlTable data frame in a legible way.

(controlTable <- wrapr::build_frame(
   "flower_part", "Length"      , "Width"       |
   "Petal"      , "Petal.Length", "Petal.Width" |
   "Sepal"      , "Sepal.Length", "Sepal.Width" ))
##   flower_part       Length       Width
## 1       Petal Petal.Length Petal.Width
## 2       Sepal Sepal.Length Sepal.Width

Now I apply the transform to iris using the function rowrecs_to_blocks(). I also want to carry along the Species column so I can color the scatterplot points by species.

iris_aug <- rowrecs_to_blocks(
  iris,
  controlTable,
  columnsToCopy = c("Species"))

head(iris_aug)
##   Species flower_part Length Width
## 1  setosa       Petal    1.4   0.2
## 2  setosa       Sepal    5.1   3.5
## 3  setosa       Petal    1.4   0.2
## 4  setosa       Sepal    4.9   3.0
## 5  setosa       Petal    1.3   0.2
## 6  setosa       Sepal    4.7   3.2

And now I can create the plot!

ggplot(iris_aug, aes(x=Length, y=Width)) +
  geom_point(aes(color=Species, shape=Species)) + 
  facet_wrap(~flower_part, labeller = label_both, scale = "free") +
  ggtitle("Iris dimensions") +  
  scale_color_brewer(palette = "Dark2")

In the next post, I will show how to use cdata and ggplot2 to create a scatterplot matrix.

To leave a comment for the author, please follow the link and comment on their blog: R – Win-Vector Blog.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.