In between client work, John and I have been busy working on our book, Practical Data Science with R, 2nd Edition. To demonstrate a toy example for the section I’m working on, I needed scatter plots of the petal and sepal dimensions of the iris data, like so:
I wanted a plot for petal dimensions and sepal dimensions, but I also felt that two plots took up too much space. So, I thought, why not make a faceted graph that shows both:
Except — which columns do I plot and what do I facet on?
head(iris)
##   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1          5.1         3.5          1.4         0.2  setosa
## 2          4.9         3.0          1.4         0.2  setosa
## 3          4.7         3.2          1.3         0.2  setosa
## 4          4.6         3.1          1.5         0.2  setosa
## 5          5.0         3.6          1.4         0.2  setosa
## 6          5.4         3.9          1.7         0.4  setosa
Here’s one way to create the plot I want, using the cdata package along with ggplot2.
First, load the packages and data:
library("ggplot2")
library("cdata")

iris <- data.frame(iris)
Now define the data-shaping transform, or control table. The control table is basically a picture that sketches out the final data shape that I want. I want to specify the x and y columns of the plot (call these the value columns of the data frame) and the column that I am faceting by (call this the key column of the data frame). I also need to specify how the key and value columns relate to the existing columns of the original data frame.
Here’s what the control table looks like:
The control table specifies that the new data frame will have the columns flower_part, Length, and Width. Every row of iris will produce two rows in the new data frame: one with a flower_part value of Petal, and another with a flower_part value of Sepal. The Petal row will take the Petal.Length and Petal.Width values in the Length and Width columns respectively. Similarly for the Sepal row.
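To make the row-doubling concrete, here is a rough base-R sketch of the same reshaping done by hand, without cdata. It is only an illustration of the target shape, not how cdata works internally. The names petal, sepal, and iris_by_hand are made up for this sketch, and I carry the Species column along as well.

```r
# Hand-rolled version of the reshaping that the control table describes.
# Illustration only: petal, sepal, and iris_by_hand are names invented here.
petal <- data.frame(
  Species     = iris$Species,
  flower_part = "Petal",
  Length      = iris$Petal.Length,
  Width       = iris$Petal.Width,
  stringsAsFactors = FALSE
)
sepal <- data.frame(
  Species     = iris$Species,
  flower_part = "Sepal",
  Length      = iris$Sepal.Length,
  Width       = iris$Sepal.Width,
  stringsAsFactors = FALSE
)

# Stack the two blocks: every row of iris now contributes two rows.
iris_by_hand <- rbind(petal, sepal)

dim(iris_by_hand)
table(iris_by_hand$flower_part)
```

Note that this version stacks all the Petal rows ahead of all the Sepal rows, so the row order will not necessarily match what cdata produces. The point of the control table is that I describe this shape once, instead of writing one block per flower part.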
Here I create the control table in R, using the convenience function wrapr::build_frame() to create the controlTable data frame in a legible way.
(controlTable <- wrapr::build_frame(
  "flower_part", "Length"      , "Width"       |
  "Petal"      , "Petal.Length", "Petal.Width" |
  "Sepal"      , "Sepal.Length", "Sepal.Width" ))
##   flower_part       Length       Width
## 1       Petal Petal.Length Petal.Width
## 2       Sepal Sepal.Length Sepal.Width
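The control table is an ordinary data frame of strings, so wrapr::build_frame() is a convenience rather than a requirement. As a sketch, the same table could be written with base R's data.frame(); controlTable_base is a name chosen here for illustration, and I am assuming the entries should be plain character strings rather than factors.

```r
# The same control table, built with base R's data.frame().
# controlTable_base is a name chosen for this sketch.
controlTable_base <- data.frame(
  flower_part = c("Petal", "Sepal"),
  Length      = c("Petal.Length", "Sepal.Length"),
  Width       = c("Petal.Width", "Sepal.Width"),
  # Keep entries as character strings. This is the default since R 4.0.0,
  # spelled out here for older versions.
  stringsAsFactors = FALSE
)

print(controlTable_base)
```

The column-by-column layout is harder to read as a picture of the result, which is why I prefer the row-by-row build_frame() form.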
Now I apply the transform to iris using the function rowrecs_to_blocks(). I also want to carry along the Species column so I can color the scatterplot points by species.
iris_aug <- rowrecs_to_blocks(
  iris,
  controlTable,
  columnsToCopy = c("Species"))

head(iris_aug)
##   Species flower_part Length Width
## 1  setosa       Petal    1.4   0.2
## 2  setosa       Sepal    5.1   3.5
## 3  setosa       Petal    1.4   0.2
## 4  setosa       Sepal    4.9   3.0
## 5  setosa       Petal    1.3   0.2
## 6  setosa       Sepal    4.7   3.2
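A quick sanity check can confirm that the reshaped frame has the shape the control table promised. The sketch below assumes the cdata and wrapr packages are installed. It rebuilds controlTable and iris_aug so that it runs on its own, and petal_rows and sepal_rows are just scratch names.

```r
library("cdata")

# Rebuild the control table and the reshaped data so this block is self-contained.
controlTable <- wrapr::build_frame(
  "flower_part", "Length"      , "Width"       |
  "Petal"      , "Petal.Length", "Petal.Width" |
  "Sepal"      , "Sepal.Length", "Sepal.Width" )

iris_aug <- rowrecs_to_blocks(
  iris,
  controlTable,
  columnsToCopy = c("Species"))

# Each original row should have produced two rows.
stopifnot(nrow(iris_aug) == 2 * nrow(iris))

# The Petal block should hold exactly the original petal lengths, and the
# Sepal block the original sepal widths (compared order-independently).
petal_rows <- iris_aug[iris_aug$flower_part == "Petal", ]
sepal_rows <- iris_aug[iris_aug$flower_part == "Sepal", ]
stopifnot(isTRUE(all.equal(sort(petal_rows$Length), sort(iris$Petal.Length))))
stopifnot(isTRUE(all.equal(sort(sepal_rows$Width), sort(iris$Sepal.Width))))
```

If any of these checks fail, the control table most likely names a column incorrectly.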
And now I can create the plot!
ggplot(iris_aug, aes(x=Length, y=Width)) +
  geom_point(aes(color=Species, shape=Species)) +
  facet_wrap(~flower_part, labeller = label_both, scales = "free") +
  ggtitle("Iris dimensions") +
  scale_color_brewer(palette = "Dark2")
In the next post, I will show how to use cdata and ggplot2 to create a scatterplot matrix.