Create a scatter plot with ggplot
Take your first steps with the ggplot2 package to create a scatter plot. Use the grammar of graphics to map dataset attributes to your plot and connect different layers using the + operator.
- Define a dataset for the plot using the ggplot() function
- Specify a geometric layer using the geom_point() function
- Map attributes from the dataset to plotting properties using the mapping parameter
- Connect different ggplot objects using the + operator
library(ggplot2)
ggplot(___) +
  geom_point(mapping = aes(x = ___, y = ___))
Introduction to scatter plots
Scatter plots use points to visualize the relationship between two numeric variables. The position of each point represents the value of the variables on the x- and y-axis. Let’s see an example of a scatter plot to understand the relationship between the speed and the stopping distance of cars:
Each point represents a car. Each car starts to brake at the speed given on the y-axis and travels the distance shown on the x-axis until it comes to a full stop. Looking at all points in the plot, we can clearly see that faster cars need a longer distance to come to a complete stop.
Quiz: Scatter Plot Facts
Which of the following statements about scatter plots are correct?
- Scatter plots visualize the relation of two numeric variables
- In a scatter plot we only interpret single points and never the relationship between the variables in general
- Scatter plots use points to visualize observations
- Scatter plots visualize the relation of categorical and numeric variables
Specifying a dataset
library(ggplot2)
ggplot(___) +
  geom_point(mapping = aes(x = ___, y = ___))
To create plots with ggplot2 you first need to load the package using library(ggplot2).
After the package has been loaded, specify the dataset to be used as an argument of the ggplot() function. For example, to specify a plot using the cars dataset you can use:
library(ggplot2)
ggplot(cars)
Note that this command does not plot anything but a grey canvas yet. It just defines the dataset for the plot and creates an empty base on top of which we can add additional layers.
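The empty canvas can also be checked by inspecting the plot object instead of looking at the picture. The following is a minimal sketch, assuming ggplot2 is installed; the variable name p is only an illustrative choice.

```r
library(ggplot2)

# Define the dataset only; no geometric layer has been added yet.
p <- ggplot(cars)

# The plot object already carries the data ...
nrow(p$data)

# ... but its list of layers is still empty, which is why
# printing p shows nothing but the grey canvas.
length(p$layers)
```

Storing the plot in a variable like this is optional. It simply makes it possible to add layers later with the + operator.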
Exercise: Specify the gapminder dataset
To start a ggplot visualizing the gapminder dataset we need to:
- Load the ggplot2 package
- Load the gapminder package
- Define the gapminder dataset to be used in the plot with the ggplot() function
Specifying a geometric layer
library(ggplot2)
ggplot(___) +
  geom_point(mapping = aes(x = ___, y = ___))
We can use ggplot’s geometric layers (or geoms) to define how we want to visualize our dataset. Geoms use geometric objects to visualize the variables of a dataset. The objects can take multiple forms such as points, lines and bars and are specified through the corresponding functions geom_point(), geom_line() and geom_col().
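To make the difference between the three geoms concrete, here is a small sketch that draws the same data as points, as a line and as bars. The sales data frame and its columns month and units are made up for this illustration and are not part of the course material.

```r
library(ggplot2)

# Hypothetical toy data, invented for this illustration only.
sales <- data.frame(
  month = 1:6,
  units = c(12, 15, 11, 18, 21, 19)
)

# Same data and same mapping, three different geometric objects.
p_points <- ggplot(sales) + geom_point(mapping = aes(x = month, y = units))
p_lines  <- ggplot(sales) + geom_line(mapping = aes(x = month, y = units))
p_bars   <- ggplot(sales) + geom_col(mapping = aes(x = month, y = units))

# Print one of the objects, for example p_points, to draw it.
if (interactive()) print(p_points)
```

Only the geom function changes between the three plots. The dataset and the aesthetic mapping stay the same.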
Quiz: Scatter Plot Layers
Which geometric layer should be used to create scatter plots in ggplot2?
- point_geom()
- geom()
- geom_scatter()
- geom_point()
Creating aesthetic mappings
library(ggplot2)
ggplot(___) +
  geom_point(mapping = aes(x = ___, y = ___))
ggplot2 uses the concept of aesthetics, which map dataset attributes to the visual features of the plot. Each geometric layer requires a different set of aesthetic mappings, e.g. the geom_point() function uses the aesthetics x and y to determine the x- and y-axis coordinates of the points to plot. The aesthetics are mapped within the aes() function to construct the final mappings.
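As a brief sketch of what aes() produces, the mapping can be created on its own and inspected. The variable name m is an illustrative choice. Because aes() only records the variable names without evaluating them, no dataset is needed at this point.

```r
library(ggplot2)

# aes() captures the expressions speed and dist without evaluating them,
# so this works even though no dataset has been specified yet.
m <- aes(x = speed, y = dist)

# The mapping is a named list with one entry per aesthetic.
names(m)
```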
To specify a layer of points which plots the variable speed on the x-axis and distance dist on the y-axis we can write:
geom_point(mapping = aes(x = speed, y = dist))
The expression above constructs a geometric layer. However, this layer is currently not linked to a dataset and does not produce a plot. To link the layer with a ggplot object specifying the cars dataset we need to connect the ggplot(cars) object with the geom_point() layer using the + operator:
ggplot(cars) +
  geom_point(mapping = aes(x = speed, y = dist))
Through this link, ggplot() knows that the mapped speed and dist variables are taken from the cars dataset. geom_point() instructs ggplot to plot the mapped variables as points.
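The difference between a stand-alone layer and a linked plot can be sketched as follows. The names layer_only and p are illustrative, and layer_data() is a ggplot2 helper that returns the data computed for a given layer.

```r
library(ggplot2)

# A layer on its own is not a plot: it has a mapping but no dataset.
layer_only <- geom_point(mapping = aes(x = speed, y = dist))
inherits(layer_only, "ggplot")

# Adding the layer to ggplot(cars) with + links it to the cars dataset.
p <- ggplot(cars) + layer_only
inherits(p, "ggplot")

# After linking, the x and y values of the layer are resolved
# from the speed and dist columns of cars.
head(layer_data(p)[, c("x", "y")])
```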
The required steps to create a scatter plot with ggplot can be summarized as follows:
- Load the package ggplot2 using library(ggplot2).
- Specify the dataset to be plotted using ggplot().
- Use the + operator to add layers to the plot.
- Add a geometric layer to define the shapes to be plotted. In case of scatter plots, use geom_point().
- Map variables from the dataset to plotting properties through the mapping parameter in the geometric layer.
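The steps above carry over to any data frame. As a sketch, here they are applied to mtcars, another dataset that ships with R. The choice of mtcars and of the columns wt (weight) and mpg (miles per gallon) is an assumption made for this illustration and does not come from the course.

```r
# Step 1: load the package.
library(ggplot2)

# Steps 2 to 5: specify the dataset, add a layer with +,
# choose points as the geometric object and map columns to x and y.
p <- ggplot(mtcars) +
  geom_point(mapping = aes(x = wt, y = mpg))

# Printing the object draws the scatter plot.
if (interactive()) print(p)
```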
Exercise: Visualize the “cars” dataset
Create a scatter plot using ggplot() and visualize the cars dataset with the car’s stopping distance dist on the x-axis and the speed of the car on the y-axis.
The ggplot2 package is already loaded. Follow these steps to create the plot:
- Specify the dataset through the ggplot() function
- Specify a geometric point layer with the geom_point() function
- Map dist to the x-axis and speed to the y-axis with aes()
Exercise: Visualize the Gapminder dataset
Create a scatter plot using ggplot() and visualize the gapminder_2007 dataset with the GDP per capita gdpPercap on the x-axis and the life expectancy lifeExp of each country on the y-axis.
The ggplot2 package is already loaded. Follow these steps to create the plot:
- Specify the gapminder_2007 dataset through the ggplot() function
- Specify a geometric point layer with geom_point()
- Map gdpPercap to the x-axis and lifeExp to the y-axis with aes()
Create a scatter plot with ggplot is an excerpt from the course Introduction to R, which is available for free at quantargo.com