Create a scatter plot with ggplot
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
Take your first steps with the ggplot2 package and create a scatter plot. Use the grammar of graphics to map dataset attributes to your plot, and connect different layers using the + operator.
- Define a dataset for the plot using the ggplot() function
- Specify a geometric layer using the geom_point() function
- Map attributes from the dataset to plotting properties using the mapping parameter
- Connect different ggplot objects using the + operator
library(ggplot2)
ggplot(___) +
geom_point(
mapping = aes(x = ___, y = ___)
)
Introduction to scatter plots
Scatter plots use points to visualize the relationship between two numeric variables. The position of each point represents the values of the two variables on the x- and y-axis. Let’s look at an example scatter plot that shows the relationship between the speed and the stopping distance of cars.
Each point represents a car. Each car starts to brake at the speed given on the y-axis and travels the distance shown on the x-axis until it comes to a full stop. Looking at all points in the plot, we can clearly see that faster cars need a longer distance to stop completely.
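The cars dataset ships with base R, so you can inspect its two numeric columns directly. The sketch below also puts a number on the upward trend using the Pearson correlation from cor(); the correlation step is an addition for illustration and not part of the original lesson.

```r
# cars is a built-in data frame with 50 observations
str(cars)       # two numeric columns: speed (mph) and dist (ft)
head(cars, 3)

# Pearson correlation between speed and stopping distance;
# a positive value matches the rising cloud of points
r <- cor(cars$speed, cars$dist)
print(round(r, 2))
```

For cars the correlation is roughly 0.81, a strong positive relationship, which agrees with what the scatter plot shows visually.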
Quiz: Scatter Plot Facts
Which of the following statements about scatter plots are correct?
- Scatter plots visualize the relation of two numeric variables
- In a scatter plot we only interpret single points and never the relationship between the variables in general
- Scatter plots use points to visualize observations
- Scatter plots visualize the relation of categorical and numeric variables
Specifying a dataset
To create plots with ggplot2 you first need to load the package using library(ggplot2).
After the package has been loaded, specify the dataset to be used as an argument of the ggplot() function. For example, to specify a plot using the cars dataset, you can use:
library(ggplot2)
ggplot(cars)
Note that this command does not plot anything yet except an empty grey canvas. It only defines the dataset for the plot and creates an empty base on top of which we can add further layers.
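To see that ggplot(cars) really is just an empty base, you can store it in a variable and look inside. This is a minimal sketch; accessing p$data and p$layers relies on the internal structure of the plot object, which may differ between ggplot2 versions.

```r
library(ggplot2)

# Store the base plot in a variable instead of printing it right away
p <- ggplot(cars)

# The dataset is attached, but no geometric layers exist yet
nrow(p$data)      # 50 rows taken from cars
length(p$layers)  # 0, so printing p only draws the grey panel

print(p)
```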
Exercise: Specify the gapminder dataset
To start a ggplot visualizing the gapminder dataset, we need to:
- Load the ggplot2 package
- Load the gapminder package
- Define the gapminder dataset to be used in the plot with the ggplot() function
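One possible solution to the steps above, assuming the gapminder package is installed (for example via install.packages("gapminder")):

```r
# Load the plotting package and the package that provides the data
library(ggplot2)
library(gapminder)

# Define the plot's dataset; like ggplot(cars), this only draws an empty canvas
p <- ggplot(gapminder)
print(p)
```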
Specifying a geometric layer
We can use ggplot’s geometric layers (or geoms) to define how we want to visualize our dataset. Geoms use geometric objects to visualize the variables of a dataset. These objects can take multiple forms, such as points, lines and bars, which are specified through the corresponding functions geom_point(), geom_line() and geom_col(), respectively.
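To compare the three geoms, the sketch below adds each of them to the same ggplot(cars) base with the same x/y mapping. Only the geom function changes; the data and the mapping stay identical.

```r
library(ggplot2)

# One base object shared by all three plots
base <- ggplot(cars)

p_points <- base + geom_point(mapping = aes(x = speed, y = dist))  # points
p_lines  <- base + geom_line(mapping = aes(x = speed, y = dist))   # lines
p_bars   <- base + geom_col(mapping = aes(x = speed, y = dist))    # bars

print(p_points)
```

Keep in mind that geom_line() connects the points in order of their x values, and geom_col() stacks the distances of cars that share the same speed, so for this dataset the point layer is the most faithful view of individual observations.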
Quiz: Scatter Plot Layers
Which geometric layer should be used to create scatter plots in ggplot2?
- point_geom()
- geom()
- geom_scatter()
- geom_point()
Creating aesthetic mappings
ggplot2 uses the concept of aesthetics, which map dataset attributes to the visual features of the plot. Each geometric layer requires a different set of aesthetic mappings, e.g. the geom_point() function uses the aesthetics x and y to determine the x- and y-axis coordinates of the points to plot. The aesthetics are mapped within the aes() function to construct the final mappings.
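A mapping created with aes() can be inspected on its own. The sketch below shows that aes() merely records which column belongs to which aesthetic; the names speed and dist are not looked up yet, which is why this works even though no variable called speed exists in your workspace.

```r
library(ggplot2)

# aes() records "x comes from speed, y comes from dist" without evaluating them;
# the columns are looked up later in the plot's dataset
m <- aes(x = speed, y = dist)

names(m)  # "x" "y": the aesthetics used by geom_point()
print(m)
```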
To specify a layer of points that plots the variable speed on the x-axis and the stopping distance dist on the y-axis, we can write:
geom_point(
mapping = aes(x = speed, y = dist)
)
The expression above constructs a geometric layer. However, this layer is not yet linked to a dataset and does not produce a plot. To link the layer with a ggplot object specifying the cars dataset, we need to connect the ggplot(cars) object with the geom_point() layer using the + operator:
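Because a layer is an object in its own right, you can also create it first, store it in a variable and add it to a ggplot object later. The variable name points_layer below is just an illustrative choice.

```r
library(ggplot2)

# A layer can be created and stored on its own ...
points_layer <- geom_point(mapping = aes(x = speed, y = dist))

# ... but it only becomes part of a plot once it is added with +
p <- ggplot(cars) + points_layer
length(p$layers)  # 1
print(p)
```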
ggplot(cars) +
geom_point(
mapping = aes(x=speed, y=dist)
)
Through this link, ggplot() knows that the mapped speed and dist variables are taken from the cars dataset, while geom_point() instructs ggplot to plot the mapped variables as points.
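You can verify that the mapped columns really come from cars with ggplot2's layer_data(), which returns the data computed for a layer of a plot. The sketch below checks that there is one point per car and that the x and y values are the speed and dist columns.

```r
library(ggplot2)

p <- ggplot(cars) +
  geom_point(mapping = aes(x = speed, y = dist))

# Data computed for the first (and only) layer
d <- layer_data(p)

nrow(d)                 # 50: one point per car
all(d$x == cars$speed)  # TRUE: x values come from cars$speed
all(d$y == cars$dist)   # TRUE: y values come from cars$dist
```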
The required steps to create a scatter plot with ggplot can be summarized as follows:
- Load the package ggplot2 using library(ggplot2).
- Specify the dataset to be plotted using ggplot().
- Use the + operator to add layers to the plot.
- Add a geometric layer to define the shapes to be plotted. In the case of scatter plots, use geom_point().
- Map variables from the dataset to plotting properties through the mapping parameter in the geometric layer.
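The same five steps carry over unchanged to other datasets. As an illustration that is not part of the original lesson, the sketch below applies them to the built-in mtcars dataset, plotting car weight wt against fuel economy mpg.

```r
# Step 1: load the package
library(ggplot2)

# Step 2: specify the dataset (mtcars ships with base R)
# Step 3: use + to add a layer
# Step 4: geom_point() turns the layer into a scatter plot
# Step 5: map columns to aesthetics inside aes()
p <- ggplot(mtcars) +
  geom_point(mapping = aes(x = wt, y = mpg))

print(p)
```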
Exercise: Visualize the “cars” dataset
Create a scatter plot using ggplot() and visualize the cars dataset with the car’s stopping distance dist on the x-axis and the speed of the car on the y-axis.
The ggplot2 package is already loaded. Follow these steps to create the plot:
- Specify the dataset through the ggplot() function
- Specify a geometric point layer with the geom_point() function
- Map dist to the x-axis and speed to the y-axis with aes()
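A possible solution, following the task statement above with the stopping distance dist on the x-axis and speed on the y-axis. The library() call is included only so the snippet also runs outside the course environment.

```r
library(ggplot2)

p <- ggplot(cars) +
  geom_point(
    mapping = aes(x = dist, y = speed)  # stopping distance on x, speed on y
  )
print(p)
```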
Exercise: Visualize the Gapminder dataset
Create a scatter plot using ggplot() and visualize the gapminder_2007 dataset with the GDP per capita gdpPercap on the x-axis and the life expectancy lifeExp of each country on the y-axis.
The ggplot2 package is already loaded. Follow these steps to create the plot:
- Specify the gapminder_2007 dataset through the ggplot() function
- Specify a geometric point layer with geom_point()
- Map gdpPercap to the x-axis and lifeExp to the y-axis with aes()
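In the course, gapminder_2007 is provided ready-made; the gapminder package itself does not export a dataset with that name. The sketch below therefore assumes that gapminder_2007 is simply the 2007 rows of the gapminder dataset and creates it with subset() before plotting.

```r
library(ggplot2)
library(gapminder)

# Assumption: gapminder_2007 = all countries observed in the year 2007
gapminder_2007 <- subset(gapminder, year == 2007)

p <- ggplot(gapminder_2007) +
  geom_point(
    mapping = aes(x = gdpPercap, y = lifeExp)  # GDP per capita vs. life expectancy
  )
print(p)
```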
Create a scatter plot with ggplot is an excerpt from the course Introduction to R, which is available for free at quantargo.com