Specify additional aesthetics for points
ggplot2 implements the grammar of graphics to map attributes from a data set to plot features through aesthetics. This framework can be used to adjust the point size, color and transparency (alpha) of points in a scatter plot.
- Add additional plotting dimensions through aesthetics
- Adjust the point size of a scatter plot using the size aesthetic
- Change the point color of a scatter plot using the color aesthetic
- Set a constant alpha parameter to change the transparency of all points
- Differentiate between aesthetic mappings and constant parameters
ggplot(___) +
  geom_point(
    mapping = aes(x = ___, y = ___, color = ___, size = ___),
    alpha = ___
  )
Adding more plot aesthetics
In their most basic form, scatter plots can only visualize datasets in two dimensions through the x and y aesthetics of the geom_point() layer. However, most data sets have more than two variables and thus might require additional plotting dimensions. ggplot() makes it easy to map additional variables to other plotting aesthetics such as size, transparency (alpha) and color.
Let’s consider the gapminder_2007 dataset, which contains the variables GDP per capita gdpPercap and life expectancy lifeExp for 142 countries in the year 2007:
ggplot(gapminder_2007) + geom_point(aes(x = gdpPercap, y = lifeExp))
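The lesson uses gapminder_2007 without showing where it comes from. A minimal sketch, assuming it is the year-2007 slice of the gapminder table from the gapminder package and that pop has been rescaled to millions (both are assumptions, not confirmed by the course; the ggplot2 and gapminder packages must be installed):

```r
library(ggplot2)

# Assumption: gapminder_2007 is the year-2007 slice of gapminder::gapminder
gapminder_2007 <- subset(gapminder::gapminder, year == 2007)

# Assumption: the course reports population in millions
gapminder_2007$pop <- gapminder_2007$pop / 1e6

# One row per country
nrow(gapminder_2007)

ggplot(gapminder_2007) + geom_point(aes(x = gdpPercap, y = lifeExp))
```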
By mapping the continent variable to the point color aesthetic and the population pop (in millions) to the point size, we obtain a much richer plot that shows four variables from the data set.
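The code for this four-variable plot is not shown in the source. A sketch of one way to write it, again assuming gapminder_2007 is the 2007 slice of gapminder::gapminder with pop in millions:

```r
library(ggplot2)

# Assumption: gapminder_2007 is the 2007 slice of gapminder::gapminder,
# with pop rescaled to millions
gapminder_2007 <- subset(gapminder::gapminder, year == 2007)
gapminder_2007$pop <- gapminder_2007$pop / 1e6

# Four variables: x (gdpPercap), y (lifeExp), color (continent), size (pop)
p <- ggplot(gapminder_2007) +
  geom_point(aes(x = gdpPercap, y = lifeExp, color = continent, size = pop))
p
```

Each extra variable only adds one more name = value pair inside aes(); ggplot2 builds the matching legends automatically.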
Quiz: geom_point() Aesthetics
Which aesthetics can be specified for geom_point()?
- geom_line
- color
- point
- alpha
- size
Adjusting point color
Point color is a common way to introduce a new dimension to a scatter plot. In ggplot2, the color aesthetic maps a variable to the color of the points.
For the gapminder_2007 dataset we can plot the GDP per capita gdpPercap vs. the life expectancy lifeExp as follows:
ggplot(gapminder_2007) + geom_point(aes(x = gdpPercap, y = lifeExp))
To color each point based on the continent of each country, we can use:
ggplot(gapminder_2007) + geom_point(aes(x = gdpPercap, y = lifeExp, color = continent))
In the resulting plot, each point is colored according to the continent of its country. Because continent is a categorical variable, ggplot uses a discrete color scale with one distinct color per continent.
By contrast, let’s see what the plot looks like if we color the points by the numeric variable population pop:
ggplot(gapminder_2007) + geom_point(aes(x = gdpPercap, y = lifeExp, color = pop))
The scale immediately changes to continuous, as the legend shows, and the lightest blue points are now the countries with the largest populations (China and India).
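Whether ggplot2 picks a discrete or a continuous color scale depends only on the type of the mapped variable. A small sketch using the built-in mtcars data (swapped in for illustration, not part of the lesson): the numeric column cyl yields a continuous gradient, while the same values wrapped in factor() yield one distinct color per level.

```r
library(ggplot2)

# Numeric variable -> continuous color scale (gradient legend)
p_cont <- ggplot(mtcars) +
  geom_point(aes(x = wt, y = mpg, color = cyl))

# Same values converted to a factor -> discrete color scale (one color per level)
p_disc <- ggplot(mtcars) +
  geom_point(aes(x = wt, y = mpg, color = factor(cyl)))

# Inspect the colors ggplot2 actually assigned to the points
table(layer_data(p_cont)$colour)
table(layer_data(p_disc)$colour)
```

Both plots show three colors here because cyl only takes three values, but the continuous version draws them from a blue gradient while the discrete version uses clearly separated hues.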
Exercise: Reconstruct Gapminder graph
Reconstruct a graph that shows the relationship between GDP per capita and life expectancy for the year 2007:
- Use the ggplot() function and specify the gapminder_2007 dataset as input
- Add a geom_point() layer to the plot and create a scatter plot showing the GDP per capita gdpPercap on the x-axis and the life expectancy lifeExp on the y-axis
- Make the color aesthetic of the points unique for each continent
Exercise: Create a colored scatter plot with DavisClean
The DavisClean dataset contains the height and weight measurements of 199 people.
- Use the ggplot() function and specify the DavisClean dataset as input
- Add a geom_point() layer to the plot and create a scatter plot showing the weight on the x-axis and the height on the y-axis
- Make the color aesthetic of the points depend on the sex of each individual.
Adjusting point size
For the gapminder_2007 dataset we can plot the GDP per capita gdpPercap vs. the life expectancy lifeExp as follows:
ggplot(gapminder_2007) + geom_point(aes(x = gdpPercap, y = lifeExp))
To adjust the point size based on the population (pop) of each country, we can use:
ggplot(gapminder_2007) + geom_point(aes(x = gdpPercap, y = lifeExp, size = pop))
The point sizes in the plot above do not clearly reflect the population differences between countries. If we compare the point representing a population of 250 million people with the one representing 750 million, we can see that their sizes are not proportional. This is because the default size scale, scale_size(), maps the range of the data onto a fixed range of point sizes, so even the smallest population still gets a visible point and point areas are not proportional to the values. To make the point area proportional to the actual population, we can use the scale_size_area() function instead, which maps a value of 0 to a size of 0. The scaling information can be added like any other ggplot object with the + operator:
ggplot(gapminder_2007) + geom_point(aes(x = gdpPercap, y = lifeExp, size = pop)) + scale_size_area(max_size = 10)
Note that we have also increased max_size, which results in bigger points overall.
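To check that scale_size_area() really makes point area proportional to the value, a small sketch with hypothetical toy values (not from the gapminder data) can read back the sizes ggplot2 computed via layer_data(). Since the area of a point grows with the square of its size, the squared sizes should be proportional to the values, and the largest value should receive max_size.

```r
library(ggplot2)

# Hypothetical toy data: values in the ratio 1 : 4 : 16
toy <- data.frame(x = 1:3, y = 1:3, value = c(1, 4, 16))

p <- ggplot(toy) +
  geom_point(aes(x = x, y = y, size = value)) +
  scale_size_area(max_size = 10)

# Sizes computed by ggplot2; they grow with the square root of value
sizes <- layer_data(p)$size
sizes

# Areas relative to the largest point, which should match value / max(value)
sizes^2 / max(sizes)^2
```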
Exercise: Create a Gapminder scatter plot using size
Create a scatter plot with ggplot2 that shows the relationship between GDP per capita and life expectancy for the year 2007 using the gapminder_2007 dataset.
- Use the ggplot() function and specify the gapminder_2007 dataset as input
- Add a geom_point() layer to the plot and create a scatter plot showing the GDP per capita gdpPercap on the x-axis and the life expectancy lifeExp on the y-axis
- Use the size aesthetic to adjust the point size by the population pop
- Use the scale_size_area() function so that the point sizes reflect actual population differences and set max_size to 10
Setting global aesthetics: transparency
Plotting many points with similar x- and y-coordinates in one graph can produce dense point clouds. Many points in these clouds are overplotted, and the true number of observations in a given area is no longer visible. As a solution, we can set the transparency of the points using the ggplot parameter alpha.
Since we want to set the transparency globally for all points rather than individually for each point, we do not set alpha as an aesthetic mapping (within aes()) but outside of it.
We set the opacity of each point to 50% by passing alpha as a constant parameter outside aes():
ggplot(gapminder_2007) + geom_point(aes(x = gdpPercap, y = lifeExp, size = pop), alpha = 0.5)
Because each point is now drawn with an opacity of 0.5, we can clearly see where points overlap each other.
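The effect of alpha is easiest to see on a genuinely dense cloud. A sketch using the diamonds dataset that ships with ggplot2 (swapped in for illustration, not part of the lesson), which has 53,940 rows:

```r
library(ggplot2)

# Without transparency, the core of the cloud is a solid block
p_opaque <- ggplot(diamonds) +
  geom_point(aes(x = carat, y = price))

# A low constant alpha lets overlapping points accumulate,
# so darker regions indicate more observations
p_alpha <- ggplot(diamonds) +
  geom_point(aes(x = carat, y = price), alpha = 0.05)

p_alpha
```

The denser the data, the lower the alpha value needs to be before differences in point density become visible.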
Quiz: Gapminder Plot
ggplot(gapminder_2007) + geom_point(aes(x = gdpPercap, y = lifeExp, size = pop, alpha = 0.5, color = "red"))
Which statements about the plot above are correct?
- Constant plot parameters should be set outside of an aesthetic mapping aes().
- The reason for the legend entries alpha and color is that they are set as aesthetic mappings instead of global parameters.
- The parameter lifeExp should be set as a global parameter.
- The parameter gdpPercap should be set as a global parameter.
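A sketch comparing the quiz call with a corrected version where the constants are moved out of aes(). It also exposes a side effect of the quiz version: inside aes(), color = "red" is treated as a data value (a category named "red") and colored by the default discrete palette, so the points are not actually drawn in red. As before, gapminder_2007 is assumed to be the 2007 slice of gapminder::gapminder with pop in millions.

```r
library(ggplot2)

# Assumption: gapminder_2007 is the 2007 slice of gapminder::gapminder,
# with pop rescaled to millions
gapminder_2007 <- subset(gapminder::gapminder, year == 2007)
gapminder_2007$pop <- gapminder_2007$pop / 1e6

# Quiz version: constants mapped inside aes() become scales and legend entries
p_mapped <- ggplot(gapminder_2007) +
  geom_point(aes(x = gdpPercap, y = lifeExp, size = pop, alpha = 0.5, color = "red"))

# Corrected version: constants set outside aes() apply unchanged to every point
p_set <- ggplot(gapminder_2007) +
  geom_point(aes(x = gdpPercap, y = lifeExp, size = pop), alpha = 0.5, color = "red")

unique(layer_data(p_mapped)$colour)  # a palette color, not "red"
unique(layer_data(p_set)$colour)     # "red"
```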
Exercise: Reproduce Gapminder scatter plot
Try to reproduce the plot described by these steps:
- Use the ggplot() function and specify the gapminder_2007 dataset as input
- Add a geom_point() layer to the plot and create a scatter plot showing the GDP per capita gdpPercap on the x-axis and the life expectancy lifeExp on the y-axis
- Use the color aesthetic to indicate each continent by a different color
- Use the size aesthetic to adjust the point size by the population pop
- Use scale_size_area() so that the point sizes reflect the actual population differences and set max_size to 15
- Set the opacity of each point to 70% (alpha = 0.7) using the alpha parameter
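One possible solution sketch that combines all four techniques from this lesson, assuming as before that gapminder_2007 is the 2007 slice of gapminder::gapminder with pop in millions:

```r
library(ggplot2)

# Assumption: gapminder_2007 is the 2007 slice of gapminder::gapminder,
# with pop rescaled to millions
gapminder_2007 <- subset(gapminder::gapminder, year == 2007)
gapminder_2007$pop <- gapminder_2007$pop / 1e6

p <- ggplot(gapminder_2007) +
  geom_point(
    aes(x = gdpPercap, y = lifeExp, color = continent, size = pop),  # data-driven mappings
    alpha = 0.7                                                       # constant: 70% opacity
  ) +
  scale_size_area(max_size = 15)  # area proportional to population, largest size 15
p
```

The split mirrors the rule from the quiz: everything that varies with the data goes inside aes(), while the fixed opacity stays outside.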
Specify additional aesthetics for points is an excerpt from the course Introduction to R, which is available for free at quantargo.com