Though ggplot() makes beautiful graphics, I often find myself going back to old projects to find a template for setting up ggplot() code. Here I will show you how to make a barplot in ggplot. Then we will look at some variants of the barplot that are useful when visualizing different types of data.
Data Preparation
As with any data project, the data usually isn’t ready for plotting right out of the box. Here I use the Berkeley Earth climate change data, which records temperatures for cities across the globe from 1743 through 2013. If you have data that is already formatted for plotting and you’re not interested in these steps, skip down to the “Plotting” section to dive right into the plots.
First, read the data, which we store as df, and filter out rows with missing data (NAs).
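```r
library(dplyr)  # provides the %>% pipe and filter()

# Read the city temperature data and drop rows with a missing temperature
df <- read.csv("Data/GlobalLandTemperaturesByCity.csv")
df <- df %>% filter(!is.na(AverageTemperature))
```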
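If you don’t have the CSV on hand, a tiny stand-in lets you check what the filtering step does. The values below are invented for illustration (they are not Berkeley Earth measurements); only the column names follow the ones used in this post.

```r
library(dplyr)

# Hypothetical stand-in for the CSV: invented values, column names as used in this post
toy_raw <- data.frame(
  dt = c("1913-01-01", "1913-02-01", "2013-01-01", "2013-02-01"),
  AverageTemperature = c(3.1, NA, 4.2, 5.0),
  City = "City A",
  Country = "Country X",
  stringsAsFactors = FALSE
)

# Same filter as above: keep only rows with a recorded temperature
toy <- toy_raw %>% filter(!is.na(AverageTemperature))
nrow(toy)  # 3
```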
Then format the dates. Convert the dt column to a date, then make a new column, Year, that contains only the year so that we can filter by year.
```r
df$dt <- as.Date(df$dt)
df$Year <- format(df$dt, format = "%Y")
df$Year <- as.numeric(df$Year)
```
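To see what each of those three lines produces, here is the same conversion on a few made-up dates. format() returns the year as a character string, and as.numeric() turns it into a number so that Year behaves as expected in arithmetic and range comparisons.

```r
# Toy dates in the same "YYYY-MM-DD" form as the dt column
dt <- as.Date(c("1913-01-01", "1913-07-01", "2013-12-01"))

# format() extracts the year as text; as.numeric() converts it to a number
year_chr <- format(dt, format = "%Y")
year_num <- as.numeric(year_chr)

year_chr          # "1913" "1913" "2013"
year_num          # 1913 1913 2013
year_num >= 2000  # FALSE FALSE TRUE
```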
Next we summarize the data to get the average temperature in each city by year. To accomplish this we use the dplyr package to group the data by Year and City. Then we use the summarise() function (also from dplyr) to take the average temperature within each group. So that we don’t lose the country, latitude, and longitude information, we keep the first value of each within the group.
```r
df_yearly <- df %>%
  group_by(Year, City) %>%
  dplyr::summarise(AverageTemp = mean(AverageTemperature, na.rm = TRUE),
                   Country = first(Country),
                   Latitude = first(Latitude),
                   Longitude = first(Longitude))
```
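A small sketch of what group_by() plus summarise() returns, using invented readings for two hypothetical cities: one row per (Year, City) combination, with the mean temperature and the first Country value in each group. The .groups = "drop" argument (available in dplyr 1.0.0 and later) is an addition here that simply returns an ungrouped result.

```r
library(dplyr)

# Invented readings for two hypothetical cities in two years
toy <- data.frame(
  Year = c(1913, 1913, 1913, 1913, 2013, 2013),
  City = c("A", "A", "B", "B", "A", "A"),
  Country = c("X", "X", "Y", "Y", "X", "X"),
  AverageTemperature = c(10, 14, 20, 22, 12, 16),
  stringsAsFactors = FALSE
)

# One row per (Year, City): the mean temperature and the first Country in the group
toy_yearly <- toy %>%
  group_by(Year, City) %>%
  summarise(AverageTemp = mean(AverageTemperature, na.rm = TRUE),
            Country = first(Country),
            .groups = "drop")

toy_yearly
```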
To summarize the data by continent we need to know which continent each country is on. The countrycode package can accomplish this task with ease. First, convert the df_yearly tibble that we created in the step above into a data frame. The differences between a tibble and a data frame are nuanced, but the one that matters here is that df_yearly[, "Country"] returns a character vector from a data frame and a one-column tibble from a tibble, while countrycode() expects a vector. The next step feeds the Country column from df_yearly into countrycode() and extracts the continent for each country.
```r
library(countrycode)

# Convert the tibble to a data frame so df_yearly[, "Country"] returns a plain vector
df_yearly <- data.frame(df_yearly)
df_yearly$continent <- countrycode(sourcevar = df_yearly[, "Country"],
                                   origin = "country.name",
                                   destination = "continent")
df_yearly$continent <- as.factor(df_yearly$continent)
```
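The tibble issue most likely comes down to single-bracket subsetting, which this sketch demonstrates on a hypothetical one-column table. Extracting the column with $ (an alternative not used in the original post) returns a vector from either structure, so it would sidestep the conversion.

```r
library(tibble)

# The same hypothetical column stored as a data frame and as a tibble
as_df  <- data.frame(Country = c("France", "Japan"), stringsAsFactors = FALSE)
as_tbl <- tibble(Country = c("France", "Japan"))

# On a data frame, [, "Country"] drops to a character vector
class(as_df[, "Country"])   # "character"

# On a tibble, [, "Country"] stays a one-column tibble
class(as_tbl[, "Country"])  # "tbl_df" "tbl" "data.frame"

# $ returns the same vector from both
identical(as_df$Country, as_tbl$Country)  # TRUE
```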
The last step in data processing is to summarize by continent. Here we use the dplyr functions filter(), group_by(), and summarise() on our yearly averages: we keep only two years, 1913 and 2013, group by continent and year, and then compute the mean and standard deviation of temperature along with the number of cities on that continent that went into the average.
```r
df_continent <- df_yearly %>%
  filter(Year == 1913 | Year == 2013) %>%
  group_by(continent, Year) %>%
  dplyr::summarise(mean_temp = mean(AverageTemp),
                   std_temp = sd(AverageTemp),
                   n_cities = n())
```
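One detail worth knowing: sd() of a single value is NA, so a continent represented by only one city in a given year gets no standard deviation (and, later, no error bar). The sketch below shows this on invented yearly averages for two hypothetical continents, "P" and "Q"; it also shows the filter dropping a year other than 1913 and 2013.

```r
library(dplyr)

# Invented yearly city averages; continent "P" has one city per year
toy_yearly <- data.frame(
  continent = c("P", "P", "Q", "Q", "Q", "Q", "Q"),
  Year = c(1913, 2013, 1913, 1913, 2013, 2013, 1950),
  AverageTemp = c(5, 6, 10, 14, 11, 15, 99),
  stringsAsFactors = FALSE
)

toy_continent <- toy_yearly %>%
  filter(Year == 1913 | Year == 2013) %>%  # the 1950 row is dropped here
  group_by(continent, Year) %>%
  summarise(mean_temp = mean(AverageTemp),
            std_temp = sd(AverageTemp),     # NA for single-city groups
            n_cities = n(),
            .groups = "drop")

toy_continent
```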
Plotting
It’s finally time to start plotting. We will start with a basic barplot in ggplot and then move on to some useful variants.
The structure for any ggplot graph is similar: ggplot(data, aes(x, y, fill)) + geometry. Here we fill in the data frame, the x variable (continent), the y variable (average temperature), and the fill (year). The critical part of the code that makes a barplot, as opposed to another kind of plot, is the geom_bar() function. The argument stat = "identity" tells geom_bar() to use the y values as the bar heights; by default geom_bar() counts the rows in each x category instead. We also add position = position_dodge(), which puts the grouped bars side by side instead of stacking them. If our data had no groups, we would simply use geom_bar(stat = "identity").
```r
library(ggplot2)

temp_ct_vert <- ggplot(data = df_continent,
                       aes(x = continent, y = mean_temp, fill = as.factor(Year))) +
  geom_bar(stat = "identity", color = "black", position = position_dodge()) +
  labs(x = "Continent", y = "Mean Yearly Temperature", fill = "Year") +
  theme_minimal() +
  theme(text = element_text(size = 20))
temp_ct_vert
```
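ggplot2 also provides geom_col(), which is shorthand for geom_bar(stat = "identity"). This sketch uses a hypothetical table shaped like df_continent and checks with layer_data() that the bar tops are exactly the mean_temp values (all positive here).

```r
library(ggplot2)

# Hypothetical summary table with the same columns as df_continent
toy_continent <- data.frame(
  continent = c("P", "P", "Q", "Q"),
  Year = c(1913, 2013, 1913, 2013),
  mean_temp = c(8, 9, 15, 16)
)

# geom_col() uses the y values as bar heights, like geom_bar(stat = "identity")
p <- ggplot(toy_continent, aes(x = continent, y = mean_temp, fill = as.factor(Year))) +
  geom_col(color = "black", position = position_dodge()) +
  labs(x = "Continent", y = "Mean Yearly Temperature", fill = "Year")

# The computed bar tops match the input values
layer_data(p)$ymax
```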
Another option is to order the groups on the x-axis by something other than alphabetical order. Here we sort from the highest to the lowest average temperature using reorder(continent, -mean_temp).
```r
temp_ct_vert_sort <- ggplot(data = df_continent,
                            aes(x = reorder(continent, -mean_temp), y = mean_temp,
                                fill = as.factor(Year))) +
  geom_bar(stat = "identity", color = "black", position = position_dodge()) +
  labs(x = "Continent", y = "Mean Yearly Temperature", fill = "Year") +
  theme_minimal() +
  theme(text = element_text(size = 20))
temp_ct_vert_sort
```
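Because each continent appears twice in df_continent (once per year), reorder() has to combine two values per level; it does this with its FUN argument, which defaults to mean. So the order reflects the average of the 1913 and 2013 values. A base R sketch with made-up numbers:

```r
# Each hypothetical continent appears once per year, like df_continent
continent <- factor(c("P", "P", "Q", "Q", "R", "R"))
mean_temp <- c(8, 10, 20, 22, 14, 15)

# Levels are sorted by mean(-mean_temp) per continent, so the warmest comes first
sorted <- reorder(continent, -mean_temp)
levels(sorted)  # "Q" "R" "P"
```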
We can also add numbers to the bars. Here we show the number of cities on each continent in each year. We do this with geom_text(), using position = position_dodge(0.9) so that the numbers line up with the side-by-side grouped bars (0.9 is the default bar width). The argument vjust = 1.6 places each label just inside the top of its bar.
```r
temp_ct_vert_num <- ggplot(data = df_continent,
                           aes(x = continent, y = mean_temp, fill = as.factor(Year))) +
  geom_bar(stat = "identity", color = "black", position = position_dodge()) +
  geom_text(aes(label = n_cities), vjust = 1.6, color = "black",
            position = position_dodge(0.9), size = 3.5) +
  labs(x = "Continent", y = "Mean Yearly Temperature", fill = "Year") +
  theme_minimal() +
  theme(text = element_text(size = 20))
temp_ct_vert_num
```
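A quick way to confirm that the labels sit over their bars is to compare the computed x positions of the two layers with layer_data(). This is a sketch on a hypothetical table; the text layer picks up the Year grouping from the fill aesthetic in the top-level aes(), which is why the dodge splits it the same way as the bars.

```r
library(ggplot2)

# Hypothetical table with the same columns as df_continent
toy_continent <- data.frame(
  continent = c("P", "P", "Q", "Q"),
  Year = c(1913, 2013, 1913, 2013),
  mean_temp = c(8, 9, 15, 16),
  n_cities = c(3, 4, 7, 7)
)

p <- ggplot(toy_continent, aes(x = continent, y = mean_temp, fill = as.factor(Year))) +
  geom_col(position = position_dodge()) +            # bars use the default width, 0.9
  geom_text(aes(label = n_cities), vjust = 1.6,
            position = position_dodge(width = 0.9))  # labels dodged by the same width

# x positions after dodging: layer 1 = bars, layer 2 = labels
bar_x   <- layer_data(p, 1)$x
label_x <- layer_data(p, 2)$x
```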
Another useful variant is to add error bars. Here the error bars show the standard deviation of the yearly average temperatures across the cities on each continent in that year. We add geom_errorbar() and, inside its aes(), set the lower and upper ends of each error bar to mean_temp - std_temp and mean_temp + std_temp.
```r
temp_ct_vert_eb <- ggplot(data = df_continent,
                          aes(x = continent, y = mean_temp, fill = as.factor(Year))) +
  geom_bar(stat = "identity", color = "black", position = position_dodge()) +
  geom_errorbar(aes(ymin = mean_temp - std_temp, ymax = mean_temp + std_temp),
                width = 0.2, position = position_dodge(0.9)) +
  labs(x = "Continent", y = "Mean Yearly Temperature", fill = "Year") +
  theme_minimal() +
  theme(text = element_text(size = 20))
temp_ct_vert_eb
```
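A variation on the same idea, sketched here with a hypothetical table: compute the error-bar limits as columns with dplyr::mutate() before plotting, so you can inspect them (or reuse them) and map the columns directly. The column names lower and upper are illustrative choices, not part of the original code.

```r
library(dplyr)
library(ggplot2)

# Hypothetical summary with a standard deviation column
toy_continent <- data.frame(
  continent = c("P", "P", "Q", "Q"),
  Year = c(1913, 2013, 1913, 2013),
  mean_temp = c(8, 9, 15, 16),
  std_temp = c(2, 1.5, 3, 2.5)
)

# Precompute the error-bar limits (mean plus or minus one standard deviation)
toy_continent <- toy_continent %>%
  mutate(lower = mean_temp - std_temp,
         upper = mean_temp + std_temp)

p <- ggplot(toy_continent, aes(x = continent, y = mean_temp, fill = as.factor(Year))) +
  geom_col(color = "black", position = position_dodge()) +
  geom_errorbar(aes(ymin = lower, ymax = upper), width = 0.2,
                position = position_dodge(0.9))

toy_continent$lower  # 6.0 7.5 12.0 13.5
```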
Flipping the bars to a horizontal position can make the labels easier to read or may look clearer for some data. We do this by adding coord_flip(). In this example we don’t re-type all of the code from temp_ct_vert; instead we take the saved object and add coord_flip() to it. This shortcut makes it faster to create variations of a plot and makes it clearer in the code which features changed between versions.
```r
temp_ct_horiz <- temp_ct_vert + coord_flip()
temp_ct_horiz
```
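As an alternative to coord_flip(), ggplot2 3.3.0 and later can infer a bar’s orientation from the aesthetics: mapping the numeric variable to x and the categories to y draws horizontal bars directly. A minimal sketch, assuming a recent ggplot2 and a hypothetical single-year table:

```r
library(ggplot2)

# Hypothetical single-year averages
toy_2013 <- data.frame(
  continent = c("P", "Q", "R"),
  mean_temp = c(9, 16, 12)
)

# Numeric x and discrete y: geom_col() draws horizontal bars without coord_flip()
p <- ggplot(toy_2013, aes(x = mean_temp, y = continent)) +
  geom_col(color = "black")

# Bars now extend along x, so their ends are stored in xmax
layer_data(p)$xmax
```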
In the final example we make a stacked barchart. The key difference is that geom_bar() no longer has position = position_dodge(), so it falls back to its default position, which stacks the bars.
```r
temp_ct_vert_stack <- ggplot(data = df_continent,
                             aes(x = continent, y = mean_temp, fill = as.factor(Year))) +
  geom_bar(stat = "identity", color = "black") +
  labs(x = "Continent", y = "Mean Yearly Temperature", fill = "Year") +
  theme_minimal() +
  theme(text = element_text(size = 20))
temp_ct_vert_stack
```
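Because stacking adds the segments on top of each other, the total height of each bar is the sum of the 1913 and 2013 means, which is worth keeping in mind when reading a stacked chart of averages. This sketch checks that with layer_data() on a hypothetical table with positive values:

```r
library(ggplot2)

# Hypothetical summary table
toy_continent <- data.frame(
  continent = c("P", "P", "Q", "Q"),
  Year = c(1913, 2013, 1913, 2013),
  mean_temp = c(10, 12, 20, 22)
)

# Without position_dodge(), bars use the default "stack" position
p <- ggplot(toy_continent, aes(x = continent, y = mean_temp, fill = as.factor(Year))) +
  geom_col(color = "black")

# Highest segment top per x position = sum of that continent's values
ld <- layer_data(p)
stack_tops <- tapply(ld$ymax, ld$x, max)
stack_tops  # 22 for P, 42 for Q
```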
Other Resources
There are many great resources for working with ggplot() and the geom_bar() function.
- The blog post from sthda.com walks through variants
- These two posts from the graph gallery on basic barplots and grouped barplots (one of my favorite places to get inspiration for R visualizations with beautiful graphics and easy to follow instructions)
- Examples for customizations from the Cookbook for R
- Hadley Wickham’s ggplot book
There you have it! Barplots with ggplot. I hope that you found this post helpful, or at least interesting. Please let me know if you have an R question that you would like explained here. And thanks for following along with my R journey.