Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
Discovering 18th century London with new fonts
Greetings, humanists, social and data scientists!
In this lesson, we delve into the world of data visualization with a focus on the use of different fonts in ggplot2. We are going to learn how to take advantage of Google Fonts to enrich our plots. Our journey takes us back in time, exploring a dataset derived from the “London Lives 1690-1800: Crime, Poverty and Social Policy in the Metropolis” project. This dataset provides a window into the socio-economic and health conditions of 18th-century London.
Data source
The data used in this tutorial consists of a catalogue of historical documents organized by Sharon Howard. These documents detail an extensive range of Westminster inquests conducted between 1760 and 1799. They document investigations into deaths under circumstances that were sudden, unexplained, or suspicious.
The data is derived from a larger project, London Lives 1690-1800: Crime, Poverty and Social Policy in the Metropolis, a rich resource shedding light on the lives of ordinary Londoners during this era. The dataset we explore comprises 2,894 inquest records, each containing detailed information such as the dates of the inquests, names of the deceased, the verdicts rendered, and the causes of death. Sharon Howard’s contribution in organizing these records offers an invaluable opportunity to study the social and legal intricacies of historic London.
1. How to load a tsv file in R?
TSV stands for ‘Tab-Separated Values’, a textual format used to store tabular data. In this lesson, our data is stored in TSV format. To load it, we use the read_tsv function from the readr package. After loading the data, we need to eliminate observations where the verdict is undefined.
Furthermore, the original data categorizes three types of suicide: ‘suicide (delirious)’, ‘suicide (felo de se)’, and ‘suicide (insane)’. However, for our purposes, we will classify all these as simply ‘suicide’. The code below executes these steps. Note that we use filter, mutate and recode from the dplyr package. The recode function takes as arguments the column to be recoded and each old value paired with its new content. The packages needed for this lesson are loaded at the beginning of the code. Please install them with install.packages() in case you don’t have them.
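A minimal sketch of these steps follows. It assumes that undefined verdicts are stored as NA, and it writes a small toy TSV file (hypothetical rows, not real inquest records) to stand in for the actual data file, whose name is not given here. Only the column names verdict and cause_of_death and the three suicide labels come from the text above.

```r
library(readr)
library(dplyr)

# Toy stand-in for the inquests file (hypothetical rows, not real records).
tsv_path <- tempfile(fileext = ".tsv")
writeLines(c(
  "verdict\tcause_of_death",
  "accidental\tdrowned",
  "suicide (insane)\thanged",
  "suicide (felo de se)\tdrowned",
  "NA\tNA",
  "natural causes\tNA"
), tsv_path)

# Read every column as text; read_tsv treats the string "NA" as missing by default.
inquests <- read_tsv(tsv_path, col_types = cols(.default = col_character())) %>%
  # Assumption: an undefined verdict is encoded as NA.
  filter(!is.na(verdict)) %>%
  # Collapse the three suicide categories into one; other values are kept as they are.
  mutate(verdict = recode(verdict,
    "suicide (delirious)"  = "suicide",
    "suicide (felo de se)" = "suicide",
    "suicide (insane)"     = "suicide"
  ))
```

With the real data, tsv_path would simply point to the downloaded TSV file.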
2. Using the table function to summarize a categorical variable
An efficient method for counting the number of each type of verdict in the verdict variable is to use the table function. While this function is typically used to construct a contingency table for two variables, in this instance it is used to tally the frequency of each category within the verdict variable. We then transform this table into a data frame to facilitate its use in ggplot2.
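As a small sketch of this step, the snippet below tallies a toy verdict column (hypothetical values) and converts the result to a data frame. Naming the argument inside table() is a choice made here so that the resulting column is called verdict; the count column is named Freq by as.data.frame.

```r
# Toy data standing in for the cleaned inquests (hypothetical values).
inquests <- data.frame(
  verdict = c("accidental", "suicide", "natural causes",
              "accidental", "suicide", "accidental")
)

# One-way frequency table of the verdicts, turned into a data frame.
# The named argument sets the name of the category column.
table_ver <- as.data.frame(table(verdict = inquests$verdict))
```

The result has one row per verdict category, with the counts in the Freq column.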
3. Change fonts in ggplot2 with showtext
The showtext package, developed by Yixuan Qiu, greatly simplifies the use of various font types in R plots. In this lesson, we’ll focus on using showtext to access Google Fonts, though the package offers more than just that capability.
After loading the library, you can select your desired Google Font using the font_add_google function. The showtext_auto command then instructs R to employ showtext for text rendering. That’s it: next, simply specify your chosen font in the ggplot2 theme, as demonstrated in the code below. Since our table_ver data is already aggregated, we need to set stat = "identity" in the geom_bar layer. Note also the use of reorder to arrange the bars from largest to smallest frequency. The figures illustrate the resulting plots, first with the ‘Montserrat’ font, followed by the ‘Dancing Script’ font.
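The sketch below shows one way these pieces can fit together. The counts in table_ver are toy values, the column names verdict and Freq are assumptions, and the tryCatch fallback to the default "sans" family is an addition for machines without internet access (font_add_google downloads the font); it is not part of the lesson itself.

```r
library(ggplot2)
library(showtext)

# Toy aggregated table (hypothetical counts, column names assumed).
table_ver <- data.frame(
  verdict = c("accidental", "natural causes", "suicide", "homicide"),
  Freq    = c(40, 25, 15, 5)
)

# Register the Google Font under a local family name.
# Assumption: if the download fails, fall back to the default "sans" family.
plot_font <- tryCatch({
  font_add_google("Montserrat", family = "montserrat")
  "montserrat"
}, error = function(e) "sans")

# Ask R to render text with showtext from now on.
showtext_auto()

p <- ggplot(table_ver, aes(x = reorder(verdict, -Freq), y = Freq)) +
  geom_bar(stat = "identity") +   # data is already aggregated
  labs(x = "Verdict", y = "Number of inquests") +
  theme_minimal() +
  theme(text = element_text(family = plot_font))
```

Printing p draws the plot. Replacing "Montserrat" with "Dancing Script" in font_add_google (and choosing a matching family name) would give the second variant.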
The plot above suggests a relatively low number of homicides in these inquests. One could investigate whether the number is indeed so low, or whether the inquests’ documentation suggests that some homicides were wrongly classified as accidental.
4. A Shiny app to choose your Google Font
To streamline the process of selecting the perfect font for your visualizations, I have created a user-friendly Shiny app. This tool allows you to experiment with a variety of Google Fonts on the plots discussed in this lesson. You are welcome to explore the app and find the font that best suits your needs.
In a future lesson I will explain how to code a Shiny App applied to digital humanities. Meanwhile, feel free to consult my code here.
5. Checking the causes of death
In the plots above, we explored the verdicts that resulted from the inquests, but the dataset is richer in details. Let us explore a little more.
The variable cause_of_death gives more details on the circumstances in which the person in each inquest died. To visualize the most frequent causes of death, we will start by eliminating NA observations in this variable. We can do that with filter(!is.na()), which keeps only values that are not NA. Following that command, we group observations by cause_of_death and count the frequency of each cause with tally.
The last step is to keep only the causes of death that occurred more than 15 times in our dataset. Please decrease this number if you would like to see more causes. Finally, we can use the same ggplot2 code to plot the causes of death. Since the texts representing the causes of death are longer than the verdicts, we will plot them on the y axis and the frequency on the x axis. The rest of the code remains very similar.
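The pipeline described above can be sketched as follows, using toy data (hypothetical causes and counts) in place of the real inquests. The count column n is the default name produced by tally, and the font setting is left out here to keep the example short.

```r
library(dplyr)
library(ggplot2)

# Toy data standing in for the inquests (hypothetical causes and counts).
inquests <- data.frame(
  cause_of_death = c(rep("drowned", 20), rep("hanged", 16),
                     rep("fall", 3), rep(NA, 5))
)

causes <- inquests %>%
  filter(!is.na(cause_of_death)) %>%   # drop missing causes
  group_by(cause_of_death) %>%
  tally() %>%                          # one row per cause, count in column n
  filter(n > 15)                       # keep causes seen more than 15 times

# Long labels read better on the y axis; bars are ordered by frequency.
p_causes <- ggplot(causes, aes(x = n, y = reorder(cause_of_death, n))) +
  geom_bar(stat = "identity") +
  labs(x = "Number of inquests", y = "Cause of death") +
  theme_minimal()
```

Lowering the threshold in the last filter brings rarer causes into the plot.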
It’s noteworthy that a significant number of the inquests pertained to cases of drowning or suicide. An intriguing avenue for further research would be to delve into the complete reports of these inquests. Such an investigation could shed light on the reasons behind the high incidence of drownings.
Please, feel free to leave any questions or concerns you might have in the comments below.
Conclusions
- Using Google Fonts within ggplot2 not only transforms but significantly elevates the aesthetic appeal of our data visualizations;
- Leveraging the showtext package simplifies incorporating diverse Google Fonts into R, enhancing our data representation capabilities;
- Employing simple plots with historical data can generate useful hypotheses, offering a deeper understanding of our past.