A Heatmap of James Lind’s Scurvy Study
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
This week’s TidyTuesday concerns what some have called the first randomized clinical trial- a study by James Lind evaluating various treatments for scurvy. This data has been collected into the medicaldata R package from Lind’s book on scurvy.
Loading the Libraries and Data
library(tidyverse) # who doesn't want to be tidy library(gt) # nice tables library(ggthemes) # more themes for ggplots library(RColorBrewer) # color scales library(here) # consistent method for working in file structure
Loading the TidyTuesday in the normal way.
tuesdata <- tidytuesdayR::tt_load(2023, week = 30) scurvy <- tuesdata$scurvy
Reformat and Clean the data
Take a look at the data.
scurvy %>% gt()
study_id | treatment | dosing_regimen_for_scurvy | gum_rot_d6 | skin_sores_d6 | weakness_of_the_knees_d6 | lassitude_d6 | fit_for_duty_d6 |
---|---|---|---|---|---|---|---|
1 | cider | 1 quart per day | 2_moderate | 2_moderate | 2_moderate | 2_moderate | 0_no |
2 | cider | 1 quart per day | 2_moderate | 1_mild | 2_moderate | 3_severe | 0_no |
3 | dilute_sulfuric_acid | 25 drops of elixir of vitriol, three times a day | 1_mild | 3_severe | 3_severe | 3_severe | 0_no |
4 | dilute_sulfuric_acid | 25 drops of elixir of vitriol, three times a day | 2_moderate | 3_severe | 3_severe | 3_severe | 0_no |
5 | vinegar | two spoonfuls, three times daily | 3_severe | 3_severe | 3_severe | 3_severe | 0_no |
6 | vinegar | two spoonfuls, three times daily | 3_severe | 3_severe | 3_severe | 3_severe | 0_no |
7 | sea_water | half pint daily | 3_severe | 3_severe | 3_severe | 3_severe | 0_no |
8 | sea_water | half pint daily | 3_severe | 3_severe | 3_severe | 3_severe | 0_no |
9 | citrus | two lemons and an orange daily | 1_mild | 1_mild | 0_none | 1_mild | 0_no |
10 | citrus | two lemons and an orange daily | 0_none | 0_none | 0_none | 0_none | 1_yes |
11 | purgative_mixture | a nutmeg-sized paste of garlic, mustard seed, horseradish, balsam of Peru, and gum myrrh three times a day | 3_severe | 3_severe | 3_severe | 3_severe | 0_no |
12 | purgative_mixture | a nutmeg-sized paste of garlic, mustard seed, horseradish, balsam of Peru, and gum myrrh three times a day | 3_severe | 3_severe | 3_severe | 3_severe | 0_no |
Small data set, maybe best as a table.
First some cleanup. First, I’m replacing all the underscores (“_”) in the treatment descriptions with spaces. I use str_replace_all()
from stringr inside a mutate()
for that.
scurvy <- scurvy %>% mutate(treatment = str_replace_all(treatment, "_", " "))
Reshaping the Data with pivot_longer()
A heatmap is like a table, but a bit more visually interesting. I’d like to plot the treatment vs. the symptoms. To do so, the data needs to be reshaped into the long format. This is easily done using pivot_longer() from tidyr. Here, I give it the name of the two new columns, symptom and severity. The symptom column will contain the values gum rot, skin sores, etc. that are currently fields. The severity column will contain the likert scale rating for that symptom (1- mild, 2- moderate, 3- severe). I’m also going to split those ratings into a numerical (1, 2, 3) and descriptive (“mild”, “moderate”, “severe”) rating using separate()
from tidyr.
scurvy2 <- scurvy %>% pivot_longer(4:7, names_to = "symptom", values_to = "severity") %>% separate( severity, sep = "_", into = c("Numerical_Severity", "Descriptive_Severity") )
Now, I want to get rid of the day 6 label (d6) from the symptom. There are no other time points, so the data is somewhat extraneous. Here I’m using separate_wider_delim()
. This is actually the active/ recommended separate function in tidyr. Both separate() and extract() still work, but it is recommended that users switch to the new family of separate functions. Instead of into for the new column names, this function uses names. The list of names can contain NA for sections that you wish to discard. Here, I throw away the second section (_d6). Then I use str_replace_all() to clean out the rest of the underscores in the symptom names. Note that if I did this step first, I’d need a different delimiter. I generally like to do any separations before I do str_replace() because you are more likely to have a unique separator/delimiter before you clean up special symbols.
scurvy2 <- scurvy2 %>% separate_wider_delim(symptom, delim = "_d6", names = c("symptom", NA)) %>% mutate(symptom = str_replace_all(symptom, "_", " "))
Making the Heatmap
I never remember the color palette names, so here they are. The par(cex = 0.5) command is a base R method of changing the parameters for the next plot. The cex parameter controls the text size. There are a lot of palettes, so if you don’t do this the labels will be squished.
Choosing a Color Palette
par(cex = 0.5) display.brewer.all()
First Draft of the Heatmap
And the first draft of the heatmap. I’m coloring the heatmap by numerical severity and I’m labeling it with the descriptive severity.
ggplot(scurvy2, aes(treatment, symptom)) + geom_tile(aes(fill = Numerical_Severity)) + geom_text(aes(label = Descriptive_Severity)) + scale_fill_brewer(palette = "YlOrRd")
Oh, that is ugly. And ggplot grouped all the treatments, so we have overlapping text labels.
Plotting by study_id
Let’s fix things up. First, plot by study_id instead of treatment group. That will get me back to the 12 patients. And the legend should be removed. This is done using theme(legend.position = "none")
. I’m also going to flip x and y. Since there are only 4 symptoms, this will give the text labels (mild, moderate, severe) more space since there will only be 4 labels across instead of 12.
scurvy_heatmap <- ggplot(scurvy2, aes(symptom, study_id)) + geom_tile(aes(fill = Numerical_Severity)) + geom_text(aes(label = Descriptive_Severity)) + scale_fill_brewer(palette = "YlOrRd") + theme(legend.position = "none") scurvy_heatmap
Mapping study_id
Back to Treatments
Now I need to get back to the treatment groups. The original data frame, scurvy, has the treatment groups matched to the patient_id. The mapping can be done using scale_x_continuous(breaks = 1:12, labels = scurvy$treatment)
. This is pretty non-intuitive to me;I found this suggestion on Stackoverflow.
scurvy_heatmap <- scurvy_heatmap + scale_y_continuous(breaks = 1:12, labels = scurvy$treatment) scurvy_heatmap
Modifying the Style with theme()
Now tweak the theme to clean things up. First, I’m going to start with a theme that is closest to want I want. In this case, I went with theme_void(), which actually removes all the elements. I then add the few I want back in. There is so much stuff to remove (the axis, the grid marks, the gray background). I want to retain the axis title and text. I’ll add the titles and captions too.
scurvy_heatmap <- scurvy_heatmap + theme_void() + theme(axis.text = element_text(size = 10), axis.title = element_text(face = 'bold')) + labs(title = "James Lind's Study of Scurvy Treatments- 1757", caption = "data from medicaldata R package", subtitle = "Results at Day 6") + ylab('Treatments') + xlab('Symptoms') + theme(axis.text.x = element_text(angle = 45)) scurvy_heatmap
theme_void()
Isn’t Completely Empty!
And the legend came back… Apparently, theme_void does include the legend, so it re-spawned. This is a bit surprising, since the description says “A completely empty theme.”
I’m going to quickly test this.
ggplot(scurvy2, aes(treatment, symptom)) + geom_tile(aes(fill = Numerical_Severity)) + theme_void()
Final Heatmap
Okay, so not a big deal to remove this again. The y axis label also rotated, so that needs to be rotated back.
scurvy_heatmap <- scurvy_heatmap + theme(legend.position = "none") + theme(axis.title.y = element_text(angle = 90)) scurvy_heatmap
ggsave(here("post", "2023-07-tidytuesday-scurvy", "thumbnail.png"))
Saving 7 x 5 in image
Thoughts about Scurvy
Despite having experimental evidence that fresh citrus was effective in treating scurvy, James Lind recommended other treatments and preparations in his book. Of particular note is a concentrate of citrus juices that could be bottled and stored indefinitely. Vitamin C is very unstable; the processing and storage removed almost all the vitamin C from the juice. This is covered in a fascinating paper by R. E. Hughes, where he prepared and measured the vitamin C levels in various treatments recommended by Lind. Hughes does touch on the issue of cider. Lind’s data seems to suggest that cider was moderately effective as a treatment. Fermentation and aging does degrade vitamin C, so the cider likely did not have more than trace amounts of vitamin C. I suspect there was still some and perhaps enough to mitigate some of the worst impacts of scurvy.
Citation
@online{e. sinks2023, author = {E. Sinks, Louise}, title = {A {Heatmap} of {James} {Lind’s} {Scurvy} {Study}}, date = {2023-07-25}, url = {https://lsinks.github.io/posts/2023-07-25-tidytuesday-scurvy/scurvy}, langid = {en} }
R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.