Using R to create column charts featuring 95% confidence intervals
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
Daniel Weiland – Consultant Medical Microbiologist – Newcastle upon Tyne NHS Foundation Trust
Hello!
This is my first blog post for the NHS R Community, which I stumbled across in the course of my work as a consultant medical microbiologist at Newcastle upon Tyne Hospitals NHS Foundation Trust.
At work, I’ve been trying to use R to create column charts featuring 95% confidence intervals. I approached the friendly people on the NHS R Community’s Slack channel for further information and guidance.
I must add here that the Community’s Slack channel has been extremely helpful to me, as a novice R user, in solving some of the issues I’ve experienced, and highlighting R packages of potential interest. This is the first time I’ve tried to create a ReprEx and now I understand why people love (?) the mtcars database as much as they do!
Step 1: Calculate some summary statistics
I wanted to calculate some summary statistics, including the mean, and standard error or 95% confidence intervals.
Initially I came across the summary() function of Base R, which is helpful as it calculates the Min., 1st Qu., Median, Mean, 3rd Qu., and Max.
However, the summary() function of {base} R does not calculate either the standard error or the 95% confidence intervals
#calculate summary statistics for all numeric data using summary() and where(is.numeric()) mtcars %>% select(where(is.numeric)) %>% summary() #calculate summary statistics for mpg using summary() and where(is.numberic()) mtcars %>% select(mpg) %>% summary()
Then zx8754 very kindly pointed me towards a method for calculating the standard error on StackOverflow: https://stackoverflow.com/q/2676554/680068
#create stderr function stderr <- function(x, na.rm=TRUE) { if (na.rm) x <- na.omit(x) sqrt(var(x)/length(x)) }
Then I used this function to calculate summary statistics, incl. mean and standard error, using the summarise() and across() functions of {dplyer}
#calculate summary statistics using summarise() and across() and n/mean/min/median/max/sd/stderr # stderr <- function(x, na.rm=TRUE) { # if (na.rm) x <- na.omit(x) # sqrt(var(x)/length(x)) # } mtcars %>% group_by(cyl) %>% mutate( across(mpg, list( n = ~ n(), mean = ~ mean(.x, na.rm = TRUE), min = ~ min(.x, na.rm = TRUE), median = ~ median(.x, na.rm = TRUE), max = ~ max(.x, na.rm = TRUE), sd = ~ sd(.x, na.rm = TRUE), stderr = ~ stderr(.x)), .names = NULL)) %>% select(starts_with("mpg")) %>% summarise(mean = mean(mpg_mean), min = mean(mpg_min), median = mean(mpg_median), max = mean(mpg_max), sd = mean(mpg_sd), stderr = mean(mpg_stderr)) %>% #create column chart with error bars (using stderr) ggplot(aes(cyl, mean))+ geom_col(na.rm = TRUE)+ geom_errorbar(aes(ymin = mean-stderr, ymax = mean+stderr), position = "dodge", width = 0.25) #calculate summary statistics using summarise() and across() and n/mean/min/median/max/sd/stderr # stderr <- function(x, na.rm=TRUE) { # if (na.rm) x <- na.omit(x) # sqrt(var(x)/length(x)) # } mtcars %>% group_by(cyl) %>% mutate( across(mpg, list( n = ~ n(), mean = ~ mean(.x, na.rm = TRUE), min = ~ min(.x, na.rm = TRUE), median = ~ median(.x, na.rm = TRUE), max = ~ max(.x, na.rm = TRUE), sd = ~ sd(.x, na.rm = TRUE), stderr = ~ stderr(.x)), .names = NULL)) %>% select(starts_with("mpg")) %>% summarise(mean = mean(mpg_mean), min = mean(mpg_min), median = mean(mpg_median), max = mean(mpg_max), sd = mean(mpg_sd), stderr = mean(mpg_stderr)) %>% #create column chart with error bars (using stderr) ggplot(aes(cyl, mean))+ geom_col(na.rm = TRUE)+ geom_errorbar(aes(ymin = mean-stderr, ymax = mean+stderr), position = "dodge", width = 0.25)
Step 2: Create column charts with error bars (using 95% confidence intervals)
Then Seb Fox pointed me towards a method for calculating 95% confidence intervals using the {PHEindicatormethods} package, available on CRAN: https://cran.r-project.org/web/packages/PHEindicatormethods/index.html
#create MEAN column chart with error bars (using 95% confidence intervals) require(PHEindicatormethods) mtcars %>% filter(!is.na(cyl)) %>% group_by(cyl) %>% #use phe_mean() phe_mean(x = mpg, #field name from data containing the values to calculate the means for type = "full", #defines the data and metadata columns to be included in output; can be "value", "lower", "upper", "standard" (for all data) or "full" (for all data and metadata); quoted string; default = "full" confidence = 0.95) %>% #required level of confidence expressed as a number between 0.9 and 1 #create column chart with error bars (using 95% CI calculated using phe_mean()) ggplot(aes(cyl, value))+ geom_col(na.rm = TRUE)+ geom_errorbar(aes(ymin = lowercl, ymax = uppercl), position = "dodge", width = 0.25) #create PROPORTION column chart with error bars (using 95% confidence intervals) require(PHEindicatormethods) mtcars %>% group_by(cyl) %>% summarise(n = n(), sum = sum(n)) %>% mutate(sum = sum(n)) %>% #phe_proportion() phe_proportion(x = n, #numerator n = sum, #denominator type = "full", #defines the data and metadata columns to be included in output; can be "value", "lower", "upper", "standard" (for all data) or "full" (for all data and metadata); quoted string; default = "full" confidence = 0.95, #required level of confidence expressed as a number between 0.9 and 1 multiplier = 100) %>% #the multiplier used to express the final values (eg 100 = percentage); numeric; default 1 #create column chart with error bars (using 95% CI calculated using phe_proportion()) ggplot(aes(cyl, value))+ geom_col(na.rm = TRUE)+ geom_errorbar(aes(ymin = lowercl, ymax = uppercl), position = "dodge", width = 0.25)
I hope that the code, above, helps a few colleagues of mine across the NHS, in some small way.
Thank you, again, to all members of the NHS R Community, for all your help. Particular thanks go to everyone who has helped me, to date, on the NHS R Community’s Slack channel.
The post Using R to create column charts featuring 95% confidence intervals appeared first on NHS-R Community.
R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.