Statistics Sunday: Creating a Stacked Bar Chart for Rank Data
[This article was first published on Deeply Trivial, and kindly contributed to R-bloggers].
So how could I present these results? One idea I had was a stacked bar chart, and it took a bit of data wrangling to do it. That is, the rankings were all in separate variables, but I wanted them all on the same chart. Basically, I needed to create a dataset with:
- 1 variable to represent the factor being ranked
- 1 variable to represent the ranking given (1-5, or 6 that I called “Not Ranked”)
- 1 variable to represent the number of people giving that particular rank that particular factor
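As a rough illustration of that target shape, here is a minimal sketch of such a long-format data frame. The column names (Factor, Rank, Count), the factor labels, and every count are made up for this illustration and are not taken from the survey:

```r
# Hypothetical long-format layout: one row per factor/rank combination.
# All names and numbers here are invented for illustration only.
target <- data.frame(
  Factor = rep(c("Factor A", "Factor B"), each = 6),  # the factor being ranked
  Rank   = rep(1:6, times = 2),                       # 1-5 = rank given, 6 = "Not Ranked"
  Count  = c(20, 10, 5, 3, 2, 10,                     # respondents giving Factor A each rank
             2, 4, 6, 8, 10, 20)                      # respondents giving Factor B each rank
)

print(target)
```

Each bar in a stacked chart then corresponds to one value of Factor, and each segment of the bar to one Rank.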
What I ultimately did was run frequencies for the factor variables, turn those frequency tables into data frames, and merge them together with rbind. I then created the chart with ggplot. Here’s some code for a simplified example, which uses only 6 factors and asks people to rank their top 3.
First, let’s read in our sample dataset – note that these data were generated only for this example and are not real data:
library(tidyverse)
## -- Attaching packages ------------------------------------ tidyverse 1.2.1 --
## v ggplot2 3.0.0     v purrr   0.2.4
## v tibble  1.4.2     v dplyr   0.7.4
## v tidyr   0.8.0     v stringr 1.3.1
## v readr   1.1.1     v forcats 0.3.0
## Warning: package 'ggplot2' was built under R version 3.5.1
## -- Conflicts --------------------------------------- tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()

ranks <- read_csv("C:/Users/slocatelli/Desktop/sample_ranks.csv", col_names = TRUE)
## Parsed with column specification:
## cols(
##   RespID = col_integer(),
##   Salary = col_integer(),
##   Recognition = col_integer(),
##   PTO = col_integer(),
##   Insurance = col_integer(),
##   FlexibleHours = col_integer(),
##   OptoLearn = col_integer()
## )
This dataset contains 7 variables - 1 respondent ID and 6 variables with ranks on factors considered important in a job: salary, recognition from employer, paid time off, insurance benefits, flexible scheduling, and opportunity to learn. I want to run frequencies for these variables, and turn those frequency tables into a data frame I can use in ggplot2. I'm sure there are much cleaner ways to do this (and please share in the comments!), but here's one not so pretty way:
# One frequency table per factor, converted to a data frame and labeled
salary <- as.data.frame(table(ranks$Salary))
salary$Name <- "Salary"

recognition <- as.data.frame(table(ranks$Recognition))
recognition$Name <- "Recognition by \nEmployer"

PTO <- as.data.frame(table(ranks$PTO))
PTO$Name <- "Paid Time Off"

insurance <- as.data.frame(table(ranks$Insurance))
insurance$Name <- "Insurance"

flexible <- as.data.frame(table(ranks$FlexibleHours))
flexible$Name <- "Flexible Schedule"

learn <- as.data.frame(table(ranks$OptoLearn))
learn$Name <- "Opportunity to \nLearn"

# Stack the six tables into one data frame for plotting
rank_chart <- rbind(salary, recognition, PTO, insurance, flexible, learn)

# Var1 is a factor here; as.numeric() returns its level codes (1, 2, 3, ...),
# which equal the rank values only when the levels are consecutive integers from 1
rank_chart$Var1 <- as.numeric(rank_chart$Var1)
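One possibly cleaner route is to reshape the data to long format and count in a single pipeline. The sketch below is not the approach used in this post. It assumes recent versions of tidyr and dplyr (pivot_longer() arrived in tidyr 1.0.0, later than the versions loaded above), and it runs on a small made-up stand-in called ranks_toy rather than the real file, with 4 assumed to mean "Not Ranked":

```r
library(dplyr)
library(tidyr)

# Made-up stand-in for `ranks`: 4 respondents, each ranking a top 3
# (1-3 = rank given, 4 = "Not Ranked"). Values are for illustration only.
ranks_toy <- data.frame(
  RespID        = 1:4,
  Salary        = c(1, 1, 2, 1),
  Recognition   = c(4, 2, 4, 4),
  PTO           = c(3, 4, 3, 2),
  Insurance     = c(2, 3, 1, 3),
  FlexibleHours = c(4, 4, 4, 4),
  OptoLearn     = c(4, 4, 4, 4)
)

rank_chart_tidy <- ranks_toy %>%
  # Stack the six rank columns into Name/Var1 pairs, one row per respondent-factor
  pivot_longer(-RespID, names_to = "Name", values_to = "Var1") %>%
  # Count how many respondents gave each factor each rank
  count(Name, Var1, name = "Freq")

print(rank_chart_tidy)
```

The result has the same three columns as rank_chart (Name, Var1, Freq), with Var1 already numeric. The Name values are the raw column names, so the display labels would still need to be recoded before plotting.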
With my not-so-pretty data wrangling, the chart itself is actually pretty easy:
ggplot(rank_chart, aes(fill = Var1, y = Freq, x = Name)) +
  geom_bar(stat = "identity") +
  labs(title = "Ranking of Factors Most Important in a Job") +
  ylab("Frequency") +
  xlab("Job Factors") +
  scale_fill_continuous(name = "Ranking",
                        breaks = c(1:4),
                        labels = c("1", "2", "3", "Not Ranked")) +
  theme_bw() +
  theme(plot.title = element_text(hjust = 0.5))
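Because Var1 is numeric, the fill above is drawn on a continuous color gradient. A variation, sketched here as an alternative and not as the chart from this post, treats the rank as a factor so each rank gets its own discrete color. The rank_chart_toy data frame is made up for the example, and 4 is assumed to mean "Not Ranked":

```r
library(ggplot2)

# Made-up counts standing in for rank_chart (illustration only)
rank_chart_toy <- data.frame(
  Name = rep(c("Salary", "Insurance"), each = 4),
  Var1 = rep(1:4, times = 2),
  Freq = c(3, 1, 0, 0,
           1, 1, 2, 0)
)

# Convert the numeric rank to a labeled factor so ggplot2 uses a discrete fill scale
rank_chart_toy$Ranking <- factor(rank_chart_toy$Var1,
                                 levels = 1:4,
                                 labels = c("1", "2", "3", "Not Ranked"))

p <- ggplot(rank_chart_toy, aes(x = Name, y = Freq, fill = Ranking)) +
  geom_col() +  # shorthand for geom_bar(stat = "identity")
  labs(title = "Ranking of Factors Most Important in a Job",
       x = "Job Factors", y = "Frequency", fill = "Ranking") +
  theme_bw() +
  theme(plot.title = element_text(hjust = 0.5))

print(p)
```

With a factor, the legend labels come from the factor levels, so no breaks or labels need to be passed to the fill scale.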
Based on this chart, we can see the top factor is Salary. Insurance is slightly more important than paid time off, but these are definitely the #2 and #3 factors. Recognition wasn't ranked by most people, but those who did rank it considered it their #2 factor; ditto for flexible scheduling at #3. Opportunity to learn didn't make the top 3 for most respondents.