Exploring the Stack Overflow Dev Survey with Shiny – part 1
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
The challenge
Recently I saw that StackOverflow released their survey data and had been posted on Kaggle. The data came with the following context “Want to dive into the results yourself and see what you can learn about salaries or machine learning or diversity in tech?” and given that June is indeed shiny-appreciation month, I thought this would be a fun opportunity to combine an interesting public dataset, some exploratory analysis in shiny and uhhh.. my first blog.
Talking about this with a colleague at work, we both decided to independently have a crack and then compare our approaches. You can check out his blog here.
The data and idea
The data looked very rich, with almost 100k rows and 129 variables – a feast for any data addict. Feeling gluttonous, I wanted to take a bite out of every variable but without writing a tonne of code. So… iteration!
Exploration through iteration
Typically, the apps I have written in the past have consisted of a small number of well defined outputs in a fixed layout. Whilst this is appropriate for presenting a prescribed analysis, it is a bit cumbersome and slow for exploration. For this app, I wanted a different approach which had fast exploration, dynamic layout and hopefully powered by a small amount of code.
To do this, I would need a more programmatic way of creating UI elements and their corresponding outputs. The only non-reactive ui element that I needed is a simple selectInput
(for the user to select variables), all other UI elements and outputs would then react to this selection. To allow for reactive ui elements I needed to stick in a ui placeholder uiOutput
which would then be defined server-side with renderUI
ui <- fluidPage(
# User selection
selectInput("cols",
label = "Select columns to visualise",
choices = colnames(survey_data),
multiple = TRUE,
width = '800px'),
# UI placeholder
uiOutput("p")
)
Having finished the basic structure on the UI side, I made a start on a programmatic way of creating the ui outputs (in this case, using plotOutput
). The basic idea here is to create an output placeholder for each selection the user makes. Each of these outputs will have an associated id of the selected variables. To make it work, we require returning a list of these outputs inside the renderUI
expression – this can be achieved by using tagList
(shiny equivalent of a list) and assigning elements iteratively using a for
loop.
server <- function(input, output) {
# Define our UI placeholder using the user selection
output$p <- renderUI({
# Define an empty tagList
plots <- tagList()
# Iterate over input column selection
for (col in input$cols) {
plots[[col]] <- plotOutput(col)
}
plots
})
}
At this point, we have created a list of output placeholders but haven’t defined what these outputs will look like.
Before showing the code, I want to address some of the parts that need to come together to make this work. Firstly, we need our output to reactively render depending on the selection made by the user. In our case, we have a reactive input called input$cols
. Now for each element of this input, we want to render a plot so the looping needs to be outside of the render statement (unlike the ui creation that looped inside a render statement) but as we are looping over a reactive object, we require a reactive context – in this case an observe
statement fits perfectly.
server <- function(input, output) {
# Define our UI placeholder using the user selection
output$p <- renderUI({
# Define an empty tagList (shiny equivalent of a list)
plots <- tagList()
# Iterate over input column selection
for (col in input$cols) {
plots[[col]] <- plotOutput(col)
}
plots
})
# Define created outputs
observe({
# Iterate over input column selection
lapply(input$cols, function(col) {
# Render output for each associated output id
output[[col]] <- renderPlot({
hist(rnorm(10), main = col)
})
})
})
}
Back to the StackOverflow survey
In the previous sections, I introduced how to construct ui and server creation by way of iteration. Now I want to combine it with the StackOverflow survey to show how this kind of app can be effective for quick exploratory analysis.
There are many ways that we can explore a dataset but I have gone for pure simplicity – plotting a univariate distribution for each variable (histogram for continuous, barplot for categorical). The only other notable shiny addition is the use of validateto firstly have some control over which variables are plotted but also to give the user an informative message of why a plot isn’t shown.
You can test out the app (with a sample of the total data) here or the full code can be found below:
# Packages
library(shiny)
library(readr)
library(dplyr)
library(ggplot2)
# Data (change to appropriate file path)
survey_data <- read_csv("../data/survey_results_public.csv")
# Defining the UI part of the app
ui <- fluidPage(
column(
width = 6,
offset = 3,
fluidRow(
selectInput("cols",
label = "Select columns to visualise",
choices = colnames(survey_data),
multiple = TRUE,
width = '800px')
)
),
# Main output
column(
width = 8,
align = "center",
offset = 2,
uiOutput("p")
)
)
# Defining the server part of the app
server <- function(input, output) {
# Define our UI placeholder using the user selection
output$p <- renderUI({
# Define an empty tagList (shiny equivalent of a list)
plots <- tagList()
# Iterate over input column selection
for (col in input$cols) {
plots[[col]] <- plotOutput(col)
}
plots
})
# Trigger everytime the input columns are changed
observe({
lapply(input$cols, function(col) {
# Remove NA entries
d <- survey_data[!is.na(survey_data[[col]]), ]
# Column class
cc <- class(d[[col]])
output[[col]] <- renderPlot({
# Plots only defined for chracter, numeric & integer
validate(need(cc %in% c("character", "numeric", "integer"),
paste("Plots undefined for class", cc)))
if (cc == "character") {
# Only show barplot if we have fewer than 20 bars
validate(need(length(unique(d[[col]])) < 20,
paste(col, "contains more than 20 levels")))
# Basic ggplot barplot
ggplot(d, aes_string(x = col)) +
geom_bar() +
coord_flip() +
theme_bw() +
ggtitle(col)
} else {
# Basic ggplot histogram
ggplot(d, aes_string(x = col)) +
geom_histogram() +
theme_bw() +
ggtitle(col)
}
})
})
})
}
shinyApp(ui = ui, server = server)
Conclusion
For me, Shiny’s great strength is it’s versatility; analysis can come in many shapes and sizes so it is only fitting that Shiny, a framework for sharing analysis, can also be written in multitude of ways. In this blog post, the app wasn’t particularly snazzy (so probs not to be shared with the senior managers or as an ‘inspirational’ linkedin post) but it does what I want: quick exploration, dynamic layout & less than 100 lines of code!
My colleague Graham opted for a totally different app, one which presented the user with a well thought out and defined model. Again demonstrating the many use cases Shiny can have in a project. Not only can it be used at the start as an exploratory tool but also at the end to present findings, predictions and other project outcomes.
Asides from that, I had a lot of fun exploring the survey data, especially variables relating to health-lifestyle of developers – I’m no physician but, devs, try to do at least some exercise… Wishing you all a happy end to shiny-appreciation month, keep writing ’em apps, and see ya next time
R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.