Dashboards in R with Shiny and GoogleVis
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
a) Shiny has changed most of its functions and the previous one is outdated
b) In this case, the outputs are done with GoogleVis, which is a very interesting package itself, so it is worth dedicating some lines to it
c) I finally got a space in the hosting service in RStudio to upload my Dashboard, so you can see how it works live here. IMPORTANT: Ignore the error message you get at the beginning
Context:
EAHU is an acronym for “Encuesta Anual de Hogares Urbanos” (in English, Yearly Survey of Urban Homes), a survey done by the official statistics institute in Argentina whose microdata is available here. The Dashboard´s raw data consists of a processed version of the 2010 to 2012 results.
How it works:
Although the filters are almost self-explanatory, it is necessary to highlight that the first chart (in the Income Tab) is not affected by the year filter, as the plot itself “slices” the data temporarily.
Bugs:
During the set-up, I had several issues with the encoding (Spanish uses many multi-byte characters, such as á, é or ñ) and decided to replace them for ASCII characters. The problem that turned out is that Google did not recognise 3 of the 5 provinces that have a special character in its name (Neuquén, Río Negro and Tucumán. The first two are under “Patagonia” and the last one is under “NOA”) so you will never see them plotted in the map, although they will definitely appear in the rest of the charts.
Before deep diving in the code, I recommend you entering the panel
The code:
The code consists mainly of three files, the standards for any Dashboard written in Shiny, global.R, ui.R and server.R. You will see that global.R is however not mandatory to do a Dashboard and, as it will be explained afterwards is not precisely necessary in this case.
global.R
library(googleVis) library(reshape) library(shiny) library(reldist) setwd("~/ShinyApps/EAHU/") raw.data <- read.csv2("molten data_2.csv") raw.data <- subset(raw.data,value > 0 ) |
As it is perfectly explained here under “Global Objects”, this code could have been perfectly inserted in server.R outside ShinyServer() function as the ui.R file does not have any component that is dependent of an external source. In this case, global.R is indeed not necessary because the source file is completely static and consequently there is no need to reference the UI.R file to an external source, which should be loaded prior to the display of the Dashboard. However, global.R could be extremely useful when working, for instance, with data coming from the Internet which might change more often and where the UI.R file should “adapt” itself to a source file, especially when this source file is also used by server.R. In this case, as the raw data is completely static, the UI.R file filters are completely “hard coded”. On the contrary, if your data comes from a SQL Query, for example, where your filter variables change frequently, the best option would be to do all the possible workout of the data in global.R so that ui.R and server.R can use it afterwards.
UI.R
library(shiny) shinyUI(pageWithSidebar( # Application title headerPanel("EAHU Panel"), sidebarPanel( numericInput("minedad", "Minimum Age",-1,-1,99), numericInput("maxedad", "Maximum Age",99,-1,99), checkboxGroupInput("gender","Gender",c( "Male"="Varon", "Female"="Mujer")), checkboxGroupInput("region","Region",c( "NOA"="NOA", "NEA"="NEA", "Cuyo"="Cuyo", "CABA"="CABA", "Buenos Aires"="BA", "Centro"="Centro", "Patagonia"="Patagonia")), checkboxGroupInput("year","Year",c( "2010"="2010", "2011" ="2011", "2012" = "2012")), submitButton(text="Ready")), mainPanel( tabsetPanel( tabPanel("Income",h4("Yearly income"),htmlOutput("motionchart")), tabPanel("Demography",h4("Ages by provinces"),htmlOutput("edades")), tabPanel("Occupation",h4("Occupation by province (in %)"),htmlOutput("tabla1"), h4("Activity of the occupied people (in %)"),htmlOutput("tabla2")), tabPanel("Education",htmlOutput("linech")) ) ) )) |
The UI.R file can always be “split” into two main parts. Where you give the input (filters) and where you see the output(charts, tables, etc).
This is the “input” part
library(shiny) shinyUI(pageWithSidebar( # Application title headerPanel("EAHU Panel"), sidebarPanel( numericInput("minedad", "Minimum Age",-1,-1,99), numericInput("maxedad", "Maximum Age",99,-1,99), checkboxGroupInput("gender","Gender",c( "Male"="Varon", "Female"="Mujer")), checkboxGroupInput("region","Region",c( "NOA"="NOA", "NEA"="NEA", "Cuyo"="Cuyo", "CABA"="CABA", "Buenos Aires"="BA", "Centro"="Centro", "Patagonia"="Patagonia")), checkboxGroupInput("year","Year",c( "2010"="2010", "2011" ="2011", "2012" = "2012")), submitButton(text="Ready")), |
and this one the output.
mainPanel( tabsetPanel( tabPanel("Income",h4("Yearly income"),htmlOutput("motionchart")), tabPanel("Demography",h4("Ages by provinces"),htmlOutput("edades")), tabPanel("Occupation",h4("Occupation by province (in %)"),htmlOutput("tabla1"), h4("Activity of the occupied people (in %)"),htmlOutput("tabla2")), tabPanel("Education",htmlOutput("linech")) ) ) )) |
As it was explained in this post, shiny works with an input – output logic based on two files that, unlike global.R, are mandatory for the whole Dashobard to work as expected. In this sense, it is essential to keep a clear idea of what a Dashobard does and how the information flows across the files to fulfill its purpose. Basically, it could be separated in three steps:
1) Some data is “given”(i.e. filters are selected, terms/numbers are entered, files are uploaded, etc etc)
2) The data is processed according to the given input data in 1) and results are “given”(outputs)
3) The results are shown
In Shiny, this could be divided in:
1) UI.R(“input” part)
2) server.R
3) UI.R(“output” part)
Parts 1 and 3 were explained before, and they are clearly easier than server.R where the most important work takes place. Firstly, an overview of the whole code.
server.R
library(shiny) shinyServer(function (input, output) { data <- reactive({ a <- subset(raw.data, Region %in% input$region & ch06 >= input$minedad & ch06 <= input$maxedad & ch04 %in% input$gender & Anio %in% input$year) a <- droplevels(a) return(a) }) data.mc <- reactive({ b <- subset(raw.data, Region %in% input$region & ch06 >= input$minedad & ch06 <= input$maxedad & ch04 %in% input$gender) b <- droplevels(b) return(b) }) output$motionchart <- renderGvis({ cast.1 <- as.data.frame(cast(data.mc(),jurisdiccion + Anio ~ variable, c(mean,sd))) cast.1 <- cbind(cast.1,CV=cast.1[,4]/cast.1[,3]) ginis <- numeric() for(i in 1:nrow(cast.1)){ ss <- subset(data.mc(),data.mc()[,1] == cast.1[i,1] & Anio == cast.1[i,2],select=value) ginis <- append(ginis,gini(ss[,1])) } cast.1 <- cbind(cast.1,ginis) return (gvisMotionChart(cast.1,"jurisdiccion",timevar="Anio",xvar="ipcf_mean",yvar="ginis",date.format="%Y")) }) output$edades <- renderGvis({ cast.map <- aggregate(data()[,4],list(Jurisdiccion=data()$jurisdiccion),mean) names(cast.map)[2] <- "promedio" cast.map$promedio <- round(cast.map$promedio,3) map <- gvisGeoChart(cast.map,colorvar="promedio",locationvar="Jurisdiccion", options=list(region="AR",displayMode="regions", resolution="provinces", title="Average age per Province", titlePosition='out', titleTextStyle="{color:'black',fontName:'Courier',fontSize:14}", height=500, width=400)) quant<-cut(data()$ch06,quantile(data()$ch06,(0:4)/4),include.lowest=TRUE,na.rm=TRUE) sset <- cbind(data(),quant) tab <- with(sset,prop.table(table(jurisdiccion,quant),1)) tab <- as.data.frame.matrix(tab) tab <- tab*100 tab <- round(tab,2) tab <- cbind(row.names(tab),tab) colnames(tab)[1] <- "Provincia" bar.pl <- gvisColumnChart(tab,xvar=colnames(tab)[1],yvar=colnames(tab)[2:5], options=list(title="Age cuartiles per province (in %)", titlePosition='out', hAxis="{slantedText:'true',slantedTextAngle:45}", titleTextStyle="{color:'black',fontName:'Courier',fontSize:14}", height=500, width=400)) merge <- gvisMerge(map,bar.pl,horizontal=TRUE,tableOptions="cellspacing=10") return(merge) }) output$tabla1 <- renderGvis({ t1 <- with(data(), prop.table(table(jurisdiccion,ocupacion),1)) t1 <- as.data.frame.matrix(t1) t1 <- t1*100 t1 <- round(t1,1) t1 <- cbind(row.names(t1),t1) colnames(t1)[1] <- "Provincia" t1.pl <- gvisTable(t1,options=list(page='enable',height=300,width=800)) return(t1.pl) }) output$tabla2 <- renderGvis({ t2 <- with(data(), prop.table(table(jurisdiccion,caes_recode2),1)) t2 <- as.data.frame.matrix(t2) t2 <- t2*100 t2 <- round(t2,1) t2 <- cbind(row.names(t2),t2) colnames(t2) <- tolower(colnames(t2)) colnames(t2)[1] <- "Provincia" t2.pl <- gvisTable(t2,options=list(page='enable',height=300,width=800)) return(t2.pl) }) output$linech <- renderGvis({ tab <- with(data(),prop.table(table(jurisdiccion,nivel_ed),1)) tab <- as.data.frame.matrix(tab) tab <- tab*100 tab <- round(tab,2) tab <- cbind(row.names(tab),tab) colnames(tab)[1] <- "Provincia" tab <- tab[,-grep("Sin instruccion",colnames(tab))] order <- c("Provincia","Primaria Incompleta","Primaria Completa","Secundaria Incompleta","Secundaria Completa", "Universitaria Incompleta", "Universitaria Completa") index <- lapply(order, function(x) grep(x,names(tab))) tab <- tab[,c(as.numeric(index))] tab.s <- cbind(tab[,1],sum(tab[,2:7]),sum(tab[,3:7]),sum(tab[,4:7]) ,sum(tab[,5:7]),sum(tab[,6:7]),sum(tab[,7])) tab.2 <- data.frame(rep(0,nrow(tab)),rep(0,nrow(tab)),rep(0,nrow(tab)) ,rep(0,nrow(tab)),rep(0,nrow(tab)),rep(0,nrow(tab))) names(tab.2) <- names(tab)[2:7] for ( i in 1:6){ for (j in 1:6){ if(i >= j){ tab.2[,j] <- tab.2[,j]+tab[,i+1] } } } tab.pl <- as.data.frame(t(tab.2)) colnames(tab.pl) <- tab$Provincia tab.pl <- cbind(Nivel_ed=names(tab)[2:7],tab.pl) #### Area under curve #### areas <- rbind(rep(100,ncol(tab.pl)),tab.pl) areas[,2:ncol(areas)] <- areas[,2:ncol(areas)]/100 ed.coef <- (areas[1,-c(1)]+areas[2,-c(1)])/2 limit <- nrow(areas)-1 for (i in 2:limit){ j <- i +1 ed.coef <- ed.coef + (areas[i,-c(1)]+areas[j,-c(1)])/2 } ed.coef <- ed.coef/limit ed.coef <- t(ed.coef) ed.coef <- round(ed.coef,4) ed.coef <- cbind(colnames(areas)[-c(1)],ed.coef) ed.coef <- as.data.frame(ed.coef) colnames(ed.coef)<-c("Provincia","Education Completeness") #### Plots ### line.pl <- gvisLineChart(tab.pl,xvar=colnames(tab.pl)[1],yvar=colnames(tab.pl)[-1], options=list( title="Education Curve", titlePosition='out', hAxis="{slantedText:'true',slantedTextAngle:45}", titleTextStyle="{color:'black',fontName:'Courier'}", legend="{color:'black',fontName:'Courier'}", fontSize="10", chartArea="{left:40,top:30,width:'70%',height:'75%'}", height=550, width=600)) t1.ed <- gvisTable(ed.coef, options=list(page='enable',fontSize="10",height=300,width=275)) ed.output <- gvisMerge(line.pl,t1.ed,horizontal=TRUE) return(ed.output) }) outputOptions(output, "motionchart", suspendWhenHidden = FALSE) outputOptions(output, "edades", suspendWhenHidden = FALSE) outputOptions(output, "tabla1", suspendWhenHidden = FALSE) outputOptions(output, "tabla2", suspendWhenHidden = FALSE) outputOptions(output, "linech", suspendWhenHidden = FALSE) }) |
180 lines of code at once is surely too much information to grasp the idea underlying this dashboard and its implementation. Below, you will find a step by step explanation of each part.
1) Data creation:
data <- reactive({ a <- subset(raw.data, Region %in% input$region & ch06 >= input$minedad & ch06 <= input$maxedad & ch04 %in% input$gender & Anio %in% input$year) a <- droplevels(a) return(a) }) data.mc <- reactive({ b <- subset(raw.data, Region %in% input$region & ch06 >= input$minedad & ch06 <= input$maxedad & ch04 %in% input$gender) b <- droplevels(b) return(b) }) |
Two different datasets are created; the first one is used by all the plots except the motionchart, which uses the second one. The main difference is that the motionchart does not take into account the year filter, as it is not needed because the graph itself can “slice” the data per year. In my opinion, although it might not be neat to use two different subsets for the plots displayed in the same dashboard, I think that in the case of the motion chart, the year filter does not really play any role, because the output itself enables the user to perform the same action.
2) Motionchart:
output$motionchart <- renderGvis({ cast.1 <- as.data.frame(cast(data.mc(),jurisdiccion + Anio ~ variable, c(mean,sd))) cast.1 <- cbind(cast.1,CV=cast.1[,4]/cast.1[,3]) ginis <- numeric() for(i in 1:nrow(cast.1)){ ss <- subset(data.mc(),data.mc()[,1] == cast.1[i,1] & Anio == cast.1[i,2],select=value) ginis <- append(ginis,gini(ss[,1])) } cast.1 <- cbind(cast.1,ginis) return (gvisMotionChart(cast.1,"jurisdiccion",timevar="Anio",xvar="ipcf_mean",yvar="ginis",date.format="%Y")) }) |
The first of the plots created with the googleVis package. When I first got in touch with it, it surprised me how it managed to plot 6 variables (i.e. the vertical and horizontal axis, the time, the name of the elements, the size and the colour of the balls) in a very clear way. Moreover, it has a menu where you can change to timeseries or barplots, so I would say it is 3 plots at the same time.
For this output, you can see that the data is processed using the cast() function, which belongs to the reshape package. This is because the data uploaded was previously “molten”. Equivalent results could have been obtained withthe use of aggregate(), which is in fact implemented in other outputs. This function is used to obtain a data frame with the mean income and its standard deviation.
Then, the coefficient of variation (standard variation/mean). Probably, these 3 measures could have been obtained with any visualization specific software, such as Tableau or Qlikview. However, is it possible to calculate GINI dynamically? If it is, it probably requires programming the whole integral calculation. In R, however, you can do it with just 4 lines using the reldist package
Finally, a small data frame with all the info is built and is plotted witgh the gvisMotionChart() function. (see the documentation for more details).
Please notice that you cannot call the reactive functions that filter the data, store them in an object and use them accross the functions because you will get an error. This is because the functions generating the plots are private, even if they are all contained in Shinyserver()
DON´T
library(shiny) shinyServer(function (input, output) { data <- reactive({ a <- subset(raw.data, Region %in% input$region & ch06 >= input$minedad & ch06 <= input$maxedad & ch04 %in% input$gender & Anio %in% input$year) a <- droplevels(a) return(a) }) subset1 <- data() output$motionchart <- renderPlot({ table(subset1) }) |
3) Map creation
output$edades <- renderGvis({ cast.map <- aggregate(data()[,4],list(Jurisdiccion=data()$jurisdiccion),mean) names(cast.map)[2] <- "promedio" cast.map$promedio <- round(cast.map$promedio,3) map <- gvisGeoChart(cast.map,colorvar="promedio",locationvar="Jurisdiccion", options=list(region="AR",displayMode="regions", resolution="provinces", height=500, width=400)) |
This map displays the average age of each province given the selected filters. The gvisGeoChart() enables to work with names that refer to geographical items (i.e, states, countries, continents, etc.) or with coordinates. In this case, it was considerably easier to work with names. The main drawback of it, due to the encoding issue mentioned, was that some of the provinces were not identified and were not included in the plot. In addition, I could not manage to add a title to it. It seems as if the geoplot does not enable titles.
4) Quartiles barplot
quant<-cut(data()$ch06,quantile(data()$ch06,(0:4)/4),include.lowest=TRUE,na.rm=TRUE) sset <- cbind(data(),quant) tab <- with(sset,prop.table(table(jurisdiccion,quant),1)) tab <- as.data.frame.matrix(tab) tab <- tab*100 tab <- round(tab,2) tab <- cbind(row.names(tab),tab) colnames(tab)[1] <- "Provincia" bar.pl <- gvisColumnChart(tab,xvar=colnames(tab)[1],yvar=colnames(tab)[2:5], options=list(title="Age cuartiles per province (in %)", titlePosition='out', hAxis="{slantedText:'true',slantedTextAngle:45}", titleTextStyle="{color:'black',fontName:'Courier',fontSize:14}", height=500, width=400)) |
The main advantage of this barplot is that the quartiles are built dynamically based upon filters applied. The options documentation for the gvis plots is not very good in R itself. In my opinion, the best is to go to the google documentation. There you will be able to see what is editable and what is not. Please note that you will have to adapt what is written in the documentation to the R language.
5) Tables
output$tabla1 <- renderGvis({ t1 <- with(data(), prop.table(table(jurisdiccion,ocupacion),1)) t1 <- as.data.frame.matrix(t1) t1 <- t1*100 t1 <- round(t1,1) t1 <- cbind(row.names(t1),t1) colnames(t1)[1] <- "Provincia" t1.pl <- gvisTable(t1,options=list(page='enable',height=300,width=800)) return(t1.pl) }) output$tabla2 <- renderGvis({ t2 <- with(data(), prop.table(table(jurisdiccion,caes_recode2),1)) t2 <- as.data.frame.matrix(t2) t2 <- t2*100 t2 <- round(t2,1) t2 <- cbind(row.names(t2),t2) colnames(t2) <- tolower(colnames(t2)) colnames(t2)[1] <- "Provincia" t2.pl <- gvisTable(t2,options=list(page='enable',height=300,width=800)) return(t2.pl) }) |
This is the code to generate tables with the googlevis package. For this kind of visualizations, it might be more helpful than a normal R table because the rows can be sorted by any of the columns (ascendingly or descendingly) by just clicking on the column.
6) Education Curve
output$linech <- renderGvis({ tab <- with(data(),prop.table(table(jurisdiccion,nivel_ed),1)) tab <- as.data.frame.matrix(tab) tab <- tab*100 tab <- round(tab,2) tab <- cbind(row.names(tab),tab) colnames(tab)[1] <- "Provincia" tab <- tab[,-grep("Sin instruccion",colnames(tab))] order <- c("Provincia","Primaria Incompleta","Primaria Completa","Secundaria Incompleta","Secundaria Completa", "Universitaria Incompleta", "Universitaria Completa") index <- lapply(order, function(x) grep(x,names(tab))) tab <- tab[,c(as.numeric(index))] tab.s <- cbind(tab[,1],sum(tab[,2:7]),sum(tab[,3:7]),sum(tab[,4:7]) ,sum(tab[,5:7]),sum(tab[,6:7]),sum(tab[,7])) tab.2 <- data.frame(rep(0,nrow(tab)),rep(0,nrow(tab)),rep(0,nrow(tab)) ,rep(0,nrow(tab)),rep(0,nrow(tab)),rep(0,nrow(tab))) names(tab.2) <- names(tab)[2:7] for ( i in 1:6){ for (j in 1:6){ if(i >= j){ tab.2[,j] <- tab.2[,j]+tab[,i+1] } } } tab.pl <- as.data.frame(t(tab.2)) colnames(tab.pl) <- tab$Provincia tab.pl <- cbind(Nivel_ed=names(tab)[2:7],tab.pl) #### Area under curve #### areas <- rbind(rep(100,ncol(tab.pl)),tab.pl) areas[,2:ncol(areas)] <- areas[,2:ncol(areas)]/100 ed.coef <- (areas[1,-c(1)]+areas[2,-c(1)])/2 limit <- nrow(areas)-1 for (i in 2:limit){ j <- i +1 ed.coef <- ed.coef + (areas[i,-c(1)]+areas[j,-c(1)])/2 } ed.coef <- ed.coef/limit ed.coef <- t(ed.coef) ed.coef <- round(ed.coef,4) ed.coef <- cbind(colnames(areas)[-c(1)],ed.coef) ed.coef <- as.data.frame(ed.coef) colnames(ed.coef)<-c("Provincia","Education Completeness") #### Plots ### line.pl <- gvisLineChart(tab.pl,xvar=colnames(tab.pl)[1],yvar=colnames(tab.pl)[-1], options=list( title="Education Curve", titlePosition='out', hAxis="{slantedText:'true',slantedTextAngle:45}", titleTextStyle="{color:'black',fontName:'Courier'}", legend="{color:'black',fontName:'Courier'}", fontSize="10", chartArea="{left:40,top:30,width:'70%',height:'75%'}", height=550, width=600)) t1.ed <- gvisTable(ed.coef, options=list(page='enable',fontSize="10",height=300,width=275)) ed.output <- gvisMerge(line.pl,t1.ed,horizontal=TRUE) return(ed.output) }) |
Finally, the education curve is based upon the concept that the maximum education level achieved implies having completed prior stages. For instance, someone, whose education level is “Complete High School” means that he/she has achieved primary incomplete, primary complete, high school incomplete and high school complete.
Based upon this principle, you can pot a descending curve from the lowest to the highest level of education and compare graphically the drops in each province.
In addition, you can build a measure that calculates the area under those curves, which can support a graphical hypothesis with an empiric “hard” measure.
As the calculations are done with the subsetted data, the plot is einterly dynamic. This can enable you to see how the curves/measures change over time or age of the population.
Conclusion
The last tab is a perfect example of the power of R. Far from being a mere statistics software, R is a tool in constant expansion, that can perfectly replace and even beat the most used commercial specialized softwares. In this case, for instance, it has been proved that R can perform BI tasks with much more flexibility and potential than most of the softwares dedicated to that. In this context, it is just a question of time to overcome old and expensive softwares in the “data world” and replace them definitely by free and open source tools like R, among many others.
R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.