How to build a shiny “truck”
This story was already published in RViews. It describes how it is possible to build a large scale shiny app without getting lost in code.
All code of this story can be checked out at: https://github.com/zappingseb/biowarptruck
Last month, at the R/Pharma conference that took place on the Harvard Campus, I presented bioWARP, a large Shiny application containing more than 500,000 lines of code. Although several other Shiny apps were presented at the conference, I noticed that none of them came close to being as big as bioWARP. And I asked myself, why?
I concluded that most people just don’t need to build them that big! So now, I would like to explain why we needed such a large app and how we went about building it.
To give you an idea of the scale I am talking about, an automotive metaphor might be useful. A typical Shiny app I see in my daily work has about 50 or even fewer interaction items. Let’s imagine this as a car. With fewer than 50 interactions, think of a small car like a Mini Cooper. Compared to these applications, bioWARP, with more than 500 interactions, is a truck, maybe even a “monster” truck. So why do my customers want to drive trucks when everyone else is driving cars?
Why do we need a truck?
Building software often starts with checking the user requirements. So when we started the development of our statistical web application, we did that, too. Asking a lot of people inside our department, we noticed that the list of requirements was huge:
Main user requirements
- Pretty design that works universally
- Interactive elements
- Mathematical correctness of all results
Main application features
- Session logging
- Standardized PDF reports of all results
- Ability to restore sessions
- Harmonize it with other software applications
- Everything has to be tested
- Help pages
More requirements then came from all the analyses people perform on a daily basis. They wanted to have some tasks integrated into our app:
Mathematical tasks
- Linear regression app
- Descriptive statistics app
- Homogeneity test app
- T-Test app
- Bootstrap simulation app
- Sensitivity/Specificity app
- Linearity app
- Clustering app
- BoxPlotting app
Additionally, it was required to write the whole application in R, as all our mathematical packages are written in R. So we decided to do it all with shiny because it already covers two of the three main user requirements: being pretty and being interactive.
Modularity + Standardization
Inside our department, we were running some large-scale desktop applications already. When it came to testing, we always noticed that testing takes forever. If one single piece of software gathers data, calculates statistics, provides plot outputs, and renders PDF reports, this is a huge truck, and you can only test it by driving it a thousand miles and seeing whether it still works. The idea we came up with was building our truck out of Lego bricks. Each Lego brick can be tested on its own. If a Lego wheel runs, the truck will run. The wheel holder part is universal, and if we change the size of the wheels, we can still run the truck, assuming each wheel was tested. This is called modularity. There are different solutions in R and shiny that can be combined to make things modular:
- Shiny Modules
- Object orientation
- R-packages
- Clever name-spacing
As Shiny modules did not exist when we started, we chose the second and third options: object orientation and R packages.
As an example, I’ll compare two simple Shiny apps representing two cars here. One is written using object orientation; one is a simple Shiny application. The image below illustrates that the renderPlot function in a standard Shiny app includes a plot, in this case using the hist function. So whenever you add a new plot, its function has to be called inside.
In the object-oriented app, the renderPlot function calls the shinyElement method of a generic plot object we created called AnyPlot. The first advantage is that the plot can easily be exchanged. (Please look into the code if you wonder if this really is so.) To describe that advantage, you can imagine a normal car, built of car parts. Our car is really a Lego car, using even smaller standardized parts (Lego bricks) to construct each part of the car. So instead of the grille made of one piece of steel, we constructed it of many little grey Lego bricks. Changing the grille for an update of the car does not require us to reconstruct the whole front; just use green bricks instead of grey bricks, for example, since they should have the same shape.
By going into the code of the two applications, you see there is a straightforward disadvantage of object orientation: there is much more code. We have to define what a Lego brick is and what features it shall have.
```r
# Object-oriented app
library(methods)
library(rlang)
library(shiny)

# Generics for evaluating a plot object and for rendering it in Shiny
setGeneric("plotElement", where = parent.frame(), def = function(object) {
  standardGeneric("plotElement")
})
setGeneric("shinyElement", where = parent.frame(), def = function(object) {
  standardGeneric("shinyElement")
})

# Generic plot object holding an unevaluated plotting call
setClass("AnyPlot", representation(plot_element = "call"))
# Histogram as a child of AnyPlot
setClass("HistPlot", representation(color = "character", obs = "numeric"), contains = "AnyPlot")

AnyPlot <- function(plot_element = expr(plot(1, 1))) {
  new("AnyPlot", plot_element = plot_element)
}

HistPlot <- function(color = "darkgrey", obs = 100) {
  new("HistPlot",
    plot_element = expr(hist(rnorm(!!obs), col = !!color, border = "white")),
    color = color,
    obs = obs
  )
}

#' Method to plot a Plot element
setMethod("plotElement", signature = "AnyPlot", definition = function(object) {
  eval(object@plot_element)
})

#' Method to render a Plot Element
setMethod("shinyElement", signature = "AnyPlot", definition = function(object) {
  renderPlot(plotElement(object))
})

server <- function(input, output, session) {
  # Create a reactive to create the Report object
  report_obj <- reactive(HistPlot(obs = input$obs))

  # Check for change of the slider to change the plots
  observeEvent(input$obs, {
    output$renderedPDF <- renderText("")
    output$renderPlot <- shinyElement(report_obj())
  })
}

# Simple shiny App containing the standard histogram
# (the PDF render and download button of the full example are left out here)
ui <- fluidPage(
  sidebarLayout(
    sidebarPanel(
      sliderInput("obs", "Number of observations:", min = 10, max = 500, value = 100)
    ),
    mainPanel(
      plotOutput("renderPlot")
    )
  )
)

shinyApp(ui = ui, server = server)
```
```r
# Standard Shiny app
library(shiny)

server <- function(input, output) {
  # Output Gray Histogram
  output$distPlot <- renderPlot({
    hist(rnorm(input$obs), col = "darkgray", border = "white")
  })
}

# Simple shiny App containing the standard histogram
# (the PDF render and download button of the full example are left out here)
ui <- fluidPage(
  sidebarLayout(
    sidebarPanel(
      sliderInput("obs", "Number of observations:", min = 10, max = 500, value = 100)
    ),
    mainPanel(
      plotOutput("distPlot")
    )
  )
)

shinyApp(ui = ui, server = server)
```
But an advantage of the object orientation is that you can now output the plot in a lot of different formats. We solved this by introducing methods called pdfElement, logElement, or archiveElement. To get a deeper look, you can check out some examples stored on GitHub here. These show differences between object-oriented and standard shiny apps. You can see that duplicated code is reduced in object-oriented apps, and while the code of the shiny app itself does not change for object-oriented apps, the code constructing the objects shown on the page changes. For the standard apps, the shiny code itself also changes every time an element is updated.
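As a rough illustration of what an additional output format could look like, the sketch below adds two more generics next to the plot object. This is not the bioWARP implementation: the post only names pdfElement and logElement, so their signatures (a `file` argument, a one-line text log) are assumptions, and base R's bquote() stands in for rlang::expr() to keep the example free of extra dependencies.

```r
library(methods)

# Plot objects store an unevaluated plotting call
setClass("AnyPlot", representation(plot_element = "call"))
setClass("HistPlot", contains = "AnyPlot")

HistPlot <- function(color = "darkgrey", obs = 100) {
  # bquote() substitutes the values of obs and color into the call
  new("HistPlot",
      plot_element = bquote(hist(rnorm(.(obs)), col = .(color), border = "white")))
}

# Hypothetical generics: names taken from the post, signatures assumed
setGeneric("pdfElement", function(object, file) standardGeneric("pdfElement"))
setGeneric("logElement", function(object) standardGeneric("logElement"))

# Draw the stored plot into a PDF file instead of the browser
setMethod("pdfElement", "AnyPlot", function(object, file) {
  grDevices::pdf(file)
  on.exit(grDevices::dev.off())
  eval(object@plot_element)
  invisible(file)
})

# Return a one-line text description of the stored plotting call
setMethod("logElement", "AnyPlot", function(object) {
  paste("Plotted:", paste(deparse(object@plot_element), collapse = " "))
})

p <- HistPlot(obs = 50)
pdf_file <- pdfElement(p, tempfile(fileext = ".pdf"))
log_line <- logElement(p)
```

Because both methods are defined for AnyPlot, every child class inherits them, so a new plot type would get PDF and log output without any extra code.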
The main advantage of this approach is that you can keep your shiny app exactly the same whatever it calculates or whatever it reports. Inside our department, this meant that whenever somebody wanted a different plot inside an app, we did not have to touch our main app again. Whenever somebody wanted to change just the linear regression app, we did not have to touch other apps. The look and feel, the logging, and the PDF report stay exactly the same. Those three features shall never be touched unless an update is needed to their own functionality.
Packaging
As you know, we did not build a single app; we had to build many for the different mathematical analyses. So we decided to construct a separate R package for each app. This means we had to define, in a core package, one class that specifies what an app looks like. This fits the Lego theme: our app would be Lego City, where you have trucks and cars. Other apps may be more advanced and belong to Lego Technic.
Now each contributor to our shiny app builds a package that contains a child of our core class. We called this class Module, and we have a lot of Module packages. This is not a Shiny module, but it’s modular. Our app now allows bringing together many of those modules, making it bigger and bigger. It gets more HP, and I wouldn’t call it a car anymore. Yeah, we have a truck! Made of Lego bricks!
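The core-class idea can be sketched with a virtual S4 class. The real Module class of bioWARP is not shown in this post, so the slot names (title, inputs), the describeModule generic, and the BoxPlotModule child are purely hypothetical. In practice the virtual class would live in the core package and each child in its own package.

```r
library(methods)

# --- would live in the core package ---------------------------------
# Virtual class: cannot be instantiated, only extended by module packages
setClass("Module",
         representation("VIRTUAL", title = "character", inputs = "character"))

# Interface the main app relies on (hypothetical)
setGeneric("describeModule", function(object) standardGeneric("describeModule"))
setMethod("describeModule", "Module", function(object) {
  sprintf("%s (%d inputs)", object@title, length(object@inputs))
})

# --- would live in a separate module package -------------------------
setClass("BoxPlotModule", contains = "Module")
BoxPlotModule <- function() {
  new("BoxPlotModule",
      title  = "Great BoxPlot Module",
      inputs = c("dataset", "group_column", "value_column"))
}

# The main app only talks to the Module interface
modules <- list(BoxPlotModule())
descriptions <- vapply(modules, describeModule, character(1))
```

Adding another module then means adding another element to `modules`; the code that consumes the list stays untouched.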
The modularization and packaging now enable fast testing. Why? Each package can be tested using basic testthat functions. So first we tested our core application package, which allows adding building blocks. Afterwards, we tested each single package on its own. Finally, the whole application is tested. Our truck is ready to roll. Upon updates, we do not have to test the whole truck again. If we want to have larger tires, we just update the tire package, but not the core package or any other packages.
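A per-package unit test might look like the following sketch. It uses testthat's test_that() and expect_*() functions; describe_values() is a hypothetical stand-in for whatever a small "descriptive statistics" package would export, not a function from bioWARP.

```r
library(testthat)

# Hypothetical function exported by one small module package
describe_values <- function(x) {
  stopifnot(is.numeric(x), length(x) > 1)
  c(n = length(x), mean = mean(x), sd = sd(x))
}

# Would live in tests/testthat/ of that package
test_that("describe_values returns correct statistics", {
  res <- describe_values(c(1, 2, 3, 4))
  expect_equal(res[["n"]], 4)
  expect_equal(res[["mean"]], 2.5)
  expect_equal(res[["sd"]], sd(c(1, 2, 3, 4)))
})

test_that("describe_values rejects non-numeric input", {
  expect_error(describe_values(letters))
})
```

Since the test only touches one package, it can run on every change to that package without starting the full application.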
Config files
The truck is made of bricks, actually the same bricks we used to build the car, just many more of them. Now the hard part is putting them all together and not losing track.
We are dealing with the many different Modules that we wrote, each of which comes in its own package. The main issue we had was that we wanted all apps to be deeply tested. During development, of course, not all apps were tested right away, so we had to give them a tag (tested yes/no). Additionally, some apps required help pages while others didn’t. Some apps came with example data sets; some didn’t. Some apps already had a nice title; for others it had to be easy to configure. For each Module, we also had to source js and css files, which we allowed to be added for each app. The folder from which to source them is chosen by the app author. We wanted to provide as much flexibility as possible while keeping our standards for Lego bricks (look and feel, logging, plotting, and reporting). A simple example for such an app can be found on GitHub here.
We came up with the idea of XML config files, which contain all the information that needs to be set for each Module. An example XML document is given below, which you can read as the Lego manual. These small configurations allow managing the apps. We also built an XML document that allows the apps to use features of what we call the core package. This XML file is rather difficult to set up, but imagine it tells which plot shall be logged, which input shall be used, and which plots shall go into the PDF report. It allows fast development while sticking to standards.
Inside the config file, you can clearly see the title of the app and the location of help pages, and that an example data set is given. Even the name of the class that describes the Module is given. This allows us to rapidly add modules to our main app environment.
```xml
<module id="module1" type="default" datasets="yes" tested="no">
  <package> modulepackage1 </package>
  <class> modulepackage1_Module</class>
  <title> Great BoxPlot Module </title>
  <short> GBM </short>
  <path source="modulepackage1"> . </path>
  <help>
    <level0>help/index.html</level0>
    <level1>
      <item name="details">help/about.html</item>
    </level1>
  </help>
  <data>
    <ds name="Two Groups" file="datasets/two_groups.csv"/>
  </data>
</module>
```
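To show how such a file could drive the main app, here is a minimal sketch that reads a few attributes and fields with the xml2 package. How bioWARP actually parses its configuration is not described in this post; the read_module_config() helper and the layout of the returned list are assumptions.

```r
library(xml2)

# Shortened copy of the example configuration
config <- '<module id="module1" type="default" datasets="yes" tested="no">
  <package> modulepackage1 </package>
  <class> modulepackage1_Module</class>
  <title> Great BoxPlot Module </title>
  <short> GBM </short>
</module>'

# Hypothetical helper: turn one <module> document into a plain R list
read_module_config <- function(xml_string) {
  node <- read_xml(xml_string)
  # Text of the first matching child element, with whitespace trimmed
  field <- function(name) {
    xml_text(xml_find_first(node, paste0(".//", name)), trim = TRUE)
  }
  list(
    id      = xml_attr(node, "id"),
    tested  = identical(xml_attr(node, "tested"), "yes"),
    package = field("package"),
    class   = field("class"),
    title   = field("title")
  )
}

cfg <- read_module_config(config)
```

With such a list, the main app could load the package named in `cfg$package`, instantiate the class named in `cfg$class`, and hide or flag modules whose `tested` entry is FALSE.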
At the end, our truck is made of many parts that all increase its power and strength. As we now have around 16 modules in our real (in production) app, and each has between 20 and 50 inputs, the truck has 500 inputs. All of them look similar and can be used to produce standardized PDF reports. The truck can even become a monster truck, and thanks to the config files, it will still be easy to manage.
My message to shiny.car and shiny.truck developers
- Please do not start building a car until you know how many parts it will have at the end. Always consider it might become a truck. First, always define your requirements.
- Use modularization! Use Shiny modules or inheritance provided by object orientation (S4 or R6). Both keep you from changing a lot of code on minor changes in requirements.
- Use standardization! Try to have all your inputs and outputs as standardized as possible. If you use simple output bricks, it’s easy to output them in your preferred format. Features like logging, PDF reporting, or even testing will be much easier with standardized elements. Standardized inputs allow your users to be comfortable with new apps faster.
- Don’t build real trucks; build Lego trucks.
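For readers who start today and prefer the Shiny modules route mentioned in the list above, a minimal sketch could look like this. The apps in this post predate Shiny modules, so this is not how bioWARP is built; histUI and histServer are hypothetical names, and the example assumes a shiny version that provides moduleServer().

```r
library(shiny)

# UI part of the module: NS() prefixes all ids with the module id
histUI <- function(id) {
  ns <- NS(id)
  tagList(
    sliderInput(ns("obs"), "Number of observations:", min = 10, max = 500, value = 100),
    plotOutput(ns("plot"))
  )
}

# Server part of the module: inputs and outputs are scoped to the module id
histServer <- function(id, color = "darkgray") {
  moduleServer(id, function(input, output, session) {
    output$plot <- renderPlot({
      hist(rnorm(input$obs), col = color, border = "white")
    })
  })
}

# The same brick used twice, with different colors
ui <- fluidPage(histUI("grey"), histUI("green"))
server <- function(input, output, session) {
  histServer("grey")
  histServer("green", color = "darkgreen")
}

app <- shinyApp(ui = ui, server = server)
```

Running `app` in an interactive session starts the application; each histogram reacts only to its own slider because the ids are namespaced.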
The ideas and opinions expressed in this post are those of the author alone, and are not to be construed as representing the opinions of his employer or anyone else.