
Building Online Interactive Simulators for Predictive Models in R

[This article was first published on R – Displayr, and kindly contributed to R-bloggers.]

Correctly interpreting predictive models can be tricky. One solution to this problem is to create interactive simulators, where users can manipulate the predictor variables and see how the predictions change. This post describes a simple approach for creating online interactive simulators. It works for any model that has a predict method. Better yet, if the model’s not top secret, you can build and share the model at no cost using the free version of Displayr!

In this post I show how to create the very simple simulator shown below. Click the image to interact with it, or click the button below to explore and edit the code.

Explore and edit this simulator

Step 1: Create the model

The first step is to create a model. There are lots of ways to do this, including:

In this post I will illustrate by using one of my all-time favorite models – a generalized additive model – via the gam function in the mgcv package. The process for creating this in Displayr is:


Step 2: Add controls for each of the predictors

Step 3: Compute the prediction

Press Insert > R Output (Analysis) and then enter the code below, modifying it as per your needs. For example, with the code SeniorCitizen = cSeniorCitizen, the variable name used in the model is SeniorCitizen and cSeniorCitizen is the name of the control.

The item names in each control must exactly match the values of the corresponding variable in the data set. This is why the MonthlyCharges code is a bit more complicated: it needs to strip the $ from the control value and convert the result to a number, because the variable in the data set contains only numbers.

   
predict(my.gam,
        type = "response",
        newdata = data.frame(SeniorCitizen = cSeniorCitizen, 
                    Tenure = as.numeric(cTenure),
                    InternetService = cInternetService,
                    MonthlyCharges = as.numeric(gsub("\\$", "", cMonthlyCharges))))[1] * 100
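
To try the prediction code outside Displayr, the controls can be mimicked with plain character strings. The sketch below is a stand-in rather than the simulator's actual code: it assumes made-up data, a hypothetical outcome column called Churn, and a logistic regression fitted with glm in place of the GAM (the predict call takes the same newdata and type arguments for both).

```r
# Simulated stand-in data: column names mirror the post, values are made up.
set.seed(1)
n <- 500
churn.data <- data.frame(
  SeniorCitizen   = factor(sample(c("No", "Yes"), n, replace = TRUE)),
  Tenure          = sample(0:72, n, replace = TRUE),
  InternetService = factor(sample(c("DSL", "Fiber optic", "No"), n, replace = TRUE)),
  MonthlyCharges  = round(runif(n, 20, 120))
)

# Hypothetical outcome generated from a known logistic relationship.
xb <- -0.5 + 0.6 * (churn.data$SeniorCitizen == "Yes") -
  0.04 * churn.data$Tenure + 0.01 * churn.data$MonthlyCharges
churn.data$Churn <- rbinom(n, 1, plogis(xb))

# A glm stands in for the GAM used in the post.
my.model <- glm(Churn ~ SeniorCitizen + Tenure + InternetService + MonthlyCharges,
                family = binomial, data = churn.data)

# In Displayr these values would come from the controls; here they are strings.
cSeniorCitizen   <- "Yes"
cTenure          <- "12"
cInternetService <- "Fiber optic"
cMonthlyCharges  <- "$70"

# Convert each control value to the type the model was trained on.
new.data <- data.frame(
  SeniorCitizen   = factor(cSeniorCitizen,
                           levels = levels(churn.data$SeniorCitizen)),
  Tenure          = as.numeric(cTenure),
  InternetService = factor(cInternetService,
                           levels = levels(churn.data$InternetService)),
  MonthlyCharges  = as.numeric(gsub("\\$", "", cMonthlyCharges))
)

# Predicted probability, expressed as a percentage.
churn.percent <- predict(my.model, newdata = new.data, type = "response")[1] * 100
churn.percent
```

Setting the factor levels explicitly means that a control item that does not match a value in the training data shows up as NA straight away, rather than producing a less obvious error later on.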

Confidence bands

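Provided that the predict method supports them, the same approach easily extends to computing confidence intervals and other quantities from models. The snippet below computes a 95% confidence interval for the GAM used above. It builds the interval on the link scale, which is what predict returns by default, and then converts it to a percentage with plogis, which assumes a logit link.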

 
pred <- predict(my.gam,
        se.fit = TRUE,
        newdata = data.frame(SeniorCitizen = cSeniorCitizen, 
                    Tenure = as.numeric(cTenure),
                    InternetService = cInternetService,
                    MonthlyCharges = as.numeric(gsub("\\$", "", cMonthlyCharges))))
bounds <- plogis(pred$fit + c(-1.96, 0, 1.96) * pred$se.fit) * 100
names(bounds) <- c("Lower 95% CI", "Predicted", "Upper 95% CI")
bounds
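
The same calculation can be checked on a small self-contained example. This is only a sketch under stated assumptions: the data are simulated, the Churn outcome is hypothetical, and a glm stands in for the GAM. It also swaps plogis for the inverse link stored in the model's family, which gives the same answer for a logit link and should carry over to other links.

```r
# Simulated stand-in data with a single predictor (values are made up).
set.seed(2)
n <- 300
d <- data.frame(Tenure = sample(0:72, n, replace = TRUE))
d$Churn <- rbinom(n, 1, plogis(0.5 - 0.05 * d$Tenure))

fit <- glm(Churn ~ Tenure, family = binomial, data = d)

# predict() returns link-scale (log-odds) values by default.
pred <- predict(fit, newdata = data.frame(Tenure = 12), se.fit = TRUE)

# Inverse link taken from the fitted model; identical to plogis for a logit link.
link.inverse <- family(fit)$linkinv

# Build the interval on the link scale, then back-transform to percentages.
bounds <- link.inverse(pred$fit + c(-1.96, 0, 1.96) * pred$se.fit) * 100
names(bounds) <- c("Lower 95% CI", "Predicted", "Upper 95% CI")
bounds
```

Building the interval on the link scale and transforming afterwards keeps both bounds between 0 and 100, which is not guaranteed if the standard error is added directly to the predicted probability.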

Computing predictions from coefficients

And, of course, you can also make predictions directly from coefficients, rather than from model objects. For example, the following code makes a prediction for a logistic regression:

 
coefs = coef(my.logistic.regression)  # extractor function; avoids relying on partial matching of $coef
XB = coefs["(Intercept)"] + 
        switch(cSeniorCitizen, 
               No = 0, 
               Yes = coefs["SeniorCitizenYes"]) +
        as.numeric(cTenure) * coefs["Tenure"] +
        switch(cInternetService, 
               No =  coefs["InternetServiceNo"], 
               "Fiber optic" = coefs["InternetServiceFiber optic"], 
               DSL = 0) +
        as.numeric(gsub("\\$", "", cMonthlyCharges)) * coefs["MonthlyCharges"]
100 / (1 + exp(-XB))
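
A useful sanity check is to confirm that the hand-computed prediction agrees with predict. The sketch below does this for a smaller, hypothetical logistic regression on simulated data. The data, the Churn outcome and the two-predictor formula are assumptions made for illustration.

```r
# Simulated stand-in data (values are made up).
set.seed(3)
n <- 400
d <- data.frame(
  SeniorCitizen = factor(sample(c("No", "Yes"), n, replace = TRUE)),
  Tenure        = sample(0:72, n, replace = TRUE)
)
d$Churn <- rbinom(n, 1, plogis(0.3 + 0.7 * (d$SeniorCitizen == "Yes") -
                                 0.04 * d$Tenure))

my.logistic.regression <- glm(Churn ~ SeniorCitizen + Tenure,
                              family = binomial, data = d)

# Control values, as strings.
cSeniorCitizen <- "Yes"
cTenure        <- "24"

# Prediction computed directly from the coefficients.
coefs <- coef(my.logistic.regression)
XB <- coefs[["(Intercept)"]] +
  switch(cSeniorCitizen, No = 0, Yes = coefs[["SeniorCitizenYes"]]) +
  as.numeric(cTenure) * coefs[["Tenure"]]
by.hand <- 100 / (1 + exp(-XB))

# The same prediction via the predict method.
from.predict <- 100 * predict(
  my.logistic.regression, type = "response",
  newdata = data.frame(
    SeniorCitizen = factor(cSeniorCitizen, levels = levels(d$SeniorCitizen)),
    Tenure        = as.numeric(cTenure)
  )
)

# Compare the two results, ignoring the name that predict attaches.
all.equal(by.hand, unname(from.predict))
```

Using [[ ]] rather than [ ] when picking out coefficients drops their names, so the result is a plain number instead of one labelled "(Intercept)".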

Making safe predictions

Sometimes models perform “unsafe” transformations of the data in their internals. For example, some machine learning models standardize inputs (subtract the mean and divide by the standard deviation). This can create a problem at prediction time, as the predict method may, in the background, attempt to repeat the standardization using the data supplied for the prediction. With a single input observation this fails, as the standard deviation of one value is undefined (sd returns NA in R). Similarly, it is possible to create unsafe predictions from even the most well-written model (e.g., if using poly or scale in your model formula). There are a variety of ways of dealing with unsafe predictions, but a safe course of action is to perform any transformations outside of the model (i.e., not in the model formula).
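
One way to keep a transformation outside of the model can be sketched as follows: compute the mean and standard deviation once from the training data, store them, and apply the same two numbers to the control value at prediction time. The data, the Churn outcome and the variable names ending in .mean and .sd are assumptions made for illustration.

```r
# Simulated stand-in data (values are made up).
set.seed(4)
n <- 300
d <- data.frame(MonthlyCharges = runif(n, 20, 120))
d$Churn <- rbinom(n, 1, plogis(-2 + 0.03 * d$MonthlyCharges))

# Standardize outside the model and keep the constants for later.
charges.mean <- mean(d$MonthlyCharges)
charges.sd   <- sd(d$MonthlyCharges)
d$ChargesStd <- (d$MonthlyCharges - charges.mean) / charges.sd

# The formula refers only to the already-transformed column.
fit <- glm(Churn ~ ChargesStd, family = binomial, data = d)

# At prediction time, reuse the stored training constants on the control value.
cMonthlyCharges <- "$70"
charges  <- as.numeric(gsub("\\$", "", cMonthlyCharges))
new.data <- data.frame(ChargesStd = (charges - charges.mean) / charges.sd)

prediction <- predict(fit, newdata = new.data, type = "response")[1] * 100
prediction
```

Because the constants come from the training data rather than from the single new observation, the prediction does not depend on how many rows are passed to predict.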

Step 4: Export the simulator

If everything has gone to plan you can now use the simulator. To export it so that others can use it, click Export > Public Web Page, and you can then share the link with whoever you wish. The version that I have created here is very simple, but you can do a lot more if you want to make something pretty or more detailed (see the Displayr Dashboard Showcase for more examples).


Click here to interact with the published dashboard, or click here to open a copy of the Displayr document that I created when writing this post. It is completely live, so you can interact with it. Click on any of the objects on the page to view the underlying R code, which will appear in the Object Inspector > Properties > R CODE.

Ready to get started? Create your own simulator for free in Displayr!
