Simple visually-weighted regression plots

[This article was first published on is.R(), and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

There has recently been a lot of discussion of so-called “visually-weighted regression” plots.

Folk hero Hadley Wickham suggests that such plots would be easy to implement with ggplot2, and so I have attempted to prove him right.

The approach outlined in the following Gist would be easy to apply to any situation in which you have a matrix of replicated predictions or bootstrapped fits from a model — any such a matrix would just take the place of the simYhats object.

The end-product of this example

# A simple approach to visually-weighted regression plots
doInstall <- TRUE # Change to FALSE if you don't want packages installed.
toInstall <- c("ggplot2", "reshape2", "MASS")
if(doInstall){install.packages(toInstall, repos = "http://cran.us.r-project.org")}
lapply(toInstall, library, character.only = TRUE)
# Generate some data:
nn <- 1000
myData <- data.frame(X = rnorm(nn),
Binary = sample(c(0, 1), nn, replace = T))
myData$Y <- with(myData, X * 2 + Binary * 10 + rnorm(nn, sd = 5))
myData$Y <- (myData$Y > 1) * 1
head(myData)
# Model:
myModel <- glm(Y ~ X + Binary, data = myData, family = "binomial")
# Generate predictions
nSims <- 1000
someScenarios <- expand.grid(1, # Intercept
seq(min(myData$X), max(myData$X), len = 100), # A sequence covering the range of X values
c(0, 1)) # The minimum and maximum of a binary variable, or some other first difference
simDraws <- mvrnorm(nSims, coef(myModel), vcov(myModel))
simYhats <- simDraws %*% t(someScenarios)
simYhats <- plogis(simYhats)
dim(simYhats) # Simulated predictions
# Combine scenario definitions and predictions.
predictionFrame <- data.frame(someScenarios, t(simYhats))
# Reshape wide -> long:
longFrame <- melt(predictionFrame, id.vars = colnames(someScenarios))
head(longFrame)
zp1 <- ggplot(data = longFrame,
aes(x = Var2, y = value, group = paste(variable, Var3), colour = factor(Var3)))
zp1 <- zp1 + geom_line(alpha = I(1/sqrt(nSims)))
zp1 <- zp1 + scale_x_continuous("X-axis label", expand = c(0, 0))
zp1 <- zp1 + scale_y_continuous("Y-axis label", limits = c(0, 1), expand = c(0, 0))
zp1 <- zp1 + scale_colour_brewer(palette="Set1", labels = c("Low", "High")) # Change colour palette
zp1 <- zp1 + guides(colour = guide_legend("First-difference\nvariable",
override.aes = list(alpha = 1))) # Avoid an alpha-related legend problem
zp1 <- zp1 + ggtitle("Plot title")
zp1 <- zp1 + theme_bw()
print(zp1) # This might take a few seconds...

To leave a comment for the author, please follow the link and comment on their blog: is.R().

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)