Simple visually-weighted regression plots
[This article was first published on is.R(), and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
There has recently been a lot of discussion of so-called “visually-weighted regression” plots.
Folk hero Hadley Wickham suggests that such plots would be easy to implement with ggplot2, and so I have attempted to prove him right.
The approach outlined in the following Gist would be easy to apply to any situation in which you have a matrix of replicated predictions or bootstrapped fits from a model — any such a matrix would just take the place of the simYhats object.
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
# A simple approach to visually-weighted regression plots | |
doInstall <- TRUE # Change to FALSE if you don't want packages installed. | |
toInstall <- c("ggplot2", "reshape2", "MASS") | |
if(doInstall){install.packages(toInstall, repos = "http://cran.us.r-project.org")} | |
lapply(toInstall, library, character.only = TRUE) | |
# Generate some data: | |
nn <- 1000 | |
myData <- data.frame(X = rnorm(nn), | |
Binary = sample(c(0, 1), nn, replace = T)) | |
myData$Y <- with(myData, X * 2 + Binary * 10 + rnorm(nn, sd = 5)) | |
myData$Y <- (myData$Y > 1) * 1 | |
head(myData) | |
# Model: | |
myModel <- glm(Y ~ X + Binary, data = myData, family = "binomial") | |
# Generate predictions | |
nSims <- 1000 | |
someScenarios <- expand.grid(1, # Intercept | |
seq(min(myData$X), max(myData$X), len = 100), # A sequence covering the range of X values | |
c(0, 1)) # The minimum and maximum of a binary variable, or some other first difference | |
simDraws <- mvrnorm(nSims, coef(myModel), vcov(myModel)) | |
simYhats <- simDraws %*% t(someScenarios) | |
simYhats <- plogis(simYhats) | |
dim(simYhats) # Simulated predictions | |
# Combine scenario definitions and predictions. | |
predictionFrame <- data.frame(someScenarios, t(simYhats)) | |
# Reshape wide -> long: | |
longFrame <- melt(predictionFrame, id.vars = colnames(someScenarios)) | |
head(longFrame) | |
zp1 <- ggplot(data = longFrame, | |
aes(x = Var2, y = value, group = paste(variable, Var3), colour = factor(Var3))) | |
zp1 <- zp1 + geom_line(alpha = I(1/sqrt(nSims))) | |
zp1 <- zp1 + scale_x_continuous("X-axis label", expand = c(0, 0)) | |
zp1 <- zp1 + scale_y_continuous("Y-axis label", limits = c(0, 1), expand = c(0, 0)) | |
zp1 <- zp1 + scale_colour_brewer(palette="Set1", labels = c("Low", "High")) # Change colour palette | |
zp1 <- zp1 + guides(colour = guide_legend("First-difference\nvariable", | |
override.aes = list(alpha = 1))) # Avoid an alpha-related legend problem | |
zp1 <- zp1 + ggtitle("Plot title") | |
zp1 <- zp1 + theme_bw() | |
print(zp1) # This might take a few seconds... |
To leave a comment for the author, please follow the link and comment on their blog: is.R().
R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.