Evolution of a logistic regression
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
In my last post I showed how one can easily summarize the outcome of a logistic regression. Here I want to show how this really depends on the data-points that are used to estimate the model. Taking a cue from the evolution of a correlation I have plotted the estimated Odds Ratios (ORs) depending on the number of included participants. The result is bad news for those working with small (< 750 participants) data-sets.
“eval_reg” Function to estimate model parameters for subsets of data
eval_reg<-function(model){
mod<-model
dat<-mod$data[sample(nrow(mod$data)),]
vars<-names(coef(mod))
est<-data.frame(matrix(nrow=nrow(dat), ncol=length(vars)))
pb <- txtProgressBar(min = 50, max = nrow(dat), style = 3)for(i in 50:nrow(dat)){
try(boot_mod<-update(mod, data=dat[1:i,]))
try(est[i,]<-exp(coef(boot_mod)))
setTxtProgressBar(pb, i)
}
est$mod_nr<-1:length(dat[,1])
names(est)<-c(vars, ‘mod_nr’)
return(est)
}
As I randomized the order of data you can run it again and again to arrive at an even deeper mistrust as some of the resulting permutations will look like they stabilize earlier. On the balance you need to set the random-number seed to make it reproducible.
Run and plot the development
set.seed(29012001)
mod_eval<-eval_reg(gp_mod)
tmp<-melt(mod_eval,id=’mod_nr’)
tmp2<-tmp[tmp$variable!='(Intercept)',]ticks<-c(seq(.1, 1, by =.1), seq(0, 10, by =1), seq(10, 100, by =10))
ggplot(tmp2, aes(y=value, x = mod_nr, color = variable)) +
geom_line() +
geom_hline(y=1, linetype=2) +
labs(title = ‘Evolution of logistic regression’, y = ‘OR’, x = ‘number of participants’) +
scale_y_log10(breaks=ticks, labels = as.character(ticks)) +
theme_bw()
Update 29-01-2013:
I added my definition of the ticks on the log-scale. The packages needed are ggplot2 and reshape.
R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.