There are several graphics available for visualizing missing data, including those in the VIM package. However, I wanted a plot specifically for looking at the nature of missingness across variables and a clustering variable of interest, to support data preparation for multilevel propensity score models (see the multilevelPSA package). The following example uses data from the Programme for International Student Assessment (PISA; see the pisa package).
The required packages can be installed from GitHub. Note that the pisa package is approximately 75 MB.
> require(devtools)
> install_github('jbryer/multilevelPSA')
> install_github('jbryer/pisa')
The following sets up the data to be plotted. The multilevelPSA package includes a pisa.setup.R script that supports one of its demos. Among other things, it creates a vector, psa.cols, that defines the variables of interest for a propensity score analysis. These are the variables where missingness needs to be addressed.
> require(multilevelPSA)
> require(pisa)
> data(pisa.student)
> pkgdir = system.file(package='multilevelPSA')
> source(paste(pkgdir, '/pisa/pisa.setup.R', sep=''))
> student = pisa.student[,psa.cols]
> student$CNT = as.character(student$CNT)
Finally, create the graphic with the plot.missing function.
> plot.missing(student[,c(4:48)], student$CNT)
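For readers who want to see the numbers behind this kind of plot, here is a minimal base R sketch of the summary it is concerned with: the percentage of missing values per variable, overall and within each level of the clustering variable. This is not the plot.missing implementation. The toy data frame and the column names age and books are made up for illustration, and only CNT mirrors the country column used above.

```r
# Toy data standing in for the PISA student data (hypothetical values;
# only the CNT column name mirrors the example above).
toy <- data.frame(
  CNT   = c("USA", "USA", "USA", "CAN", "CAN", "CAN"),
  age   = c(15, NA, 16, 15, 15, NA),
  books = c(NA, NA, 3, 2, 1, 4),
  stringsAsFactors = FALSE
)

vars <- c("age", "books")

# Percent of missing values per variable, across all rows.
overall <- colMeans(is.na(toy[, vars])) * 100

# Percent of missing values per variable within each level of the
# clustering variable: rows are groups, columns are variables.
by.group <- sapply(toy[, vars], function(x) {
  tapply(is.na(x), toy$CNT, mean) * 100
})

print(round(overall, 1))
print(round(by.group, 1))
```

With the real data, the same two steps could be applied to student[,c(4:48)] with student$CNT as the grouping vector, assuming the setup above has been run.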