Visualizing Missing Data

[This article was first published on Jason.Bryer.org Blog - R, and kindly contributed to R-bloggers.]

Several graphics are available for visualizing missing data, including those in the VIM package. However, I wanted a plot specifically for examining the nature of missingness across variables and across a clustering variable of interest, to support data preparation for multilevel propensity score models (see the multilevelPSA package). The following example uses data from the Programme for International Student Assessment (PISA; see the pisa package).

The required packages can be installed from GitHub. Note that the pisa package is approximately 75 MB.

> require(devtools)
> # Current versions of devtools expect the 'username/repo' form
> install_github('jbryer/multilevelPSA')
> install_github('jbryer/pisa')

The following sets up the data to be plotted. The multilevelPSA package includes a pisa.setup.R script to assist with a demo there. Among other things, it creates a vector, psa.cols, that defines the variables of interest for a propensity score analysis. These are the variables where missingness needs to be addressed.

> require(multilevelPSA)
> require(pisa)
> data(pisa.student)
> pkgdir = system.file(package='multilevelPSA')
> source(paste(pkgdir, '/pisa/pisa.setup.R', sep=''))
> student = pisa.student[,psa.cols]
> student$CNT = as.character(student$CNT)

Finally, create the graphic with the plot.missing function.

> plot.missing(student[,c(4:48)], student$CNT)
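
To make concrete what "missingness across variables and a clustering variable" means, here is a minimal base R sketch that tabulates the percentage of missing values for each variable within each group. It is not the implementation behind the plot above. The toy data frame and the names toy, var1, var2, and pct_missing are hypothetical, invented only for illustration, with CNT standing in for the country grouping used in the PISA example.

```r
# Toy data standing in for the PISA subset; all values are made up.
toy <- data.frame(
  CNT  = rep(c("USA", "CAN", "MEX"), each = 4),
  var1 = c(1, NA, 3, 4,   NA, NA, 7, 8,   9, 10, 11, 12),
  var2 = c(NA, 2, 3, 4,   5, 6, 7, 8,     NA, NA, NA, 12),
  stringsAsFactors = FALSE
)

# For each variable, flag missing cells and average the flags within
# each group. The result is a group-by-variable matrix of percentages.
pct_missing <- sapply(toy[, c("var1", "var2")], function(x) {
  100 * tapply(is.na(x), toy$CNT, mean)
})

print(pct_missing)
```

Each row of the resulting matrix is a group and each column a variable, so a cell such as the one for MEX and var2 shows how much of that variable is missing within that country. A table like this is the kind of summary that a missing data plot by group presents graphically.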
