Of Needles and Haystacks: Building an Accurate Statewide Dropout Early Warning System in Wisconsin
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
For the past two years I have been working on the Wisconsin Dropout Early Warning System, a predictive model of on time high school graduation for students in grades 6-9 in Wisconsin. The goal of this project is to help schools and educators have an early indication of the likely graduation of each of their students, early enough to allow time for individualized intervention. The result is that nearly 225,000 students receive an individualized prediction at the start and end of the school year.
The system is moving into its second year of use this fall and I recently completed a research paper describing the predictive analytic approach taken within DEWS. The research paper is intended to serve as a description and guide of the decisions made in developing an automated prediction system using administrative data. The paper covers both the data preparation and model building process as well as a review of the results. A preview is shown below which demonstrates how the EWS models trained in Wisconsin compare to the accuracy reported in the research literature – represented by the points on the graph. The accuracy is measured using the ROC curve. The article is now available via figshare.
The colored lines represent different types of ensembled statistical models and their accuracy across various thresholds of their predicted probabilities. The points represent the accuracy of comparable models in the research literature using reported accuracy from a paper by Alex Bowers:
Bowers, A.J., Sprott, R.*, Taff, S.A.* (2013) Do we Know Who Will Drop Out? A Review of the Predictors of Dropping out of High School: Precision, Sensitivity and Specificity. The
High School Journal, 96(2), 77-100. doi:10.1353/hsj.2013.0000. This article serves as good background and grounds the benchmarking of the models built in Wisconsin and for others when benchmarking their own models.
Article Abstract:
The state of Wisconsin has one of the highest four year graduation rates in the nation, but deep disparities among student subgroups remain. To address this the state has created the Wisconsin Dropout Early Warning System (DEWS), a predictive model of student dropout risk for students in grades six through nine. The Wisconsin DEWS is in use statewide and currently provides predictions on the likelihood of graduation for over 225,000 students. DEWS represents a novel statistical learning based approach to the challenge of assessing the risk of non-graduation for students and provides highly accurate predictions for students in the middle grades without expanding beyond mandated administrative data collections.
Similar dropout early warning systems are in place in many jurisdictions across the country. Prior research has shown that in many cases the indicators used by such systems do a poor job of balancing the trade off between correct classification of likely dropouts and false-alarm (Bowers et al., 2013). Building on this work, DEWS uses the receiver-operating characteristic (ROC) metric to identify the best possible set of statistical models for making predictions about individual students.
This paper describes the DEWS approach and the software behind it, which leverages the open source statistical language R (R Core Team, 2013). As a result DEWS is a flexible series of software modules that can adapt to new data, new algorithms, and new outcome variables to not only predict dropout, but also impute key predictors as well. The design and implementation of each of these modules is described in detail as well as the open-source R package, EWStools, that serves as the core of DEWS (Knowles, 2014).
Code:
The code that powers the EWS is an open source R extension of the caret package which is available on GitHub: EWStools on GitHub
R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.