California High School Graduation and Dropout Rates

a modeler's tribulations

10 years ago

[This article was first published on a modeler's tribulations, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)

Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Abstract

The California Deparment of Education recently (June 2012) had a news release on the increase in high school (grades 9-12) graduation rates and decrease in dropout rates. The data used by the Department was from two cohorts (4-year periods) on a statewide basis. While the statewide numbers show an increase (decrease) in graduation (dropout) rates, a particular school district’s numbers do not necessarily follow the statewide trend. Moreover, the longer-term data (about 20 years) on graduation and dropout rates has been ignored by the Department in their analyses. Following is my attempt to dig deeper into the data – just for kicks!

Overview

The California Deparment of Education recently (June 2012, news article) publicized the increase in high school graduation rates and decrease in dropout rates. Newspapers have also reported on this topic:

However, the data is available for only two cohorts (‘09 and ‘10). Moreover, the available cohort data is at an aggregated scale – school district or state, and not for each individual school.

I am interested in individual school performance – particulalry the five high schools in the Fremont Unified district, since I currently live here. The following is my attempt to dig deeper into this data, of course, using R. Below are the data sources. I downloaded all the files (2 on cohort rates, 1 on school list and 20 on dropout rates) before running the following R code.

Reproduce Dept of Education’s news release data

First, I try to reproduce the results from the news release.

rm(list = ls())

workDir <- "C:/Rstuff/DATA/CA Schools/"
setwd(workDir)

# read data
coh09 <- read.csv("cohort09.txt", header = TRUE, sep = "\t", as.is = TRUE)
coh10 <- read.csv("cohort10.txt", header = TRUE, sep = "\t", as.is = TRUE)

News release numbers are based on “Statewide” data. Extract only this subset and produce a table similar to the one in the news release.

subsetName <- "Statewide"
# cohort09
data09 <- subset(coh09, Name == subsetName & (Subgroup == "All" | 
    Subgrouptype == "All"))
# cohort10
data10 <- subset(coh10, Name == subsetName & (Subgroup == "All" | 
    Subgrouptype == "All"))

Reproduce table of Graduation Rates

grad09 <- data09[, c("Subgroup", "Subgrouptype", "NumGraduates", 
    "Cohort.Graduation.Rate")]
colnames(grad09) <- c("Subgroup", "Subgrouptype", "Num0910", "Per0910")
grad10 <- data10[, c("Subgroup", "Subgrouptype", "NumGraduates", 
    "Cohort.Graduation.Rate")]
colnames(grad10) <- c("Subgroup", "Subgrouptype", "Num1011", "Per1011")

# create a table similar to the news release
gradTable <- merge(grad09, grad10)
gradTable$Type <- c(gradTable$Subgrouptype[1:9], gradTable$Subgroup[10:16])
gradTable <- gradTable[, c("Type", "Per0910", "Num0910", "Per1011", 
    "Num1011")]
row.names(gradTable) <- gradTable$Type
gradTable <- gradTable[c("All", "5", "1", "2", "3", "4", "6", "7", 
    "9", "0", "EL", "MIG", "SE", "SD"), ]  #re-order to match news release order
gradTable$Change <- as.numeric(gradTable$Per1011 - gradTable$Per0910)

gradTable

##     Type Per0910    Num0910 Per1011    Num1011 Change
## All  All   74.72 378976       76.26 382558       1.54
## 5      5   68.08 159364       70.41 167886       2.33
## 1      1   67.32 2933         68.00 2692         0.68
## 2      2   89.01 40384        89.66 39717        0.65
## 3      3   72.33 2520         74.26 2432         1.93
## 4      4   87.34 12002        89.00 12104        1.66
## 6      6   60.54 25224        62.76 24917        2.22
## 7      7   83.52 130539       85.42 124863       1.90
## 9      9   82.80 4534         81.36 5311        -1.44
## 0      0   53.81 1476         46.26 2636        -7.55
## EL    EL   56.39 53015        60.21 60280        3.82
## MIG  MIG   71.14 10790        71.86 9794         0.72
## SE    SE   56.72 34385        59.04 34156        2.32
## SD    SD   68.04 204189       69.94 219856       1.90

Reproduce table of Dropout Rates

drop09 <- data09[, c("Subgroup", "Subgrouptype", "NumDropouts", "Cohort.Dropout.Rate")]
colnames(drop09) <- c("Subgroup", "Subgrouptype", "Num0910", "Per0910")
drop10 <- data10[, c("Subgroup", "Subgrouptype", "NumDropouts", "Cohort.Dropout.Rate")]
colnames(drop10) <- c("Subgroup", "Subgrouptype", "Num1011", "Per1011")

# create a table similar to the news release
dropTable <- merge(drop09, drop10)
dropTable$Type <- c(dropTable$Subgrouptype[1:9], dropTable$Subgroup[10:16])
dropTable <- dropTable[, c("Type", "Per0910", "Num0910", "Per1011", 
    "Num1011")]
row.names(dropTable) <- dropTable$Type
dropTable <- dropTable[c("All", "5", "1", "2", "3", "4", "6", "7", 
    "9", "0", "EL", "MIG", "SE", "SD"), ]  #re-order to match news release order
dropTable$Change <- as.numeric(dropTable$Per1011 - dropTable$Per0910)

dropTable

##     Type Per0910    Num0910 Per1011    Num1011 Change
## All  All    16.6 84298         14.4 72314        -2.2
## 5      5    20.8 48706         17.7 42126        -3.1
## 1      1    22.1 962           20.7 818          -1.4
## 2      2     7.2 3270           6.2 2764         -1.0
## 3      3    19.6 683           17.4 571          -2.2
## 4      4     7.8 1078           6.7 906          -1.1
## 6      6    26.7 11143         24.7 9791         -2.0
## 7      7    10.7 16759          8.9 12980        -1.8
## 9      9    10.2 556           11.2 730           1.0
## 0      0    41.6 1141          28.6 1628        -13.0
## EL    EL    29.0 27264         24.8 24858        -4.2
## MIG  MIG    18.8 2845          17.3 2352         -1.5
## SE    SE    21.9 13296         18.4 10628        -3.5
## SD    SD    20.1 60309         17.6 55483        -2.5

Note

Data used by the news release appears to be similar, but not identical, to the data currently available on the web. The discrepancies in the above numbers and those in the news article are attrbuted to the slightly different datasets.
Also, the sum of graduation percentage and dropout percentage for a category need not add up to 100 because some students take longer than 4 years to graduate and are not (yet) accounted here.

Repeat the analysis on Fremont Unified School District

subsetName <- "Fremont Unified"
# cohort09
data09 <- subset(coh09, Name == subsetName & (Subgroup == "All" | 
    Subgrouptype == "All"))
# cohort10
data10 <- subset(coh10, Name == subsetName & (Subgroup == "All" | 
    Subgrouptype == "All"))

Table of Graduation Rates

grad09 <- data09[, c("Subgroup", "Subgrouptype", "NumGraduates", 
    "Cohort.Graduation.Rate")]
colnames(grad09) <- c("Subgroup", "Subgrouptype", "Num0910", "Per0910")
grad10 <- data10[, c("Subgroup", "Subgrouptype", "NumGraduates", 
    "Cohort.Graduation.Rate")]
colnames(grad10) <- c("Subgroup", "Subgrouptype", "Num1011", "Per1011")

# create a table similar to the news release
gradTable <- merge(grad09, grad10)
gradTable$Type <- c(gradTable$Subgrouptype[1:9], gradTable$Subgroup[10:16])
gradTable <- gradTable[, c("Type", "Per0910", "Num0910", "Per1011", 
    "Num1011")]
row.names(gradTable) <- gradTable$Type
gradTable <- gradTable[c("All", "5", "1", "2", "3", "4", "6", "7", 
    "9", "0", "EL", "MIG", "SE", "SD"), ]  #re-order to match news release order
gradTable$Change <- as.numeric(gradTable$Per1011 - gradTable$Per0910)

gradTable

##     Type Per0910    Num0910 Per1011    Num1011 Change
## All  All   82.74 2177         89.34 1952         6.60
## 5      5   69.06 279          79.42 220         10.36
## 1      1   90.00          *  100.00          *  10.00
## 2      2   89.63 1055         93.49 977          3.86
## 3      3   80.95 17           68.42 13         -12.53
## 4      4   80.37 131          92.25 119         11.88
## 6      6   67.86 76           82.52 85          14.66
## 7      7   82.19 586          87.98 505          5.79
## 9      9   81.48 22           85.71 24           4.23
## 0      0   50.00          *   83.33          *  33.33
## EL    EL   68.56 205          73.06 179          4.50
## MIG  MIG   71.43          *   85.71          *  14.28
## SE    SE   55.47 137          65.41 121          9.94
## SD    SD   72.69 503          79.42 436          6.73

Table of Dropout Rates

drop09 <- data09[, c("Subgroup", "Subgrouptype", "NumDropouts", "Cohort.Dropout.Rate")]
colnames(drop09) <- c("Subgroup", "Subgrouptype", "Num0910", "Per0910")
drop10 <- data10[, c("Subgroup", "Subgrouptype", "NumDropouts", "Cohort.Dropout.Rate")]
colnames(drop10) <- c("Subgroup", "Subgrouptype", "Num1011", "Per1011")

# create a table similar to the news release
dropTable <- merge(drop09, drop10)
dropTable$Type <- c(dropTable$Subgrouptype[1:9], dropTable$Subgroup[10:16])
dropTable <- dropTable[, c("Type", "Per0910", "Num0910", "Per1011", 
    "Num1011")]
row.names(dropTable) <- dropTable$Type
dropTable <- dropTable[c("All", "5", "1", "2", "3", "4", "6", "7", 
    "9", "0", "EL", "MIG", "SE", "SD"), ]  #re-order to match news release order
dropTable$Change <- as.numeric(dropTable$Per1011 - dropTable$Per0910)

dropTable

##     Type Per0910    Num0910 Per1011    Num1011 Change
## All  All     5.1 133            5.5 120           0.4
## 5      5     9.7 39            13.4 37            3.7
## 1      1     0.0          *     0.0          *    0.0
## 2      2     1.7 20             2.5 26            0.8
## 3      3     4.8          *    10.5          *    5.7
## 4      4     4.9          *     6.2          *    1.3
## 6      6    17.0 19             9.7          *   -7.3
## 7      7     5.8 41             5.9 34            0.1
## 9      9    11.1          *    10.7          *   -0.4
## 0      0    50.0          *     0.0          *  -50.0
## EL    EL    11.4 34            16.3 40            4.9
## MIG  MIG     7.1          *    14.3          *    7.2
## SE    SE    12.6 31            11.4 21           -1.2
## SD    SD     9.4 65            11.8 65            2.4

Some Observations

Statewide graduation rates, consistent with the news release, have increased across the different ethinic and other demographic groups considered.
The Fremont Unified School District’s graduation rates show a similar pattern – increase in graduation rates across the groups.
Statewide dropout rates have decreased across most categories. However, Fremont Unified’s dropout rates have increased across most categories.
For whatever reason the results in the news release are not presented by gender.

Closer look at Fremont Unified’s High Schools

Now I use the enrollment and dropout numbers by year data, 1992-2011, to see the time series of dropout rates for the 5 high schools in Fremont Unified.

Identify high schools in Fremont Unified – I look at only public schools.

# school info from http://www.cde.ca.gov/ds/si/ds/pubschls.asp
allSchools <- read.csv("pubschls.txt", header = TRUE, sep = "\t", 
    as.is = TRUE, colClasses = "character")
# Fremont specific data
freSchools <- subset(allSchools, County == "Alameda" & District == 
    "Fremont Unified" & StatusType == "Active" & SOC == 66)

freSchools$School

## [1] "American High"         "Irvington High"        "John F. Kennedy High" 
## [4] "Mission San Jose High" "Washington High"

For each year, extract the total dropouts (“DTOT”) and total enrollment (“ETOT”) for 9,10,11 and 12 grades. Estimate the average dropout rate as the fraction DTOT/ETOT.

# dropout data from http://www.cde.ca.gov/ds/sd/sd/filesdropouts.asp
extns <- substr(as.character(seq(1992, 2011, by = 1)), 3, 4)
freData <- NULL  #stores summary
for (eachYr in extns) {
    dropData <- read.csv(paste("dropouts", eachYr, ".txt", sep = ""), header = TRUE, 
        sep = "\t", as.is = TRUE, colClasses = "character")

    # Fremont specific data
    yearData <- subset(dropData, CDS_CODE %in% freSchools$CDSCode)
    yearData$ETOT <- as.numeric(yearData$ETOT)
    yearData$DTOT <- as.numeric(yearData$DTOT)
    # total dropouts and total enrolled for each school by year
    yearSumm <- aggregate(cbind(DTOT, ETOT) ~ CDS_CODE + YEAR, data = yearData, 
        FUN = sum)
    # update output
    freData <- rbind(freData, yearSumm)
}

Plot the results.

schoolCodes <- levels(as.factor(freData$CDS_CODE))
schoolNames <- freSchools$School[schoolCodes %in% freSchools$CDSCode]
png(filename = "fredrop.png")
par(mfrow = c(5, 1), mar = c(1, 4, 3, 2))
yearNames <- seq(1992, 2011, by = 1)
for (eachSchool in 1:length(schoolCodes)) {
    subData <- subset(freData, CDS_CODE == schoolCodes[eachSchool])
    barplot(subData$DTOT * 100/subData$ETOT, xlab = "", ylim = c(0, 5), ylab = "drop %", 
        names.arg = yearNames, main = schoolNames[eachSchool])
}
garbage <- dev.off()

From the graph it is evident that between 2011 and 2010 American and Irvington’s droprates have decreased, while Kennedy, Mission and Washington’s droprates have increased. However, both increases and decreases in recent droprates are relatively small. Fremont schools have pretty low dropout rates!

Summary

While what I did here sheds light on some of the issues with the Department’s analyis, it is by no means complete. The University of California in Santa Barbara has an entire project dedicated to analyze the dropout rates – UCSB Dropout Research Project.

To leave a comment for the author, please follow the link and comment on their blog: a modeler's tribulations.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.

Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.