A Choropleth Map of Free and Reduced Price Lunch in R
Charles Blow has an excellent op-ed in the New York Times about public education this week. The most important point he makes is that the defunding of public education is coming at precisely the time when American school children are most vulnerable:
Not only is our education system being starved of investment, but many of our children are literally too hungry to learn.
This sums up the problem, but it doesn't convey its extent. One way to show how true this point is, and how dramatically things have changed, is to look at state-reported proportions of students eligible for Free and Reduced Price Lunch (FRL) meals. FRL eligibility is set at roughly 160% of the poverty line, and the students qualifying for these meals are the students Blow refers to in his piece.
For my work in Wisconsin I have spent a lot of time examining how the growing number of FRL-eligible students in the state affects education policy. Mapping these data can give a sense of the breadth and depth of the problem of children living in poverty, particularly in recent years.
Luckily, the state of Wisconsin makes data on the proportion of FRL students in every school district available in an easy-to-download format. Additionally, a shapefile of Wisconsin school districts is easy enough to find here.
In a GitHub repository (FRLmap), I have bundled the necessary data and scripts, along with code for downloading the shapefile, to produce the map below.
Then all we need is R and a few trusty packages to draw the map. I won't explain all the code here (you can grab it from the gist embedded below or from the repo), but I do want to walk through a couple of pieces of the code that are important.
First, downloading the shapefile from a web source is not straightforward in R.
download.file("http://dl.dropbox.com/u/1811289/shapefile/publicshapefileUHS.shp",
destfile=paste(getwd(),"/shapefile/publicshapefileUHS.shp",sep=""),mode="wb")
Here we need the mode="wb" argument to tell R to download and save the file as a binary rather than as text. Otherwise, R will download the file with the proper extension, but the file won't behave as designed or have the properties necessary to be read back in properly. Next, we have a crucial piece of code to merge the shapefile data with the data frame we downloaded.
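Because a shapefile is really a bundle of files sharing one base name, the per-file download calls can also be collapsed into a loop over the extensions. This is a minimal sketch, reusing the Dropbox URL and local directory from the code above:

```r
# A shapefile is several companion files sharing a base name; fetch each
# component in binary mode so the bytes are not mangled as text.
base_url <- "http://dl.dropbox.com/u/1811289/shapefile/publicshapefileUHS"
dir.create("shapefile", showWarnings = FALSE)
for (ext in c("shp", "shx", "dbf", "prj", "qpj")) {
  download.file(paste(base_url, ext, sep = "."),
                destfile = file.path("shapefile",
                                     paste("publicshapefileUHS", ext, sep = ".")),
                mode = "wb")  # "wb" = write binary; essential on Windows
}
```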
# Merge
y<-frl_w
d<-distall@data
d$sort<-1:nrow(d)
y<-y[order(y$district_number),]
di<-merge(y,d,by.x="district_number",
          by.y="CODE",all.y=TRUE)
di<-di[order(di$sort),]
## Drop extraneous data ##
di$sort2<-di$sort-1
row.names(di)<-di$sort2
#row.names(di)<-di$sort
distall2<-spCbind(distall,di)
This little block of code handles merging shapefile data with a data frame. The @data slot holds the attribute table of the shapefile object, so distall@data pulls it out as a standalone data frame. We then merge our data of interest (y) with that data frame (d), re-sort the result to match the original polygon order, and call spCbind to bind it back into the shapefile's data structure; spCbind requires the row names of the two objects to match, which is why the sort index is carried along.
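As an aside, the same alignment can be done without merging and re-sorting by indexing with match(). This is a sketch under the assumption (as in the merge above) that the shapefile's CODE field and the data's district_number use the same district coding:

```r
# Order-preserving alternative: look up each polygon's district code in the
# FRL data, so the rows come out in the shapefile's polygon order and no
# re-sorting is needed before spCbind.
idx <- match(as.character(distall@data$CODE),
             as.character(frl_w$district_number))
di  <- frl_w[idx, ]                       # one row per polygon, polygon order
row.names(di) <- row.names(distall@data)  # row names must agree for spCbind
distall2 <- spCbind(distall, di)
```

Unmatched polygons simply come through as rows of NA, which mirrors the all.y=TRUE behavior of the merge.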
Finally, to draw the plot, we use fortify from the ggplot2 package to flatten the spatial object into a data frame of polygon coordinates that ggplot2 can draw.
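In outline, that step looks like this; a minimal sketch of the fuller version in the gist below (cut3.2012 is one of the year columns produced by the reshape to wide format):

```r
library(ggplot2)
# Flatten the spatial object into a coordinate data frame keyed by district code
df.points <- fortify(distall2, region = "CODE")
# Re-attach the attribute data so fill aesthetics can use it
df.points <- merge(df.points, distall2@data, by.x = "id", by.y = "CODE")
ggplot(df.points, aes(long, lat, group = group)) +
  geom_polygon(aes(fill = cut3.2012)) +
  geom_path(color = "black", size = 0.1) +
  coord_equal()
```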
The result is a striking graphic depicting the socioeconomic changes in Wisconsin over the last decade.

FRL Proportions from 2001-2012
######################################
## FRPL MAP                         ##
## Create a series of maps plotting ##
## FRPL proportions from Wisconsin  ##
## Publicly available data          ##
######################################
# Version 0.7
# Date: 08/21/2012
# Author: Jared E. Knowles, Policy Research Advisor DPI
# Data sources: WINSS (http://data.dpi.state.wi.us/data/)
#               WISEmaps (http://wisemaps.dpi.wi.gov)
# Read in data
# Loop through CSV files of public data on FRL populations
for(i in 2001:2012){
  eval(parse(text=paste("frl",i,"<-read.csv('data/frl",i,".csv')",sep="")))
  eval(parse(text=paste("frl",i,"<-frl",i,"[,c(1,7,11,13:17)]",sep="")))
}
# Combine into one
frl<-rbind(frl2001,frl2002)
for(i in 2003:2012){
  eval(parse(text=paste("frl<-rbind(frl,frl",i,")",sep="")))
}
# Clean up
for(i in 2001:2012){
  eval(parse(text=paste("rm(frl",i,")",sep="")))
}
# Function to make data numeric
destring<-function(x){
  x<-as.numeric(as.character(x))
}
# Coerce all columns to character first, then rebuild as a typed data frame
frl<-apply(frl,2,as.character)
frl<-data.frame(year=as.numeric(frl[,1]),district_number=as.numeric(frl[,2]),
                district_name=as.character(frl[,3]),total_enrollment=as.numeric(frl[,4]),
                econ_disadv_count=as.numeric(frl[,5]),econ_disadv_per=as.numeric(frl[,6]),
                not_econ_disadv_count=as.numeric(frl[,7]),not_econ_disadv_per=as.numeric(frl[,8]))
dataerrors<-subset(frl,econ_disadv_per>100 | econ_disadv_per<0) # 4 data errors
frl<-subset(frl,total_enrollment<200000 & econ_disadv_per<101) # Remove errors and state
# Set cuts
frl$cut<-cut(frl$econ_disadv_per,breaks=c(-1,15,25,35,50,100),
             labels=c('Less than 15%','15-25%','25-35%','35-50%','More than 50%'))
# Alternate cut
frl$cut2<-cut(frl$econ_disadv_per,breaks=c(-1,5,15,25,35,50,100),
              labels=c('Less than 5%','5-15%','15-25%','25-35%','35-50%','More than 50%'))
# Alternate cut 3
frl$cut3<-cut(frl$econ_disadv_per,breaks=c(-1,10,20,30,40,50,100),
              labels=c('Less than 10%','10-20%','20-30%','30-40%','40-50%','More than 50%'))
# Reshape wide for ggplot2 mapping
frl_w<-reshape(frl,timevar='year',
               idvar=c('district_number','district_name'),direction='wide')
# Read in GIS libraries
library(maptools)
library(RColorBrewer)
library(ggplot2)
library(mapproj)
require(gpclib)
gpclibPermit()
# Acquire shapefile
dir.create("shapefile")
# Download script reads from Dropbox share
# mode argument writes files as binaries and not as text
download.file("http://dl.dropbox.com/u/1811289/shapefile/publicshapefileUHS.shp",
              destfile=paste(getwd(),"/shapefile/publicshapefileUHS.shp",sep=""),mode="wb")
download.file("http://dl.dropbox.com/u/1811289/shapefile/publicshapefileUHS.shx",
              destfile=paste(getwd(),"/shapefile/publicshapefileUHS.shx",sep=""),mode="wb")
download.file("http://dl.dropbox.com/u/1811289/shapefile/publicshapefileUHS.dbf",
              destfile=paste(getwd(),"/shapefile/publicshapefileUHS.dbf",sep=""),mode="wb")
download.file("http://dl.dropbox.com/u/1811289/shapefile/publicshapefileUHS.qpj",
              destfile=paste(getwd(),"/shapefile/publicshapefileUHS.qpj",sep=""),mode="wb")
download.file("http://dl.dropbox.com/u/1811289/shapefile/publicshapefileUHS.prj",
              destfile=paste(getwd(),"/shapefile/publicshapefileUHS.prj",sep=""),mode="wb")
# Read in shapefile
fn<-paste(getwd(),"/shapefile/publicshapefileUHS",sep="")
distall<-readShapePoly(fn)
# Read in map themes
source('ggplot2themes.R')
# Merge
y<-frl_w
d<-distall@data
d$sort<-1:nrow(d)
y<-y[order(y$district_number),]
di<-merge(y,d,by.x="district_number",by.y="CODE",all.y=TRUE)
di<-di[order(di$sort),]
## Drop extraneous data ##
di$sort2<-di$sort-1
row.names(di)<-di$sort2
#row.names(di)<-di$sort
distall2<-spCbind(distall,di)
# Make the object back into a data frame we can plot with ggplot2 methods
df.points<-fortify(distall2,region='CODE')
df.points<-merge(df.points,distall2@data,by.x='id',by.y='CODE')
df.points<-df.points[order(df.points$group,df.points$piece,
                           df.points$order),]
# Get names for labeling polygons
cnames<-cbind(coordinates(distall2),distall2@data$CODE,
              as.character(distall2@data$NAME))
# Match names to district ID
cnames<-data.frame(long=as.numeric(cnames[,1]),lat=as.numeric(cnames[,2]),
                   distid=cnames[,3],
                   distname=cnames[,4])
# Remove unnecessary items from workspace
rm(d,di,frl,y,distall,distall2)
# Clean up the memory
gc()
# Fix colors across years at 10% intervals
cols3<-c('Less than 10%'='#F1EEf6','10-20%'='#D4B9DA','20-30%'='#C994C7','30-40%'='#DF65B0',
         '40-50%'='#DD1C77','More than 50%'='#980043')
# Create a destination to stick the output images into:
dir.create("plots")
# Make PNG files
library(png)
for(i in 2001:2012){
  eval(parse(text=paste("ineq3",i,"<-ggplot(data=df.points,aes(long,lat))+
    geom_polygon(aes(group=group,fill=cut3.",i,"))+
    geom_path(aes(group=group),color='black',size=.10,linetype=1)+
    coord_equal()+
    scale_fill_manual(values=cols3)+
    geom_text(data=cnames,aes(x=long,y=lat,label=distname),size=.95)+
    guides(fill = guide_legend(title='% FRL',ncol=1))+
    opts(title='Proportion of Students FRL ",i,"')+
    theme_dpi_mapPNG()",sep="")))
  eval(parse(text=paste("png('plots/evenFRLmap",i,".png',height=1200,width=1000)",sep="")))
  eval(parse(text=paste("print(ineq3",i,")",sep="")))
  dev.off()
  eval(parse(text=paste("rm(ineq3",i,")",sep="")))
}
# Fix colors (same palette as above)
cols3<-c('Less than 10%'='#F1EEf6','10-20%'='#D4B9DA','20-30%'='#C994C7','30-40%'='#DF65B0',
         '40-50%'='#DD1C77','More than 50%'='#980043')
# Make PDF files
for(i in 2001:2012){
  eval(parse(text=paste("ineq3",i,"<-ggplot(data=df.points,aes(long,lat))+
    geom_polygon(aes(group=group,fill=cut3.",i,"))+
    geom_path(aes(group=group),color='black',size=.10,linetype=1)+
    coord_equal()+
    scale_fill_manual(values=cols3)+
    geom_text(data=cnames,aes(x=long,y=lat,label=distname),size=.95)+
    guides(fill = guide_legend(title='% FRL',ncol=1))+
    opts(title='Proportion of Students FRL ",i,"')+
    theme_dpi_map2()",sep="")))
  eval(parse(text=paste("pdf('plots/evenFRLmap",i,".pdf',height=10,width=8)",sep="")))
  eval(parse(text=paste("print(ineq3",i,")",sep="")))
  dev.off()
  eval(parse(text=paste("rm(ineq3",i,")",sep="")))
}