Introducing an R package for school district shapefiles
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
I would like to introduce {leaidr}, a package for mapping U.S. school district shapefiles!
Inspired by my coauthor Joshua Rosenberg, the goal of {leaidr} is to facilitate the download and use of school district shapefiles.
School districts in the U.S. have associated local education agency identification numbers (LEAID) used in the National Center for Education Statistics (NCES) Common Core of Data. These are very useful because if you have other datasets with NCES IDs, then you can (sometimes easily) join them.
It can be very useful to visualize districts and associated data. District shapefiles are available in different places, such as through the NCES and Stanford Education Data Archive. The package {tigris} also has a school district option, but unfortunately, it is missing a few district polygons.
library(tigris) ca <- tigris::school_districts(state = "06", type = "unified") plot(ca["GEOID"])
{leaidr} downloads NCES’ U.S. district shapefile from Github using ROpenSci’s {piggyback} package. This is a super helpful package, as Github caps file uploads at 100 MB (and the shapefile is ~170 MB). I originally tried Github Large File Storage (LFS), but it stores files as a hash, not as an actual file. Therefore, I couldn’t figure out how to use it for a package that others can use.
The function lea_get()
downloads an R Data file from the Github repo to your designated path and then writes the necessary shapefiles. Then, create an object with lea_prep()
by designating where the shapefiles exist and which state(s) you would like. Note: For now, you must use the state’s FIPS code. FIPS state codes are numeric and two-letter alphabetic codes to identify U.S. states and certain other associated areas. A reference table is found here.
Once you have the shapefile, then you can merge with other datasets and plot using packages like {leaflet} and {ggplot2}.
Example of creating a school district map
Let’s walk through an example where we will merge external data to the shapefile and then map all the districts in California. The external data is from Josh’s covidedu
project, which scrapes district websites for specific words. In this case, the search terms were “covid*”, “coron*”, and “closure”. I highly recommend using covidedu
for easy scraping from a lot of district websites!
First, let’s call our packages.
library(tidyverse) # if you haven't installed the package yet # devtools::install_github("ivelasq/leaidr") library(leaidr) library(maptools) # if you don't have this downloaded # install.packages("mapproj")
Time to get your data! Use {leaidr} to download and prep your shapefiles for California (FIPS == 06). Read in the external data (in this case, summary-of-table-of-links.csv
).
# download the shapefile into a designated folder leaidr::lea_get(path = "./test") # prep the shapefile for the state(s) you'd like ca_shapefile <- leaidr::lea_prep(path = "./test", fips = "06") # read in the external data that also has NCES ID's # this is from the covidedu project ca_data <- read_csv("https://raw.githubusercontent.com/making-data-science-count/covidedu/master/output/2020-04-29/summary-of-table-of-links.csv")
Join the CSV to the shapefile.
ca_merge <- sp::merge(ca_shapefile, ca_data, by.x = "GEOID", by.y = "nces_id")
Now ‘fortify’ the data - this converts the polygons into points. This is so that ggplot can create the plot.
If you get the error isTRUE(gpclibPermitStatus()) is not TRUE
, then you need to enable gpclib
by running gpclibPermit()
(this is part of the {maptools} package, which should have been loaded above). Note that support for gpclib
will be withdrawn from maptools at the next major release, so you might have to try something else if the package has been upgraded.
If you run gpclibPermit()
and you keep getting FALSE
, then you are missing the package {gpclib}. Install the package, then run gpclibPermit()
to set it to TRUE
.
(I don’t know if this is the best/only way to do this - if anybody has suggestions, please let me know!)
# install.packages("gpclib") gpclibPermit() ca_points <- fortify(ca_merge, region = "GEOID")
Now, join the points and the shapefile data.
ca_df <- left_join(ca_merge@data, ca_points, by = c("GEOID" = "id"))
We can finally plot the shapefile and its data!
ca_map <- ca_df %>% ggplot() + geom_polygon(aes(x = long, y = lat, group = group, fill = any_link_found), color = "gray", size = .2) + theme_void() + scale_fill_iv() + ggtitle("COVID-Related Links Found on CA School District Sites")
To make a nicer looking map, then you can use coord_map()
.
map_projected <- ca_map + coord_map() print(map_projected)
Tada! A full school district map for California.
Call for collaboration
Please try out {leaidr}! I hope that it is useful to you in your work. I’d love any collaborators to join me in making it easier/better!
- Other functionality: For example: being able to filter shapefiles by NCES IDs as well as states; adding commonly used data (like district demographics).
- Issues: If you run into any issues, please post on the GitHub page!
Resources
Ever wish you could easily create U.S. school district maps 🗺🏫 in #rstats? Check out {leaidr}, which downloads the shapefile from GitHub using @rOpenSci 's {piggyback} 🐷! Would love any thoughts on how to improve! Repo: https://t.co/yOszd4agYM Tutorial: https://t.co/JuqGQcVR7Z
— Isabella Velásquez (@ivelasq3) May 11, 2020
R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.