Thank you again to Zhaochen He for becoming the new maintainer of choroplethr. As I mentioned here, choroplethr was downloaded over 1,500 times in the month before it was archived. Archiving it impacted a significant number of users. Thanks to Zhaochen’s efforts, the package is now back on CRAN.
Choroplethr v4.0.0 is now on CRAN
[This article was first published on R – Ari Lamstein, and kindly contributed to R-bloggers].
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
choroplethr version 4.0.0 is now on CRAN. You can install it like this:
    install.packages("choroplethr")
    packageVersion("choroplethr")
    # [1] ‘4.0.0’
With this version, I have transferred the maintenance of choroplethr to Zhaochen He, an economics professor at Christopher Newport University. Zhao addressed the issues that led to choroplethr being archived from CRAN in February. Please join me in thanking Zhao for his contribution!
Changes in v4.0.0
The primary changes in this version are:
- choroplethr now only uses the tidycensus package to get data from the Census Bureau API. By removing the dependency on the acs package (which was recently archived), choroplethr can be hosted on CRAN again.
- Data returned by the functions get_state_demographics, get_county_demographics and get_tract_demographics has been simplified. These functions previously returned eight columns of data for each region. Now they return only two: population and median_hh_income.
- The function choroplethr_animate has been removed from the package.
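Code written against earlier versions may still reference columns that are no longer returned. Here is a minimal sketch of a defensive check. It uses a hand-made data frame in place of a real API call. The helper name check_demographics_columns and the toy values are hypothetical, and the region column is assumed from choroplethr's usual data frame convention rather than stated above.

```r
# Hypothetical helper: stop early if a demographics data frame lacks
# the columns that v4.0.0 is described as returning.
check_demographics_columns <- function(df) {
  expected <- c("region", "population", "median_hh_income")
  missing  <- setdiff(expected, names(df))
  if (length(missing) > 0) {
    stop("Missing expected columns: ", paste(missing, collapse = ", "))
  }
  invisible(TRUE)
}

# Toy stand-in for the result of get_state_demographics(); values are made up.
df <- data.frame(
  region           = c("alpha", "beta"),
  population       = c(1000, 2000),
  median_hh_income = c(50000, 60000)
)

check_demographics_columns(df)

# As in the examples in this post, the column to map is copied into "value".
df$value <- df$population
```

A check like this fails with a clear message instead of silently producing a column of NULLs when older code asks for a column that was dropped.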
Future Work
Future development of choroplethr will be decided by Zhao. That’s the best part of being a maintainer, and I don’t want to take it away from him. But I also want to note three issues I see facing the project: outdated maps, simple features and backwards compatibility.
Outdated Maps
When Choroplethr was released in 2014, one of its contributions was packaging contemporary maps and making them accessible to the entire R ecosystem. (For reference, at that time, the world map that shipped with the maps package still included the USSR!)
Unfortunately these maps are now outdated, which causes problems in some cases. For example, in 2022 the Census Bureau changed the names and boundaries of Connecticut’s county-equivalents. Because choroplethr’s county map is from 2010, attempting to map Connecticut data from 2022 or later generates an error. Here is a code snippet that demonstrates the issue:
    library(choroplethr)

    # Works
    df = get_county_demographics(2021)
    df$value = df$population
    county_choropleth(df, state_zoom='connecticut')

    # Error
    df = get_county_demographics(2022)
    df$value = df$population
    county_choropleth(df, state_zoom='connecticut')
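One way to picture the mismatch is to compare the region codes in the data with the region codes on the map. The sketch below uses hand-written vectors instead of calling the package. The codes are believed to match the Census Bureau's FIPS codes for Connecticut's eight legacy counties and nine planning regions, but treat them as assumptions and verify them before relying on them. How choroplethr detects the mismatch internally is not shown here.

```r
# Assumed FIPS codes for Connecticut's eight legacy counties (2010-era map).
map_regions <- c(9001, 9003, 9005, 9007, 9009, 9011, 9013, 9015)

# Assumed FIPS codes for the nine planning regions reported from 2022 onward.
data_regions <- c(9110, 9120, 9130, 9140, 9150, 9160, 9170, 9180, 9190)

# Regions present in the data but absent from the map cannot be drawn.
unmatched <- setdiff(data_regions, map_regions)

if (length(unmatched) > 0) {
  message(length(unmatched), " region(s) in the data are not on the map")
}
```

Under these assumptions none of the newer codes appear on the older map, which is consistent with the 2022 call failing while the 2021 call works.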
In addition to the county map, I believe that the following maps should also be updated:
- choroplethrZip packages the Census Bureau’s 2010 ZIP Code Tabulation Area (ZCTA) map. A newer version of that map was published in 2020. There appear to be significant differences between the two versions, and I think that choroplethr should use the newer one.
- Choroplethr’s country (Admin 0) map is version 2.0.0 from Natural Earth Data. The latest version is 5.1.1. Being on the latest major version of the map seems like a good idea.
- Choroplethr’s province (Admin 1) map is version 3.0.0 from Natural Earth Data. The latest version is 5.1.0. Being on the latest major version of this map also seems like a good idea.
- Calling county_choropleth actually renders two maps: a state map superimposed over a county map. This helps users understand which counties are part of which state. Currently, both maps are from 2010. If we update the county map, then I think we should also update the state map to be from the same year.
Simple Features
Choroplethr was published when the best way to make a map with ggplot2 was by “fortifying” a shapefile. Towards the end of my active development on choroplethr, the “Simple Features” package was created. My impression is that there might be advantages to migrating choroplethr to use simple features. Unfortunately, I never got around to researching this.
I think it would be useful for someone to research Simple Features and form an informed opinion on whether (and how) to incorporate it into choroplethr.
Backwards Compatibility
One of the highlights of developing choroplethr was when the US Census Bureau commissioned a video course on it. That course is still on their website, and it would be awkward if the instructions in it somehow stopped working.
That said, the course was published in 2016 and should not prevent genuine innovation. I am not sure of the best way to balance this, and it’s possible that I became too conservative about breaking backwards compatibility after the course was released.
I hope that Zhao can find a balance between innovation and respecting backwards compatibility.
Example
Back in choroplethr’s heyday I would include an example in each post. This might be my last post about choroplethr, and I thought it would be nice to include an example here as well.
Since the output of the functions that get state, county and tract demographics has changed, let’s use them. We can use those functions to explore how median household income in the US changed between 2009 (the first 5-year ACS) and 2023 (the most recent 5-year ACS).
Change in State Median Household Income
Here’s how to map the percent change in median income in each state between 2009 and 2023:
    library(choroplethr)
    stopifnot(packageVersion("choroplethr") >= '4.0.0')

    # Get data from 2009 and 2023
    df_2009 = get_state_demographics(2009, 5)
    df_2009$value = df_2009$median_hh_income
    df_2023 = get_state_demographics(2023, 5)
    df_2023$value = df_2023$median_hh_income

    # Calculate and map percent change
    df_final = calculate_percent_change(df_2009, df_2023)
    state_choropleth(df_final,
                     title = 'Change in Median Household Income: 2009 to 2023',
                     legend = 'Percent Change',
                     num_colors = 4)
This map really surprised me. Virtually all the states in the top quartile are in the western half of the country. And many of the exceptions to that rule (Nevada, Wyoming and New Mexico) are in the lowest quartile.
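For readers who want to see the arithmetic behind a percent-change map, here is a minimal base R sketch on made-up values. It is not the implementation of calculate_percent_change. The helper name percent_change_sketch and the two fictional regions are hypothetical, and the formula assumed is (new - old) / old * 100.

```r
# Hypothetical helper illustrating the arithmetic; not choroplethr's code.
percent_change_sketch <- function(df_old, df_new) {
  # Pair up rows by region so each old value lines up with its new value,
  # regardless of row order in the two inputs.
  merged <- merge(df_old, df_new, by = "region", suffixes = c("_old", "_new"))
  merged$value <- (merged$value_new - merged$value_old) / merged$value_old * 100
  merged[, c("region", "value")]
}

# Made-up incomes for two fictional regions (note the different row order).
old <- data.frame(region = c("alpha", "beta"), value = c(50000, 40000))
new <- data.frame(region = c("beta", "alpha"), value = c(50000, 75000))

result <- percent_change_sketch(old, new)
print(result)
```

Merging on region before subtracting matters: subtracting the two value columns row by row would give wrong answers whenever the inputs are sorted differently.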
Change in County Median Household Income
For the county map, let’s zoom in on the five counties that make up New York City. Using a continuous scale will help us see the magnitude of the differences between counties. (Recall that when working with counties you need to use FIPS codes.)
    df_2009 = get_county_demographics(2009)
    df_2009$value = df_2009$median_hh_income
    df_2023 = get_county_demographics(2023)
    df_2023$value = df_2023$median_hh_income

    df_final = calculate_percent_change(df_2009, df_2023)

    nyc_counties = c(36005, 36047, 36061, 36081, 36085)
    county_choropleth(df_final,
                      title = 'Change in Median Household Income: 2009 to 2023\nCounties in New York City',
                      legend = 'Percent Change',
                      num_colors = 1,
                      county_zoom = nyc_counties)
Unfortunately county_choropleth doesn’t print the names of the counties, so you need to have some familiarity with New York City to understand this map. The darkest county is Brooklyn (Kings County), which had an increase in median household income of 83% in just 14 years!
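Since the map itself has no labels, a small lookup table can help when reading the underlying numbers. The sketch below pairs the five FIPS codes used above with county and borough names. Those names come from general knowledge rather than from the package, so verify them if it matters. The data frame df_toy is a hypothetical stand-in for df_final; only the 83 for Kings County comes from the text above, and the other values are left as NA.

```r
# County FIPS codes for New York City, with county and borough names.
nyc_lookup <- data.frame(
  region  = c(36005, 36047, 36061, 36081, 36085),
  county  = c("Bronx", "Kings", "New York", "Queens", "Richmond"),
  borough = c("The Bronx", "Brooklyn", "Manhattan", "Queens", "Staten Island")
)

# Toy stand-in for df_final; only the Kings County figure is from the post.
df_toy <- data.frame(
  region = c(36005, 36047, 36061, 36081, 36085),
  value  = c(NA, 83, NA, NA, NA)
)

# Attach names so each value can be read without memorizing FIPS codes.
labeled <- merge(df_toy, nyc_lookup, by = "region")
print(labeled[, c("region", "borough", "county", "value")])
```

With the real df_final, the same merge would attach a borough name to each percent change.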
Change in Tract Median Household Income
I wondered if all of Brooklyn had a large increase in income, or just parts of it. We can answer that question by analyzing the census tracts in Brooklyn:
    df_2009 = get_tract_demographics("new york", 36047, 2009)
    df_2009$value = df_2009$median_hh_income
    df_2023 = get_tract_demographics("new york", 36047, 2023)
    df_2023$value = df_2023$median_hh_income

    df_final = calculate_percent_change(df_2009, df_2023)

    tract_choropleth(df_final,
                     "new york",
                     county_zoom = 36047,
                     legend = "Percent Change",
                     title = "Change in Median Household Income: 2009-2023\nCensus Tracts in Brooklyn, NY")
It appears that the northern half of Brooklyn experienced a much larger increase in median household income than the southern half. This might be due to its close proximity to Manhattan (northern Brooklyn is connected to Manhattan by three bridges and a tunnel). Also note the range of the scale: it goes from -40.5% to 362.9%!
Conclusion