Site icon R-bloggers

Case Study: Mapping German ZIP Codes in R

[This article was first published on R – AriLamstein.com, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Shortly after running a webinar last week I received an email from Achim Rumberger, who attended the webinar live. Achim immediately put the webinar material into use for his own project, which involves mapping ZIP Codes in Germany. This was the first case study I received related to my new course, Shapefiles for R Programmersand I wanted to share it with my readers.

I’m neuro-biologist by training, but for the past decade I make my living as a JAVA enterprise developer. I’m living and working in Germany, near Stuttgart, where Mercedes Benz cars and Porsches are manufactured.

Recently my daughter came to me with a problem – for her bachelor thesis she wants to see if certain kind of schools will benefit from the help of social services. So I suggested to her, to look for public available data, describing the social structure of the environment of these schools in question.

And all of a sudden, there was the problem to show some data in a geographical context.

Just at this time Ari published his webinar about getting shape files into R. Which also includes a introduction to shape files to get you going, if you are new to it, as I am. I remembered Ari from his mail course introducing his great R-package (choroplethr). By the way this is a terrible name, being a biologist by heart, I always type “chloroplethr”, as in “chlorophyll”, and this is not found by the R package manager. [Editor’s note: I agree!]

Next question, where do I get the shapefiles, describing Germany? A major search engine was of great help here. http://www.suche-postleitzahl.org/downloads?download=zuordnung_plz_ort.csv . Germany has some 8700 zip code areas, so expect some time for rendering the file, if you do on your computer. Right on this side one can also find a dataset which might act as a useful warm up practice to display statistical data in a geographical context. Other sources are https://datahub.io/de/dataset/postal-codes-de

And so I got started…

For development in R, I’m using RStudio, as its free, and most closely resembles the IDEs I know from JAVA development, though its functionality is a throwback to the 90s, compared to modern IDEs like Netbeans or IntelliJ. Especially in respect to refactoring and debugging.

To complete this task, there are whole lot of packages needed. But these will make life a whole lot easier. Look for the most recent packages, especially “readr”, which helps with loading the csv files into R. With “readr” you do not have to convert factors in strings again, as it is the case with the readcsv function from R.

The code itself is mostly copied and pasted from Ari’s webinar. It basically loads the shapefile into R, converts it into a data.frame and provides this data.frame with a variable called “region”. Then the stat data is loaded into R, again, the columns are renamed to something “choroplethr” recognizes (“region” and “values”), and finally “choroplethr” its doing it magic.

My hardware used is a MacBook from 2011 with 16GB RAM and an i7 Processor. The execution time for each of the plots is around 5 mins. R is using only a single core of the 4 available, but big memory helps.

And here is the code:

#make sure the workspace is in pristine condition
rm(list=ls(all=TRUE))
#shapefile from https://datahub.io/de/dataset/postal-codes-de
#http://www.suche-postleitzahl.org/downloads?download=zuordnung_plz_ort.csv
#post questions here: http://gis.stackexchange.com/
library(choroplethr)
library(dplyr)
library(ggplot2)
library(rgdal)
library(maptools)
library(gpclib)
library(readr)
library(R6)

setwd("<path to your shape file>/plz-gebiete.shp/")
ger_plz <- readOGR(dsn = ".", layer = "plz-gebiete")

gpclibPermit()
#convert the raw data to a data.frame as ggplot works on data.frames
ger_plz@data$id <- rownames(ger_plz@data)
ger_plz.point <- fortify(ger_plz, region="id")
ger_plz.df <- inner_join(ger_plz.point,ger_plz@data, by="id")

head(ger_plz.df)
ggplot(ger_plz.df, aes(long, lat, group=group )) + geom_polygon()

#data file
df <- read_csv("<path to your stat data>/de_plz_einwohner.csv")
# variable name 'region' is needed for choroplethr
ger_plz.df$region <- ger_plz.df$plz
head(ger_plz.df)

#subclass choroplethr to make a class for your my need

GERPLZChoropleth <- R6Class("GERPLZChoropleth",
    inherit = choroplethr:::Choropleth,
    public = list(
        initialize = function(user.df) {
            super$initialize(ger_plz.df, user.df)
        }
    )
)
#choropleth needs these two columnames - 'region' and 'value'
colnames(df) = c("region", "value")

#instantiate new class with data
c <- GERPLZChoropleth$new(df)

#plot the data
c$ggplot_polygon = geom_polygon(aes(fill = value), color = NA)
c$title = "Comparison of number of Inhabitants per Zipcode in Germany"
c$legend= "Number of Inhabitants per Zipcode"
c$set_num_colors(9)
c$render()

And here is the result:

What you can see from the map, is that zip codes in the north and east of Germany tend to be larger. Small size zip codes coincide with metropolitan areas in Germany.

Note that with choroplethr, it is easy to combine this choropleth map with a google map. This is useful for people who aren’t already familiar with German geography:

c$render_with_reference_map()

Want to learn how to do projects like this? Take my course Shapefiles for R Programmers.

The post Case Study: Mapping German ZIP Codes in R appeared first on AriLamstein.com.

To leave a comment for the author, please follow the link and comment on their blog: R – AriLamstein.com.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.