Introducing the Kernelheaping Package II
[This article was first published on INWT-Blog-RBloggers, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
In the first part of <a target="_blank" style = "color:#527BBA; text-decoration: underline;" href="https://www.inwt-statistics.com/read-blog/introducing-the-kernelheaping-package-512.html">Introducing the Kernelheaping Package</a> I showed how to compute and plot kernel density estimates on rounded or interval censored data using the Kernelheaping package. Now, let’s make a big leap forward to the 2-dimensional case. Interval censoring can be generalised to rectangles or alternatively even arbitrary shapes. That may include counties, zip codes, electoral districts or administrative districts. Standard area-level mapping methods such as chloropleth maps suffer from very different area sizes or odd area shapes which can greatly distort the visual impression. The Kernelheaping package provides a way to convert these area-level data to a smooth point estimate. For the German capital city of Berlin, for example, there exists an open data initiative, where data on e.g. demographics is publicly available. We first load a dataset on the Berlin population, which can be downloaded from: <a target="_blank" style = "color:#527BBA; text-decoration: underline;" href="https://www.statistik-berlin-brandenburg.de/opendata/EWR201512E_Matrix.csv">https://www.statistik-berlin-brandenburg.de/opendata/EWR201512E_Matrix.csv</a> ```r library(dplyr) library(fields) library(ggplot2) library(Kernelheaping) library(maptools) library(RColorBrewer) library(rgdal) gpclibPermit() ``` ```r data <- read.csv2("EWR201512E_Matrix.csv") ``` This dataset contains the numbers of inhabitants in certain age groups for each administrative districts. Afterwards, we load a shapefile with these administrative districts, available from: <a target="_blank" style = "color:#527BBA; text-decoration: underline;" href="https://www.statistik-berlin-brandenburg.de/opendata/RBS_OD_LOR_2015_12.zip">https://www.statistik-berlin-brandenburg.de/opendata/RBS_OD_LOR_2015_12.zip</a> ```r berlin <- readOGR("RBS_OD_LOR_2015_12/RBS_OD_LOR_2015_12.shp") ``` ```r berlin <- spTransform(berlin, CRS("+proj=longlat +datum=WGS84")) ``` Now, we form our input data set, which contains the area/polygon centers and the variable of interest, whose density should be calculated. In this case we' like to calculate the spatial density of people between 65 and 80 years of age: ```r dataIn <- lapply(berlin@polygons, function(x) x@labpt) %>% do.call(rbind, .) %>% cbind(data$E_E65U80) ``` In the next step we calculate the bivariate kernel density with the “dshapebivr” function (this may take some minutes) using the prepared data and the shape file: ```r est <- dshapebivr(data = dataIn, burnin = 5, samples = 10, adaptive = FALSE, shapefile = berlin, gridsize = 325, boundary = TRUE) ``` To plot the map with "ggplot2", we need to perform some additional data preparation steps: ```r berlin@data$id <- as.character(berlin@data$PLR) berlin@data$E_E65U80 <- data$E_E65U80 berlinPoints <- fortify(berlin, region = "id") berlinDf <- left_join(berlinPoints, berlin@data, by = "id") kData <- data.frame(expand.grid(long = est$Mestimates$eval.points[[1]], lat = est$Mestimates$eval.points[[2]]), Density = as.vector(est$Mestimates$estimate)) %>% filter(Density > 0) ``` Now, we are able to plot the density together with the administrative districts: ```r ggplot(kData) + geom_raster(aes(long, lat, fill = Density)) + ggtitle("Bivariate density of Inhabitants between 65 and 80 years") + scale_fill_gradientn(colours = c("#FFFFFF", "#5c87c2", "#19224e")) + geom_path(color = "#000000", data = berlinDf, aes(long, lat, group = group)) + coord_quickmap() ```
This map gives a much better overall impression of the distribution of older people than a simple choropleth map: ```r ggplot(berlinDf) + geom_polygon(aes(x = long, y = lat, fill = E_E65U80, group = id)) + ggtitle("Number of Inhabitants between 65 and 80 years by district") + scale_fill_gradientn(colours = c("#FFFFFF", "#5c87c2", "#19224e"), "n") + geom_path(color = "#000000", data = berlinDf, aes(long, lat, group = group)) + coord_quickmap() ```
Often, as the case with Berlin we may have large uninhabited areas such as woods or lakes. Furthermore, we may like to compute the proportion of older people compared to the overall population in a spatial setting. The third part of this series shows how you can compute boundary corrected and smooth proportion estimates with the Kernelheaping package.
Further parts of the article series Introducing the Kernelheaping Package:
- Part 1: Univariate Kernel Density Estimation for Heaped Data
- Part 2: Bivariate Kernel Density Estimation for Heaped Data
To leave a comment for the author, please follow the link and comment on their blog: INWT-Blog-RBloggers.
R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.