Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
A few days ago, I was inspired by a set of photographs of Earth from space, at night.
The images are amazing, so I decided to try to replicate them using R.
To be a little more specific, I found a single dataset – a data set with French population data – and I’m using it to replicate the images for a single country: France.
A few notes and warnings before you get started
As usual, I’ve included the code for how to create the final data visualization.
Having said that, in this case I decided to use some new tools. Therefore, before trying to run the code yourself, there are a few notes and caveats to keep in mind.
This was built with the new < inline_code>sf package
I decided to build this with the new < inline_code>sf package.
< inline_code>sf is a new package for geospatial analysis that’s currently under development. You can read more about it here.
Warning: setting up < inline_code>sf and dependencies may be problematic
Before getting started, you should know that installing < inline_code>sf may require you to install some dependencies. You can find out more here.
If you decide to install < inline_code>sf and the dependencies, be warned: getting the dependencies set up (specifically, gdal) may be tricky and problematic, so attempt it at your own risk.
Code: mapping French population density
Ok, here’s the code.
First, we’ll load the packages that we’re going to need. (Of course, you need to have them installed as well.)
Notice also that we’re using the development version of < inline_code>ggplot2. We need this because we’ll be using < inline_code>geom_sf().
#======================================= # INSTALL DEVELOPMENT VERSION OF ggplot2 #======================================= library(devtools) dev_mode(on = T) install_github("hadley/ggplot2") #============== # LOAD PACKAGES #============== library(sf) library(tidyverse) library(stringr) library(scales)
Get the data
Next, we’ll get the data from a URL at Sharp Sight and load it into R:
#========= # GET DATA #========= url.france_pop <- url("http://sharpsightlabs.com/wp-content/datasets/france_population_data_2016.RData") load(url.france_pop) # INSPECT glimpse(df.france)
Change column names to lower case
We won’t have to make too many adjustments to the data, but we will transform the variable names to lower case.
#====================================== # CHANGE COLUMN NAMES: lowercase # - in the original dataset the column # names were capitalized # - we'll transform the names to # lower case #====================================== colnames(df.france) <- colnames(df.france) %>% str_to_lower() # INSPECT colnames(df.france)
Create < inline_code>density variable
In the visualization, we’re going to plot population density.
The dataset does not have a density variable, so we need to calculate it based on < inline_code>population and < inline_code>superficie (i.e. area). Keep in mind that < inline_code>superficie is in hectares, so we’ll be converting from “people/hectare” to “people/square kilometer.”
#========================= # CREATE DENSITY VARIABLE #========================= df.france$population %>% summary() #------------------------- # CREATE VARIABLE: density #------------------------- df.france <- df.france %>% mutate(density = population/superficie*100) # INSPECT colnames(df.france) head(df.france)
Plot a test chart
Next, we’ll plot a very simple test chart to see if things are roughly OK.
As we do this, note how simple it is to use < inline_code>geom_sf(). This is one of the advantages of using the new < inline_code>sf package.
#========================================== # PLOT TEST CHART # - here, we'll just plot a test chart # to see if the data are roughly correct # and to make sure that the data are # roughly in working order #========================================== #ggplot(df.france) + geom_sf() ggplot(df.france) + geom_sf(aes(fill = density))
This appears to be OK, but it’s tough to tell because the borders are too thick.
Let’s do one more test chart to try to remove the lines and get a better look:
#========================================== # PLOT SECOND TEST CHART # - here, we'll plot another test with # smaller borders # - we'll set size = .1 #========================================== ggplot(df.france) + geom_sf(aes(fill = density), color = NA)
Ok, we can see the fill color now (although the borders are still there).
Looking at this, we’ll have to solve at least 2 problems:
- Adjust the color scale
- Remove the borders
Most of the remaining work will be modifying the < inline_code>fill color to get the colors right, and adjust the scale so that differences between densities are more visible.
Calculate quantiles to identify color break points
Ultimately, we need to adjust the fill-colors associated with the < inline_code>fill aesthetic. We need to find “break points” to which we’ll associate specific fill values.
To do this, we’ll calculate quantiles.
#======================================================= # IDENTIFY QUANTILES # - the hardest part of creating the finalized # data visualization is getting the color scale # just right # - on final analysis, the best way to adjust the color # scale was by capping the population density # - we'll use these quantiles to find a good cap # via trial and error #======================================================= #--------- # VENTILES #--------- quantile(df.france$density, seq(from = 0, to = 1, by = .05)) # 0% 5% 10% 15% 20% 25% 30% # 0.000000 6.815208 10.032525 12.824891 15.614076 18.485419 21.719516 # 35% 40% 45% 50% 55% 60% 65% # 25.510974 29.538791 34.568568 40.088124 46.762867 54.489134 64.356473 # 70% 75% 80% 85% 90% 95% 100% # 77.376987 94.117647 118.935653 160.578630 242.086849 510.070301 41929.234973 #---------------------- # PERCENTILES: 90 - 100 #---------------------- quantile(df.france$density, seq(from = .9, to = 1, by = .01)) # 90% 91% 92% 93% 94% 95% 96% 97% 98% # 242.0868 268.1167 300.2817 347.6989 412.2042 510.0703 637.9211 876.0090 1300.6435 # 99% 100% # 2649.3653 41929.2350 # MEDIAN median(df.france$density) #40.08812
Create finalized version of chart
Let’s plot the finalized version. I’ll give a little explanation afterwards.
#================== # DETAILED VERSION # - add theme # - add colors #================== df.france %>% mutate(fill_var = density) %>% ggplot() + geom_sf(aes(fill = fill_var, color = fill_var)) + scale_fill_gradientn(colours = c("#000233","#f9cf86","#fceccf","white") ,values = rescale(c(0,500,3000,40000)) ,breaks = c(0,500,1500) ,guide = FALSE) + scale_color_gradientn(colours = c("#000233","#f9cf86","#fceccf","white") ,values = rescale(c(0,500,3000,40000)) ,guide = FALSE) + labs(title = "Population density in France\nvisualized to imitate satellite images of France at night" ,subtitle = "(people/sq km)" ) + theme(text = element_text(family = "Gill Sans", color = "#E1E1E1") ,plot.title = element_text(size = 18, color = "#E1E1E1") ,plot.subtitle = element_text(size = 10) ,plot.background = element_rect(fill = "#000223") ,panel.background = element_rect(fill = "#000223") ,panel.grid.major = element_line(color = "#000223") ,axis.text = element_blank() ,axis.ticks = element_blank() ,legend.background = element_blank() ) #================== # TURN OFF DEV MODE #================== dev_mode(on = F)
OK, the truth is that to create this finalized version, you actually need to do a lot of trial and error.
Essentially, you need to find the right cutoffs in the data and associate those values to specific fill values.
The point where we do that is in the following code:
scale_fill_gradientn(colours = c("#000233","#f9cf86","#fceccf","white") ,values = rescale(c(0,500,3000,40000))
Here, we’re taking values of our fill variable, < inline_code>density, and we’re associating those values to specific colors. How do we know which values to use and which colors to use?
Trial and error.
You need to take the quantile values that we calculated earlier, and test them out. Try different cutoff values for < inline_code>values = . Try different fill-values for < inline_code>colours = .
To create visualizations like this, master < inline_code>ggplot2
I’ll admit that this data visualization is a little challenging to create, simple because of the extensive trial and error that’s required.
It’s also a little out of the ordinary, because we’re using the some packages that are under development.
Having said that, this visualization is still essentially just a fancy application of < inline_code>ggplot2. If you really know what you’re doing with < inline_code>ggplot2 – if you’ve already mastered the basics – then this plot is fairly straightforward.
So, if you want to learn to create compelling data visualizations, master < inline_code>ggplot2.
Sign up now to master ggplot2
To get a job as a highly-paid data scientist, you need to know how to create compelling data visualizations.
To create compelling visualizations in R, you need to master < inline_code>ggplot2.
Sign up now we’ll show you how.
By signing up, you’ll get our free Data Science Crash Course, where you’ll learn:
- the foundations of ggplot2
- 3 essential data visualizations
- a step-by-step data science learning plan
- the 1 programming language you need to learn
- how to do data manipulation in R
- how to get started with machine learning
- and more …
In addition, you’ll also get weekly data science tutorials, delivered right to your inbox.
SIGN UP NOW
The post Mapping “France at night” with the new sf package appeared first on SHARP SIGHT LABS.
R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.