How to map geospatial data: USA rivers
R Code
Here’s the R code to produce the map:
#===============
# LOAD PACKAGES
#===============
library(tidyverse)
library(sp)       # Spatial* classes, plus their subset() and $ methods
library(maps)     # supplies the outlines used by map_data()
library(mapproj)  # required by coord_map()

#===============
# GET RIVER DATA
#===============

#==========
# LOAD DATA
#==========

# DEFINE URL
# - this is the location of the file
url.river_data <- url("http://sharpsightlabs.com/wp-content/datasets/usa_rivers.RData")

# LOAD DATA
# - this will retrieve the data from the URL
load(url.river_data)

# INSPECT
summary(lines.rivers)
lines.rivers@data %>% glimpse()
levels(lines.rivers$FEATURE)
table(lines.rivers$FEATURE)

#==============================================
# REMOVE MISC FEATURES
# - there are some features in the data that we
#   want to remove
#==============================================
lines.rivers <- subset(lines.rivers, !(FEATURE %in% c("Shoreline"
                                                      ,"Shoreline Intermittent"
                                                      ,"Null"
                                                      ,"Closure Line"
                                                      ,"Apparent Limit"
                                                      )))

# RE-INSPECT
table(lines.rivers$FEATURE)

#==============
# REMOVE STATES
#==============

#-------------------------------
# IDENTIFY STATES
# - we need to find out
#   which states are in the data
#-------------------------------
table(lines.rivers$STATE)

#---------------------------------------------------------
# REMOVE STATES
# - remove Alaska, Hawaii, Puerto Rico, and Virgin Islands
# - these are hard to plot in a confined window, so
#   we'll remove them for convenience
#---------------------------------------------------------
lines.rivers <- subset(lines.rivers, !(STATE %in% c('AK','HI','PR','VI')))

# RE-INSPECT
table(lines.rivers$STATE)

#============================================
# FORTIFY
# - fortify will convert the
#   'SpatialLinesDataFrame' to a proper
#   data frame that we can use with ggplot2
#============================================
df.usa_rivers <- fortify(lines.rivers)

#============
# GET USA MAP
#============
map.usa_country <- map_data("usa")
map.usa_states <- map_data("state")

#=======
# PLOT
#=======
ggplot() +
  geom_polygon(data = map.usa_country, aes(x = long, y = lat, group = group), fill = "#484848") +
  # in ggplot2 >= 3.4.0 the line-width argument is called linewidth
  geom_path(data = df.usa_rivers, aes(x = long, y = lat, group = group), color = "#8ca7c0", size = .08) +
  coord_map(projection = "albers", lat0 = 30, lat1 = 40, xlim = c(-121,-73), ylim = c(25,51)) +
  labs(title = "Rivers and waterways of the United States") +
  annotate("text", label = "sharpsightlabs.com", family = "Gill Sans", color = "#A1A1A1"
           , x = -89, y = 26.5, size = 5) +
  theme(panel.background = element_rect(fill = "#292929")
        ,plot.background = element_rect(fill = "#292929")
        ,panel.grid = element_blank()
        ,axis.title = element_blank()
        ,axis.text = element_blank()
        ,axis.ticks = element_blank()
        ,text = element_text(family = "Gill Sans", color = "#A1A1A1")
        ,plot.title = element_text(size = 34)
        )
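The FORTIFY step is where the Spatial object becomes an ordinary data frame. To see what that involves, here is a minimal sketch that uses the sp package to build a tiny SpatialLinesDataFrame by hand. It subsets the object on an attribute, the same way the code above drops shoreline features, and then flattens it to one row per vertex. The two segments, their coordinates, and the helper flatten_lines() are hypothetical and exist only for illustration. The result has roughly the long, lat, id, and group columns that fortify() returns. It is a simplified stand-in, not fortify()'s actual implementation.

```r
library(sp)

# Two hypothetical river segments; coordinates are made up, not from usa_rivers.RData
seg_a <- Lines(list(Line(cbind(c(-90.0, -90.5, -91.0), c(35.0, 35.5, 36.0)))), ID = "a")
seg_b <- Lines(list(Line(cbind(c(-100.0, -100.2), c(40.0, 40.4)))), ID = "b")

# Attribute table: row names must match the Lines IDs
attrs <- data.frame(FEATURE = c("Stream", "Shoreline"),
                    STATE   = c("TN", "NE"),
                    row.names = c("a", "b"),
                    stringsAsFactors = FALSE)
river_lines <- SpatialLinesDataFrame(SpatialLines(list(seg_a, seg_b)), attrs)

# Attributes and geometry are stored in separate slots
print(river_lines@data)           # one row of attributes per feature
print(length(river_lines@lines))  # one Lines object per feature

# subset() filters on the attribute table, as in the REMOVE MISC FEATURES step
streams <- subset(river_lines, FEATURE != "Shoreline")

# Hypothetical helper: one row per vertex, one group per (feature, piece)
flatten_lines <- function(sldf) {
  rows <- lapply(sldf@lines, function(feature) {
    pieces <- lapply(seq_along(feature@Lines), function(j) {
      xy <- feature@Lines[[j]]@coords
      data.frame(long  = xy[, 1],
                 lat   = xy[, 2],
                 id    = feature@ID,
                 group = paste(feature@ID, j, sep = "."),
                 stringsAsFactors = FALSE)
    })
    do.call(rbind, pieces)
  })
  do.call(rbind, rows)
}

df.example_rivers <- flatten_lines(streams)
print(df.example_rivers)
```

The group column is what keeps geom_path() from joining the end of one river to the start of the next.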
Use this as practice
If you’ve learned the basics of data visualization in R (namely, ggplot2) and you’re interested in geospatial visualization, use this as a small, narrowly defined exercise to practice some intermediate skills.
There are at least three things that you can learn and practice with this visualization:
- Learn about color: Part of what makes this visualization compelling is its color palette. Notice that in the area surrounding the US, we’re not using pure black, but a dark grey (#292929). For the title, we’re not using white, but a medium grey (#A1A1A1). Also, notice that for the rivers, we’re not using “blue” but a very specific hexadecimal color (#8ca7c0). These are all deliberate choices. As an exercise, I highly recommend modifying the colors. Play around a bit and see how changing the colors changes the “feel” of the visualization.
- Learn to build visualizations in layers: I’ve emphasized this several times recently, but layering is an important principle of data visualization. Notice that we’re layering the river data over the USA country map. As an exercise, you could also layer in the state boundaries between the country map and the rivers. To do this, you can use map_data("state"). The code above already creates map.usa_states this way but never plots it.
- Learn about ‘Spatial’ data: R has several classes for dealing with geospatial data, such as ‘SpatialLines’, ‘SpatialPoints’, and others (from the sp package). The river data here is a ‘SpatialLinesDataFrame’. Spatial data is a whole different animal, so you’ll have to learn its structure. This example will give you a little experience dealing with it.
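Here is a minimal sketch of the state-boundary exercise. It assumes the maps package is installed, because map_data() needs it. The layers go in the order described above: the country fill first, then the state outlines, then the rivers. To keep the sketch runnable without downloading the river file, df.rivers_demo is a hypothetical two-segment stand-in for df.usa_rivers. The state line color "#5a5a5a" is likewise just an arbitrary starting point to iterate on. The sketch uses coord_quickmap() instead of coord_map() only because coord_map() additionally requires the mapproj package.

```r
library(ggplot2)
library(maps)  # provides the "usa" and "state" outlines used by map_data()

map.usa_country <- map_data("usa")
map.usa_states  <- map_data("state")

# Hypothetical stand-in for df.usa_rivers: two short made-up segments
df.rivers_demo <- data.frame(
  long  = c(-90.0, -90.5, -91.0, -100.0, -100.2),
  lat   = c( 35.0,  35.5,  36.0,   40.0,   40.4),
  group = c("a.1", "a.1", "a.1", "b.1", "b.1")
)

p <- ggplot() +
  # Layer 1: the filled country outline
  geom_polygon(data = map.usa_country, aes(x = long, y = lat, group = group),
               fill = "#484848") +
  # Layer 2: state boundaries as unfilled outlines, drawn above the fill
  geom_polygon(data = map.usa_states, aes(x = long, y = lat, group = group),
               fill = NA, color = "#5a5a5a", size = .2) +
  # Layer 3: rivers last, so the state lines never cover them
  geom_path(data = df.rivers_demo, aes(x = long, y = lat, group = group),
            color = "#8ca7c0", size = .08) +
  # Approximate aspect ratio; swap in the coord_map() call from the full code
  coord_quickmap()

p
```

Because ggplot2 draws layers in the order they are added, moving the state layer after geom_path() would paint the borders over the rivers.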
Iterate to get the details right
What really makes this visualization work is the fine detail: in particular, the width of the river lines and the colors.
The reality is that creating good-looking visualizations requires attention to the little details.
To get the details right for a plot like this, I recommend that you build the visualization iteratively.
Start with a simple version of just the map of the US.
ggplot() +
  geom_polygon(data = map.usa_country, aes(x = long, y = lat, group = group), fill = "#484848")
Next, layer on the rivers:
ggplot() +
  geom_polygon(data = map.usa_country, aes(x = long, y = lat, group = group), fill = "#484848") +
  geom_path(data = df.usa_rivers, aes(x = long, y = lat, group = group))
Make no mistake: this doesn’t look good. But, in the early stages, that’s not the goal. You just want to make sure that the data are structurally right. You want something simple that you can build on.
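One way to make “structurally right” concrete is to run a few cheap assertions before plotting. The sketch below uses a hypothetical helper, check_path_data(), which is not part of the original code. It checks three things: that the data has the long, lat, and group columns the aes() calls use, that no coordinates are missing, and that no point lies west of longitude -130 or south of latitude 24. Those two thresholds are assumptions chosen to catch points from Alaska, Hawaii, Puerto Rico, or the Virgin Islands that slipped through the filtering step. The helper runs on made-up data frames here so that it works offline. With the real data, you would call check_path_data(df.usa_rivers).

```r
# Hypothetical helper: cheap structural checks before building the plot
check_path_data <- function(df, min_long = -130, min_lat = 24) {
  required <- c("long", "lat", "group")
  missing_cols <- setdiff(required, names(df))
  if (length(missing_cols) > 0) {
    stop("missing columns: ", paste(missing_cols, collapse = ", "))
  }
  if (anyNA(df$long) || anyNA(df$lat)) stop("NA coordinates found")
  # Points west of min_long or south of min_lat suggest AK, HI, PR or VI data
  outside <- df$long < min_long | df$lat < min_lat
  if (any(outside)) stop(sum(outside), " point(s) fall outside the expected window")
  invisible(TRUE)
}

# Made-up example data: the second frame contains a Hawaii-like point
ok_df  <- data.frame(long = c(-90, -90.5), lat = c(35, 35.5), group = "a.1")
bad_df <- data.frame(long = c(-90, -155.5), lat = c(35, 19.6), group = "a.1")

check_path_data(ok_df)        # passes silently
try(check_path_data(bad_df))  # stops with an error naming the stray point count
```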
Ok, next, play with the river colors.
Start with a simple color like “blue”:
ggplot() +
  geom_polygon(data = map.usa_country, aes(x = long, y = lat, group = group), fill = "#484848") +
  geom_path(data = df.usa_rivers, aes(x = long, y = lat, group = group), color = "blue")
Let’s be honest. This still does not look good.
But it’s closer.
From here, you can play with the colors some more. Select a new color (I recommend using a color picker), and modify the color argument of geom_path():
ggplot() +
  geom_polygon(data = map.usa_country, aes(x = long, y = lat, group = group), fill = "#484848") +
  geom_path(data = df.usa_rivers, aes(x = long, y = lat, group = group), color = "#99ccff")
Not perfect, but better still.
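Each round of this trial-and-error only changes a couple of arguments, so it can help to wrap the plot in a small function. The sketch below uses build_river_map(), a hypothetical helper that is not from the original post. It takes the country and river data frames plus the river color and line size, so trying a new color becomes a one-line call. The default river_size of .5 matches ggplot2's default line width. The tiny data frames are made up so that the sketch runs without the downloaded file. With the real data, you would pass map.usa_country and df.usa_rivers.

```r
library(ggplot2)

# Hypothetical helper: everything fixed except the settings being iterated on
build_river_map <- function(country, rivers,
                            river_color = "#99ccff", river_size = .5,
                            land_fill = "#484848") {
  ggplot() +
    geom_polygon(data = country, aes(x = long, y = lat, group = group),
                 fill = land_fill) +
    geom_path(data = rivers, aes(x = long, y = lat, group = group),
              color = river_color, size = river_size)
}

# Made-up stand-ins for map.usa_country and df.usa_rivers
demo_country <- data.frame(long = c(-100, -80, -80, -100),
                           lat  = c(  30,  30,  45,   45),
                           group = 1)
demo_rivers  <- data.frame(long = c(-95, -92, -90),
                           lat  = c( 33,  37,  40),
                           group = "a.1")

# Each iteration is now a single call
p1 <- build_river_map(demo_country, demo_rivers, river_color = "blue")
p2 <- build_river_map(demo_country, demo_rivers, river_color = "#8ca7c0", river_size = .08)
p2
```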
From here, you can continue to iterate, add more details, and get them all “perfect”:
- The exact color (this takes lots of trial-and-error, and a bit of good taste)
- The line size for geom_path()
- The title and text annotations
- The projection (change it to the “albers” projection with coord_map())
- The other theme() details, like the background color and removing extraneous elements (such as the axis labels)
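For the color trial-and-error in particular, one option is to draw several candidates side by side instead of one at a time. This sketch reflects an assumed workflow, not the post's own method. It stacks one copy of the river data per candidate color and facets by that color. It then applies scale_colour_identity(), which uses the hex codes stored in the data as the actual colors. The candidates "#8ca7c0" and "#99ccff" come from the post. The other two hex codes and the demo data frames are made up.

```r
library(ggplot2)

# Made-up stand-ins for map.usa_country and df.usa_rivers
demo_country <- data.frame(long = c(-100, -80, -80, -100),
                           lat  = c(  30,  30,  45,   45),
                           group = 1)
demo_rivers  <- data.frame(long = c(-95, -92, -90),
                           lat  = c( 33,  37,  40),
                           group = "a.1")

# Candidate river colors (first two appear in the post; the rest are arbitrary)
candidates <- c("#8ca7c0", "#99ccff", "#6f8fa8", "#b0c4d8")

# One copy of the river data per candidate, tagged with that color
rivers_by_color <- do.call(rbind, lapply(candidates, function(col) {
  transform(demo_rivers, river_color = col)
}))

p <- ggplot() +
  # No river_color column here, so this layer is repeated in every panel
  geom_polygon(data = demo_country, aes(x = long, y = lat, group = group),
               fill = "#484848") +
  geom_path(data = rivers_by_color,
            aes(x = long, y = lat, group = group, colour = river_color)) +
  scale_colour_identity() +    # use the hex codes directly, no legend
  facet_wrap(~ river_color) +  # one panel per candidate color
  theme(panel.background = element_rect(fill = "#292929"))

p
```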
Once again: getting this just right takes lots of iteration. Try it yourself and build this visualization from the bottom up.
Learn ggplot2 (because ggplot2 makes this easy)
In this post, we’ve used ggplot2 to build the map.
If you’re interested in data science and data visualization, learn ggplot2.
Longtime readers at Sharp Sight will know my thoughts on this, but if you’re a new reader this is important.
Not interested in visualization per se?
Do you want to focus on machine learning instead?
Fair enough.
If you want to learn machine learning, you still need to be able to analyze and explore your data.
Once again, the best tool for exploring and analyzing your data is ggplot2.
Sign up to master data visualization
Do you want to get a job as a data scientist?
You need to master data visualization.
We’ll show you how.
Sign up now, and we’ll show you step-by-step how to learn (and master) data visualization in R.
The post How to map geospatial data: USA rivers appeared first on SHARP SIGHT LABS.