Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
I have been forever annoyed at how long it takes to plot data on a large shapefile. And this is a domain where doesn’t matter if you’re working with MapInfo or R. Just zooming the figure takes ages. But a couple of days ago I bumped into an excellent blogpost by John Myles White that solved the problem in the most Daoistic way, reminiscent of Steve Jobs: Do away with the shapefile!
To have a quick overview of one’s data, there is no need for a shapefile. Just plot latitude and longitude directly into a scattergraph.
That’s it.
Here’s how it is done. For our example, let’s plot a map showing the ratio of Foreign-born vs. Australian residents, by postcode, in all of Australia. As a first step, I made a search and found a ready-to-use file with all Australian postcodes and their geographical coordinates. Next, I got some Census data from the Australian Bureau of Statistics, specifically counts of Australian-born and foreign-born residents sorted by postcode. Quick munge and plot (code below), and voilà!
require(ggplot2) setwd('/Users/myuser/R/map') poa <- read.csv('data/2011_BCP_ALL_for_AUST_short-header/2011 Census BCP All Geographies for AUST/POA/AUST/2011Census_B01_AUST_POA_short.csv', as.is=TRUE) poa$poa <- as.integer(sub('[a-zA-Z]{3}','', x=poa$region_id)) # Ratio Aus born vs elsewhere born ratio <- data.frame('poa'=poa$poa,'Freq'=poa$Birthplace_Australia_P/poa$Birthplace_Elsewhere_P) zipcodes <- read.csv('data/pc_full_lat_long.csv') zipcodes <- zipcodes[!duplicated(zipcodes$Pcode),] zipcodes <- zipcodes[!zipcodes$Lat == 0,] zipcodes <- zipcodes[!zipcodes$Long == 0,] poacoord <- data.frame('poa'=zipcodes$Pcode, 'lat'=zipcodes$Lat, 'long'=zipcodes$Long) popcount <- na.omit(merge(ratio, poacoord, all=FALSE)) popcount <- popcount[rev(order(popcount$Freq)),] # Place the small bubbles on top popcount$cat <- sapply(popcount$Freq, function(x) if (x < 6) {'> 1:6'} else {'< 1:6'}) ggplot(popcount, aes(x=long, y=lat, colour=cat)) + scale_size(range = c(1,20), name = 'Population') + geom_point() + coord_equal()
And once we get the data presentation right, we can add the shapefile and go get some tea.
# must have gpclib installed require("rgdal") # requires sp, will use proj.4 if installed require("maptools") require("ggplot2") require("plyr") gpclibPermit() # required for fortify method setwd('/Users/myuser/R/map/data/shp') amap <- readOGR(dsn=".", layer="aust_cd66states") amap@data$id = rownames(amap@data) amap.points = fortify(amap, region='id') amap.df = join(amap.points, amap@data, by='id') ggplot() + geom_polygon(data=amap.df, aes(long,lat,group=group)) + geom_path(data=amap.df, aes(long,lat,group=group), color="grey") + geom_point(data=popcount, aes(x=long, y=lat, colour=cat)) + scale_size(range = c(1,20), name = "Population") + scale_colour_discrete(name="ratio of Foreign vs\nAustralian-born") + coord_equal()
R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.