Site icon R-bloggers

Getting rasters into shape from R

[This article was first published on John Baumgartner's Research » R, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Today I needed to convert a raster to a polygon shapefile for further processing and plotting in R. I like to keep my code together so I can easily keep track of what I’ve done, so it made sense to do the conversion in R as well. The fantastic raster package has a function that can do this (rasterToPolygons), but it seems to take a very, very long time and chew up all system resources (at least on my not particularly souped up X200 laptop).

We can cut down the time taken for conversion by calling a GDAL utility,  gdal_polygonize.py, directly from R using system2(). GDAL needs to be installed on your system, but you’ll probably want it installed anyway if you’re planning to talk to spatial data (in fact, you might find you actually already have it installed, as it is required for a bunch of other GIS software packages). Together with it’s cousin OGR, GDAL allows reading and writing of a wide range of raster and vector geospatial data formats. Once installed, you have at your disposal a bunch of GDAL Utilities that can be run from the terminal, including gdal_polygonize.py. For polygonize and a couple of others, you’ll also need Python installed (again, you may already have it installed, so check first!). Finally, it helps if the path to the gdal_polygonize.py exists in the global PATH variable. To check if this is the case, run the following in R:

Sys.which("gdal_polygonize.py")

If this returns an empty string, it looks like gdal_polygonize.py either doesn’t exist on your system, or the path to its containing directoy is missing from the PATH variable. In the latter case, you can either use the pypath argument accepted by the function below to specify the path, or modify your PATH variable. The PATH variable can be modified by following these instructions for Windows and Mac. I suspect Linux users should typically not run into this problem, as the GDAL executables are usually installed to paths that already exist in PATH (e.g. /usr/bin).

Anywho… let’s compare rasterToPolygons with gdal_polygonize. The function included below (gdal_polygonizeR) borrows from code provided in a post by Lyndon Estes on R-sig-geo. Thanks mate!

[EDIT: I’ve given this post a considerable overhaul, including correction of the source code such that it should work across platforms. Please feel free to leave me a comment if you have any problems or suggestions.]

First, we’ll import a raster data set

Here’s one I prepared earlier. It’s the result of a conversion of a polygon shapefile of country boundaries (from Natural Earth, a great source of public domain, physical/cultural spatial data) to a raster data set.

download.file('http://dl.dropbox.com/u/1058849/blog/NEcountries.asc.zip',
              destfile={f <- tempfile()}, quiet=TRUE, cacheOK=FALSE)
unzip(f, exdir={d <- tempdir()})
library(rasterVis)
r <- raster(file.path(d, 'NEcountries.asc'), crs=CRS('+proj=longlat'))
levelplot(r, margin=FALSE, col.regions=rainbow)

The raster object resulting from using rasterToPolygons. Click to see it in all it’s pixelated glory.

Using rasterToPolygons in the raster package

p <- rasterToPolygons(r, dissolve=TRUE)

…time passes…

Ok, ouch. I decided to kill the process after 6 hours.

Using gdal_polygonize.py

Note that the following takes an argument outshape, which specifies the path where the converted polygon shapefile will reside. Default is NULL, which saves the shapefile to a temporary file. Either way, the function returns a SpatialPolygonsDataFrame representation of the shapefile, unless readpoly is FALSE, in which case the function returns NULL.

## Define the function
gdal_polygonizeR <- function(x, outshape=NULL, gdalformat = 'ESRI Shapefile', 
                             pypath=NULL, readpoly=TRUE, quiet=TRUE) {
  if (isTRUE(readpoly)) require(rgdal)
  if (is.null(pypath)) {
    pypath <- Sys.which('gdal_polygonize.py')
  }
  if (!file.exists(pypath)) stop("Can't find gdal_polygonize.py on your system.") 
  owd <- getwd()
  on.exit(setwd(owd))
  setwd(dirname(pypath))
  if (!is.null(outshape)) {
    outshape <- sub('\\.shp$', '', outshape)
    f.exists <- file.exists(paste(outshape, c('shp', 'shx', 'dbf'), sep='.'))
    if (any(f.exists)) 
      stop(sprintf('File already exists: %s', 
                   toString(paste(outshape, c('shp', 'shx', 'dbf'), 
                                  sep='.')[f.exists])), call.=FALSE)
  } else outshape <- tempfile()
  if (is(x, 'Raster')) {
    require(raster)
    writeRaster(x, {f <- tempfile(fileext='.asc')})
    rastpath <- normalizePath(f)
  } else if (is.character(x)) {
    rastpath <- normalizePath(x)
  } else stop('x must be a file path (character string), or a Raster object.')
  system2('python', args=(sprintf('"%1$s" "%2$s" -f "%3$s" "%4$s.shp"', 
                                  pypath, rastpath, gdalformat, outshape)))
  if (isTRUE(readpoly)) {
    shp <- readOGR(dirname(outshape), layer = basename(outshape), verbose=!quiet)
    return(shp) 
  }
  return(NULL)
}

Time to polygonizeR the raster!

Firstly, using a raster object…

system.time(p <- gdal_polygonizeR(r))

Creating output /tmp/RtmpdnHOc7/filec7f2acf5d36.shp.
0...10...20...30...40...50...60...70...80...90...100 - done.
   user  system elapsed 
 19.833   0.336  20.464 

And now referring directly to the raster file, and disabling the importing of the resulting shapefile by specifying readpoly=FALSE.

system.time(
  gdal_polygonizeR(file.path(d, 'NEcountries.asc'), readpoly=F)
)

Creating output /tmp/RtmpdnHOc7/filec7f4ae4d60d.shp.
0...10...20...30...40...50...60...70...80...90...100 - done.
   user  system elapsed 
  2.864   0.124   3.023 

Just over 20 seconds if we begin with a raster object, and only 3 seconds if we already have the raster saved as a file on disk. That’s around 1/7000 of the time spent unsuccessfully attempting the same feat with rasterToPolygons.

spplot(p, col.regions=rainbow(200))

The SpatialPolygonsDataFrame resulting from gdal_polygonizeR.

Don’t get me wrong… I’m a big fan of raster as well as Robert’s other packages. This is just one place where it is not particularly efficient. Also note that I’m dissolving the polygons when using rasterToPolygons, and as this relies on rgeos, the problem might lie there.


Filed under: R Tagged: data, function, GIS, maps, open source, polygon, R, raster, rstats, spatial data

To leave a comment for the author, please follow the link and comment on their blog: John Baumgartner's Research » R.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.