{geotargets} 0.2.0

[This article was first published on Blog on Credibly Curious, and kindly contributed to R-bloggers.]

I’m super stoked to announce {geotargets} version 0.2.0! The {geotargets} package extends {targets} to work with geospatial data formats.

I’d like to firstly acknowledge the strong work by Eric Scott on getting this release ready. I do want to emphasise that while this post is on my website, this project is very much a team effort.

You can download {geotargets} from the R universe like so:

install.packages("geotargets", repos = c("https://njtierney.r-universe.dev", "https://cran.r-project.org"))

Why should I use geotargets and targets?

You could benefit from using targets and geotargets if you do geospatial data analysis involving rasters or shapefiles, specifically with the terra or stars R packages. For example, a workflow that downloads large rasters and then runs operations like cropping, reprojection, resampling, and masking is a good fit.

The main benefit of using targets and geotargets is that your analysis will only rerun when you change relevant parts of it. For example, you might do a lot of geospatial data processing that feeds downstream into a machine learning model to make predictions on bushfire risk. Writing this with targets and geotargets means that if you change the parts of the code that relate to the machine learning components, then only those machine learning parts would rerun. This means you can save time by avoiding rerunning computationally expensive spatial data processing.

For more details on what targets is, and why we need geotargets, I would recommend reading the 0.1.0 release blog post, as well as reading the {targets} manual. The “Get started in 4 minutes” guide to targets is also excellent.

Main changes in 0.2.0?

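In addition to smaller changes and improvements, there are three main additions in this release: dynamic branching over raster tiles, preserving SpatRaster metadata, and support for stars and stars_proxy objects.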

And in very exciting news, we have a new hex sticker!

Thanks to Hubert Hałun for their work on getting this together, we are really happy with the new sticker!

Dynamic Branching

The main addition in this release is dynamic branching via a new “target factory” function, tar_terra_tiles(). This allows you to break a raster into tiles, perform operations on each tile, and then combine the results. This means we can split computationally intensive raster operations that work in a pixel-wise manner across tiled subsets of the raster. This is useful when, for example, loading an entire raster into memory and doing computations on it results in out-of-memory errors.

As part of this addition, we created three helper functions: tile_n(), tile_grid(), and tile_blocksize().

These help us define different extents that we can pass along as different parts of the dynamic branches. You can think of these as tools that we can use to specify how to slice, or tile up, a raster into smaller pieces that we can then do analysis on separately and combine later.

Let’s briefly unpack these, and then show how these would be used in dynamic branching. First let’s read in some example elevation data from terra and plot it:

library(terra)
library(geotargets)
f <- system.file("ex/elev.tif", package = "terra")
r <- rast(f)
plot(r)

tile_n()

We can use tile_n(), which is the simplest of the three. It produces about n tiles in a grid.

r_tile_4 <- tile_n(r, 4)
#> creating 2 * 2 = 4 tile extents
r_tile_4
#> [[1]]
#>      xmin      xmax      ymin      ymax 
#>  5.741667  6.141667 49.816667 50.191667 
#> 
#> [[2]]
#>      xmin      xmax      ymin      ymax 
#>  6.141667  6.533333 49.816667 50.191667 
#> 
#> [[3]]
#>      xmin      xmax      ymin      ymax 
#>  5.741667  6.141667 49.441667 49.816667 
#> 
#> [[4]]
#>      xmin      xmax      ymin      ymax 
#>  6.141667  6.533333 49.441667 49.816667
# some plot helpers
rect_extent <- function(x, ...) {
  rect(x[1], x[3], x[2], x[4], ...)
}
plot_extents <- function(x, ...) {
  invisible(lapply(x, rect_extent, border = "hotpink", lwd = 2))
}
plot(r)
plot_extents(r_tile_4)

plot(r)
tile_n(r, 6) |> plot_extents()
#> creating 2 * 3 = 6 tile extents
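To give a feel for what "about n tiles in a grid" can mean, here is a minimal base R sketch of one way to choose the grid dimensions. The function grid_dims() is hypothetical and written for this post's explanation only; it is not the code inside tile_n(), although it does reproduce the "2 * 2" and "2 * 3" layouts reported in the messages above.

```r
# Hypothetical helper (not part of geotargets): choose a rows x cols grid
# that holds at least n tiles, keeping the grid close to square.
grid_dims <- function(n) {
  stopifnot(is.numeric(n), length(n) == 1, n >= 1)
  n_row <- floor(sqrt(n))        # largest whole number of rows <= sqrt(n)
  n_col <- ceiling(n / n_row)    # enough columns to reach at least n tiles
  c(rows = n_row, cols = n_col)
}

grid_dims(4) # 2 rows, 2 cols
grid_dims(6) # 2 rows, 3 cols
grid_dims(7) # 2 rows, 4 cols, so 8 tiles: "about" 7
```

When n does not factor neatly, a scheme like this rounds up to the next full grid, which is one reason a function might promise only "about" n tiles.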

tile_grid()

For more control, use tile_grid(), which allows specification of the number of rows and columns to split the raster into. Here we specify that we want three columns and one row:

r_grid_3x1 <- tile_grid(r, ncol = 3, nrow = 1)
r_grid_3x1
#> [[1]]
#>      xmin      xmax      ymin      ymax 
#>  5.741667  6.008333 49.441667 50.191667 
#> 
#> [[2]]
#>      xmin      xmax      ymin      ymax 
#>  6.008333  6.266667 49.441667 50.191667 
#> 
#> [[3]]
#>      xmin      xmax      ymin      ymax 
#>  6.266667  6.533333 49.441667 50.191667
plot(r)
plot_extents(r_grid_3x1)

plot(r)
tile_grid(r, ncol = 2, nrow = 3) |> plot_extents()
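To make the "slice, process, combine" idea concrete outside of a pipeline, here is a rough sketch using only terra. The extents are copied by hand from the three-column tile_grid() output printed above, and doubling the values is just a stand-in for a real pixel-wise operation. This is my illustration of the idea, not what tar_terra_tiles() does internally.

```r
library(terra)

r <- rast(system.file("ex/elev.tif", package = "terra"))

# Extents (xmin, xmax, ymin, ymax) copied from the 3 x 1 tile_grid() output
extents <- list(
  c(5.741667, 6.008333, 49.441667, 50.191667),
  c(6.008333, 6.266667, 49.441667, 50.191667),
  c(6.266667, 6.533333, 49.441667, 50.191667)
)

# Slice: crop the raster to each extent
tiles <- lapply(extents, function(e) crop(r, ext(e)))

# Process: apply a pixel-wise operation to each tile on its own
tiles_doubled <- lapply(tiles, function(x) x * 2)

# Combine: merge the processed tiles back into a single raster
merged <- merge(sprc(tiles_doubled))

# The merged tiles should match doing the operation on the whole raster
all.equal(as.vector(values(merged)), as.vector(values(r * 2)))
```

Because the operation is pixel-wise, working tile by tile should give the same answer as working on the whole raster, while only needing one tile's worth of values at a time.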

tile_blocksize()

The third included helper is tile_blocksize(), which tiles by file block size. The block size is a property of raster files, and is the number of pixels (in the x and y direction) that is read into memory at a time. Tiling by multiples of block size may therefore be more efficient because only one block should need to be loaded to create each tile target. You can find the blocksize with fileBlocksize:

fileBlocksize(r)
#>      rows cols
#> [1,]   43   95

This tells us that the raster is read in blocks of 43 rows by 95 columns of pixels.

The tile_blocksize() function is similar to tile_grid(), except that instead of giving the number of rows and columns of tiles, we give the tile size in multiples of the block size.

If we just run tile_blocksize() on r we get the extents of the specified blocksize:

tile_blocksize(r)
#> [[1]]
#>      xmin      xmax      ymin      ymax 
#>  5.741667  6.533333 49.833333 50.191667 
#> 
#> [[2]]
#>      xmin      xmax      ymin      ymax 
#>  5.741667  6.533333 49.475000 49.833333 
#> 
#> [[3]]
#>      xmin      xmax      ymin      ymax 
#>  5.741667  6.533333 49.441667 49.475000

This is the same as setting n_blocks_row and n_blocks_col to 1:

r_block_size_1x1 <- tile_blocksize(r, n_blocks_row = 1, n_blocks_col = 1)
r_block_size_1x1
#> [[1]]
#>      xmin      xmax      ymin      ymax 
#>  5.741667  6.533333 49.833333 50.191667 
#> 
#> [[2]]
#>      xmin      xmax      ymin      ymax 
#>  5.741667  6.533333 49.475000 49.833333 
#> 
#> [[3]]
#>      xmin      xmax      ymin      ymax 
#>  5.741667  6.533333 49.441667 49.475000
plot(r)
plot_extents(r_block_size_1x1)

Here the first two tiles are the same size, and the third is much narrower, because it only covers the rows left over after two full blocks. This is different to the two other tile methods.

Here the column block size is the full width of the raster.

We could instead make each tile two blocks tall and one block wide:

r_block_size_2x1 <- tile_blocksize(r, n_blocks_row = 2, n_blocks_col = 1)
r_block_size_2x1
#> [[1]]
#>      xmin      xmax      ymin      ymax 
#>  5.741667  6.533333 49.475000 50.191667 
#> 
#> [[2]]
#>      xmin      xmax      ymin      ymax 
#>  5.741667  6.533333 49.441667 49.475000
plot(r)
plot_extents(r_block_size_2x1)
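The uneven tile heights fall out of simple arithmetic. Here is a small base R sketch, assuming the example raster has 90 rows (which is what the printed extents imply, given that 43 rows span about 0.358333 degrees of a 0.75 degree tall raster). The helper tile_rows() is hypothetical and only mirrors the row bookkeeping; it is not how tile_blocksize() is implemented.

```r
# Hypothetical helper (not part of geotargets): which raster rows land in
# each tile when tiling by multiples of the block height.
tile_rows <- function(n_rows, block_rows, n_blocks_row = 1L) {
  step <- block_rows * n_blocks_row           # rows per full tile
  first <- seq(1L, n_rows, by = step)         # first row of each tile
  last <- pmin(first + step - 1L, n_rows)     # last row, capped at the raster edge
  data.frame(first_row = first, last_row = last, n_rows = last - first + 1L)
}

# One block per tile: rows 1-43, 44-86, and a leftover 87-90
tile_rows(90L, 43L)

# Two blocks per tile: rows 1-86, and the same leftover 87-90
tile_rows(90L, 43L, n_blocks_row = 2L)
```

The last tile is always whatever remains after the full blocks, which is why it shows up as a thin strip at the bottom of both plots.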

This only works when the SpatRaster points to a file; in-memory rasters have no inherent block size.

sources(r)
#> [1] "/Users/nick/Library/R/arm64/4.4/library/terra/ex/elev.tif"
#force into memory
r2 <- r + 0
sources(r2)
#> [1] ""
#this now errors
tile_blocksize(r2)
#> Error: [aggregate] values in argument 'fact' should be > 0
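If you want to avoid running into that error, one option is to check for a file source up front. The sketch below is a suggestion rather than something geotargets provides: has_file_source() is a hypothetical helper built on terra::sources(), and writing the raster to a temporary GeoTIFF is just one way to give an in-memory raster a file to point at.

```r
library(terra)

# Hypothetical helper (not part of geotargets or terra):
# TRUE when every source of the SpatRaster is a file on disk.
has_file_source <- function(x) {
  all(nzchar(terra::sources(x)))
}

r <- rast(system.file("ex/elev.tif", package = "terra"))
r2 <- r + 0 # force into memory

has_file_source(r)  # TRUE
has_file_source(r2) # FALSE

# One possible workaround: write the in-memory raster to a temporary file.
# writeRaster() returns a SpatRaster that points at the new file.
if (!has_file_source(r2)) {
  r2_on_disk <- writeRaster(r2, tempfile(fileext = ".tif"))
}
has_file_source(r2_on_disk)
```

Note that the block size of the newly written file depends on how it was written, so it will not necessarily match the original file.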

For more detail on using this in targets, please see the geotargets vignette, “Dynamic branching with raster tiles”
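As a rough idea of how the pieces fit together, a _targets.R file might look something like the sketch below. The target names (elev_file, elev, tiles, tiles_doubled, merged) and the doubling step are made up for illustration, and the tile_fun argument is written as I understand it from the package documentation, so please treat the vignette as the authoritative version.

```r
# _targets.R: an illustrative sketch, not the vignette's example
library(targets)
library(geotargets)

list(
  # Track the example file that ships with terra
  tar_target(
    elev_file,
    system.file("ex/elev.tif", package = "terra"),
    format = "file"
  ),
  # Read the file in as a SpatRaster target
  tar_terra_rast(
    elev,
    terra::rast(elev_file)
  ),
  # Split the raster into a 2 x 2 grid of tiles (assumed tile_fun usage);
  # each tile becomes its own dynamic branch
  tar_terra_tiles(
    name = tiles,
    raster = elev,
    tile_fun = \(x) geotargets::tile_grid(x, ncol = 2, nrow = 2)
  ),
  # Run a pixel-wise operation on each tile separately
  tar_terra_rast(
    tiles_doubled,
    tiles * 2,
    pattern = map(tiles)
  ),
  # Combine the processed tiles back into a single raster
  tar_terra_rast(
    merged,
    terra::merge(terra::sprc(tiles_doubled))
  )
)
```

With a pipeline shaped like this, changing only the per-tile operation should leave the file, raster, and tiling targets untouched on the next tar_make().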

Preserving SpatRaster metadata

tar_terra_rast() gains a preserve_metadata option that when set to "zip" reads/writes targets as zip archives that include aux.json “sidecar” files sometimes written by terra (#58).
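A minimal sketch of what this could look like in a _targets.R file is below. The target name elev_with_units is made up, and using units as the example metadata is my assumption about the kind of information that ends up in a sidecar file; what terra actually writes there can depend on the file format.

```r
# _targets.R fragment: an illustrative sketch
library(targets)
library(geotargets)

list(
  tar_terra_rast(
    elev_with_units,
    {
      r <- terra::rast(system.file("ex/elev.tif", package = "terra"))
      # Attach metadata that may be stored in a sidecar file (assumption)
      terra::units(r) <- "m"
      r
    },
    # Store the target as a zip archive so sidecar files travel with it
    preserve_metadata = "zip"
  )
)
```

Zipping and unzipping adds some overhead on every write and read, so it is probably only worth turning on for targets whose metadata you need downstream.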

Support of stars and stars_proxy

We have created tar_stars() and tar_stars_proxy() that create stars and stars_proxy objects, respectively. These are currently experimental.

Minor changes in 0.2.0

Other changes include:

What’s next?

We have finished developing the main milestones for geotargets, but will continue actively developing it. Soon we will submit the package for review by rOpenSci, then submit the work to the Journal of Open Source Software (JOSS), and then submit to CRAN.

Currently, the next release will focus on adding support for:

You can see the full list of issues for more detail on what we are working on.

Thanks

We would like to thank the R Consortium for generously supporting this project, “{geotargets}: Enabling geospatial workflow management with {targets}”.

We would also like to thank Michael Sumner, Anthony North, and Miles McBain for their helpful discussions throughout, as well as Will Landau for writing targets, and for being incredibly responsive and helpful with the issues and questions we raised as we wrote {geotargets}.
