Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
In this post I’ll outline why and how I use arrays as the main data structure in a population simulation which represents the age, sex and spatial location of tsetse flies in the landscape. An earlier post outlines the background to this tsetse population simulation.
I wanted a data structure that would make it easy and transparent for me to access elements with minimal code. I considered data frames, matrices and lists but opted for arrays. Hadley Wickham’s excellent Advanced R Data Structures section was very helpful.
Arrays
An array
in R is a multi-dimensional object, and a matrix
is a special case of an array
with just 2 dimensions. The code below shows how you can use the dim
argument to the array
function to set the number and size of dimensions. In this example I just fill the array with sequential values from 1 to 24.
array(c(1:24), dim=c(4,3,2)) ## , , 1 ## ## [,1] [,2] [,3] ## [1,] 1 5 9 ## [2,] 2 6 10 ## [3,] 3 7 11 ## [4,] 4 8 12 ## ## , , 2 ## ## [,1] [,2] [,3] ## [1,] 13 17 21 ## [2,] 14 18 22 ## [3,] 15 19 23 ## [4,] 16 20 24
Naming array dimensions
To reduce the risk of bugs caused by accessing incorrect array elements you can name dimensions and use the names to access elements. You can use the dimnames
argument to name both the elements within each dimension (e.g. F,M) and the dimension itself (e.g. sex). One way of doing this is to set dimnames
to a named list. The code below shows how I create an array with spatial, sex and age dimensions.
nY <- 4 nX <- 3 iMaxAge <- 2 sex <- c("F","M") dimnames1 <- list( y=paste0('y',1:nY), x=paste0('x',1:nX), sex=sex, age=paste0('age',1:iMaxAge)) nElements <- nY*nX*iMaxAge*length(sex) aGrid <- array(1:nElements, dim=c(nY,nX,2,iMaxAge), dimnames=dimnames1) aGrid ## , , sex = F, age = age1 ## ## x ## y x1 x2 x3 ## y1 1 5 9 ## y2 2 6 10 ## y3 3 7 11 ## y4 4 8 12 ## ## , , sex = M, age = age1 ## ## x ## y x1 x2 x3 ## y1 13 17 21 ## y2 14 18 22 ## y3 15 19 23 ## y4 16 20 24 ## ## , , sex = F, age = age2 ## ## x ## y x1 x2 x3 ## y1 25 29 33 ## y2 26 30 34 ## y3 27 31 35 ## y4 28 32 36 ## ## , , sex = M, age = age2 ## ## x ## y x1 x2 x3 ## y1 37 41 45 ## y2 38 42 46 ## y3 39 43 47 ## y4 40 44 48
The example above has just a few cells and ages. In our tsetse simulation we’re often looking at 120 age categories (days) on a 50×50 grid. Here’s a function to aid creating such arrays.
Spatial dimension trickiness
I had to be careful specifying the spatial y & x dimensions as they do not always go in the order that you might expect (and your expectations may vary depending on whether your background is more geographical or statistical !). By specifying y rather than x as the first dimension the R console displays the array in the correct orientation with x on the horizontal and y on the vertical. I started out by having x,y but kept having to transpose so decided to bite the bullet a few months in and change everything to y,x. Note that the y dimension elements start from the top which can cause other geographical issues later.
Accessing array dimensions
This array structure allows relatively transparent access to elements and summaries as shown below.
An age structure for one grid cell
aGrid['y1','x1','M',] ## age1 age2 ## 13 37
Total Males in one grid cell
sum(aGrid['y1','x1','M',]) ## [1] 50
Total population in one grid cell
sum(aGrid['y1','x1',,]) # ## [1] 76
A spatial grid for one age
aGrid[,,'M','age2'] ## x ## y x1 x2 x3 ## y1 37 41 45 ## y2 38 42 46 ## y3 39 43 47 ## y4 40 44 48
A spatial grid of total population
apply(aGrid,MARGIN=c('y','x'),sum) ## x ## y x1 x2 x3 ## y1 76 92 108 ## y2 80 96 112 ## y3 84 100 116 ## y4 88 104 120
Summed age structure for the whole population
apply(aGrid,MARGIN=c('age'),sum) ## age1 age2 ## 300 876
Summed sex ratio for thewhole population
apply(aGrid,MARGIN=c('sex'),sum) ## F M ## 444 732
This array also allows me to save all the population data for a simulation of a number of days by simply adding an extra dimension called day using the abind
function from the package of the same name. Unfortunately it loses the names of the dimensions but these can be reset. The code below adds the first day as a new dimension using the along=0
argument to abind.
library(abind) aRecord <- abind::abind(aGrid, along=0) # replace lost dimension names names(dimnames(aRecord)) <- c('day','y','x','sex','age')
Records for later days (after aGrid has changed) can be added to the first dimension using the along=1
argument :
aRecord <- abind::abind(aRecord, aGrid, along=1) # replace lost dimension names names(dimnames(aRecord)) <- c('day','y','x','sex','age')
I have now written the code to get data from the [day,y,x,sex,age] array into a helper function that I may describe in a later post.
Also in later posts I’ll show how I represent other population processes such as movement using these arrays.
R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.