A first attempt at an individual-based model in R
[This article was first published on me nugget, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
I have been curious for a while as to how R might be used for the construction of an individually-based model (IBM), or agent-based model (ABM). In particular, what R objects lend themselves best to storing information on individuals, and allow for new individuals to be added or subtracted throughout the simulation?
In this first attempt, I have ended up opting for a multi-level list, where elements represent individuals, and sub-levels contain attribute information. The main reason is speed – In a previous post I highlighted the fact that lists are not penalized in the same way as a data.frame when the object in “grown” or concatenated with additional information (due to time spent reallocating memory).
The example models a population with a given birth rate, death rate, and carrying capacity. Attributes of individuals that are recorded in the list include their age, whether they are alive or dead, and their color (blue or red). The attribute of color is passed from parent to offspring, and their is a tendency for one phenotype to dominate over time. The idea comes from a simple model of genetic drift that can be explored with the IBM programming platform NetLogo.
So far so good, but there are still some speed issues associated with a list that continues to grow. Part of the speed issue is due to the calculation of summary statistics during each iteration (using e.g. sapply). A cropping of the list to retain only alive individuals, has dramatic improvements on speed as well, so there appears to be a cost associated with growing the list object itself. The difference might be that my previous example was filling a empty list, where the number of elements was predefined.
I would be interested if anyone has any thoughts on this or, more generally, on the construction of IBMs in R. While R is probably not the best programming language suitable to IBMs, it would be interesting to know if more examples exist.
Example:
b <- 0.14 # probability of birth d <- 0.08 # probability of death K <- 100 # carrying capacity N0 <- 50 # starting number of individuals t <- 500 # time of simulation #create starting individual w attributes ("alive", "age", "color") set.seed(1) ind <- vector(mode="list", N0) for(i in seq(ind)){ ind[[i]]$alive <- 1 ind[[i]]$age <- 0 ind[[i]]$color <- c("blue", "red")[round(runif(1)+1)] } #make empty vectors to record population statistics time <- seq(t+1) pop <- NaN * time # population size pop[1] <- N0 frac.blue <- NaN * time # fraction of population that is blue cols <- sapply(ind, function(x) x$color) frac.blue[1] <- sum(cols == "blue") / length(cols) med.age <- NaN * time ages <- sapply(ind, function(x) x$age) med.age[1] <- median(ages) #simulation save.alive.only <- TRUE # optional cropping of "ind" to include alive individuals only t1 <- Sys.time() for(i in seq(t)){ # loop for each time increment is.alive <- which(sapply(ind, function(x) x$alive) == 1) for(j in is.alive){ #loop for each alive individual birth <- runif(1) <= (b * (1 - length(is.alive)/K)) # calculate a birth probability for each individual that is alive if(birth){ len.ind <- length(ind) ind[[len.ind+1]] <- list(alive=1, age=0, color=ind[[j]]$color) # create offspring, inherits color of parent } death <- runif(1) <= d # calculate a death probability for each individual if(death){ ind[[j]]$alive <- 0 # if death, reset alive = 0 } else { #else, advance age + 1 ind[[j]]$age <- ind[[j]]$age + 1 # advance age of parent } } #optional cropping of list "ind" if(save.alive.only){ is.dead <- which(sapply(ind, function(x) x$alive) == 0) if(length(is.dead) > 0) ind <- ind[-is.dead] } #Population stats is.alive <- which(sapply(ind, function(x) x$alive) == 1) pop[i+1] <- length(is.alive) cols <- sapply(ind, function(x) x$color) frac.blue[i+1] <- sum(cols[is.alive] == "blue") / length(is.alive) ages <- sapply(ind, function(x) x$age) med.age[i+1] <- median(ages[is.alive]) print(paste(i, "of", t, "finished", "[", round(1/t*100), "%]")) } t2 <- Sys.time() dt <- t2-t1 dt #plot populations png("pops_vs_time.png", width=6, height=4, units="in", res=400) par(mar=c(4,4,1,1)) pop.blue <- pop * frac.blue pop.red <- pop * (1-frac.blue) ylim=range(c(pop.blue, pop.red)) plot(time, pop.blue, t="l", lwd=2, col=4, ylim=ylim, ylab="Population size") lines(time, pop.red, lwd=2, col=2) legend("topleft", legend=c("blue pop.", "red pop."), lwd=2, col=c(4,2), bty="n") dev.off() #plot median age png("med_age_vs_time.png", width=6, height=4, units="in", res=400) par(mar=c(4,4,1,1)) plot(time, med.age, t="l", lwd=2, ylab="Median age") dev.off()
To leave a comment for the author, please follow the link and comment on their blog: me nugget.
R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.