[This article was first published on Xi'an's Og » R, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
Here are the R codes of the second R lab organised by Serena Arima in supplement of my lectures (now completed!). This morning I covered ABC model choice and the following example is the benchmark used in the course (and in the paper) about the impact of summary statistics. (Warning! It takes a while to run…)
n.iter=10000 n=c(10,100,1000) n.sims=100 prob.m1=matrix(0,nrow=n.sims,ncol=length(n)) prob.m2=prob.m1 ### Simulation from Normal model for(sims in 1:n.sims){ ## True data generation from the Normal distribution and summary statistics for(i in 1:length(n)){ y.true=rnorm(n[i],0,1) stat.true=c(mean(y.true),median(y.true),var(y.true)) ## ABC algorithm # Sample from the prior mu=rnorm(n.iter,0,2) dist.m1=rep(NA,n.iter) dist.m2=rep(NA,n.iter) for(j in 1:n.iter){ # Data generation under both models # Normal model y.m1=rnorm(n[i],mu[j]) stat.m1=c(mean(y.m1),median(y.m1),var(y.m1)) # computing the distance dist.m1[j]=sum((stat.m1-stat.true)^2) # Laplace model y.m2=mu[j]+sample(c(-1,1),n[i],rep=TRUE)*rexp(n[i],rate=sqrt(2)) stat.m2=c(mean(y.m2),median(y.m2),var(y.m2)) # computing the distance dist.m2[j]=sum((stat.m2-stat.true)^2) } # select epsilon as 1% distance quantile epsilon=quantile(c(dist.m1,dist.m2),prob=0.01) # compute the posterior probability that data come from # the model prob.m1[sims,i]=sum(dist.m1<epsilon)/(2*n.iter*0.01) }} ### Simulation from the Laplace model for(sims in 1:n.sims){ ## True data generation from the Laplace distribution and summary statistics for(i in 1:length(n)){ y.true=sample(c(-1,1),n[i],rep=TRUE)*rexp(n[i],rate=sqrt(2)) stat.true=c(mean(y.true),median(y.true),var(y.true)) ## ABC algorithm # Sample from the prior mu=rnorm(n.iter,0,2) dist.m1=rep(NA,n.iter) dist.m2=rep(NA,n.iter) for(j in 1:n.iter){ # Data generation under both models # Normal model y.m1=rnorm(n[i],mu[j]) stat.m1=c(mean(y.m1),median(y.m1),var(y.m1)) # computing the distance dist.m1[j]=sum((stat.m1-stat.true)^2) # Laplace model y.m2=mu[j]+sample(c(-1,1),n[i],rep=TRUE)*rexp(n[i],rate=sqrt(2)) stat.m2=c(mean(y.m2),median(y.m2),var(y.m2)) # computing the distance dist.m2[j]=sum((stat.m2-stat.true)^2) } # select epsilon as 1% distance quantile epsilon=quantile(c(dist.m1,dist.m2),prob=0.01) # compute the posterior probability that data come from # the model prob.m2[sims,i]=sum(dist.m2<epsilon)/(2*n.iter*0.01) } } # Visualize the results y.true=sample(c(-1,1),n[i],rep=TRUE)*rexp(n[i],rate=sqrt(2)) stat.true=c(mean(y.true),median(y.true),var(y.true)) ## ABC algorithm # Sample from the prior mu=rnorm(n.iter,0,2) dist.m1=rep(NA,n.iter) dist.m2=rep(NA,n.iter) for(j in 1:n.iter){ # Data generation under both models # Normal model y.m1=rnorm(n[i],mu[j]) stat.m1=c(mean(y.m1),median(y.m1),var(y.m1)) # computing the distance dist.m1[j]=sum((stat.m1-stat.true)^2) # Laplace model y.m2=mu[j]+sample(c(-1,1),n[i],rep=TRUE)*rexp(n[i],rate=sqrt(2)) stat.m2=c(mean(y.m2),median(y.m2),var(y.m2)) # computing the distance dist.m2[j]=sum((stat.m2-stat.true)^2) } # select epsilon as 1% distance quantile epsilon=quantile(c(dist.m1,dist.m2),prob=0.01) # compute the posterior probability that data come from # the model prob.m2[sims,i]=sum(dist.m2<epsilon)/(2*n.iter*0.01) } } # Visualize the results y.true=sample(c(-1,1),n[i],rep=TRUE)*rexp(n[i],rate=sqrt(2)) stat.true=c(mean(y.true),median(y.true),var(y.true)) ## ABC algorithm # Sample from the prior mu=rnorm(n.iter,0,2) dist.m1=rep(NA,n.iter) dist.m2=rep(NA,n.iter) for(j in 1:n.iter){ # Data generation under both models # Normal model y.m1=rnorm(n[i],mu[j]) stat.m1=c(mean(y.m1),median(y.m1),var(y.m1)) # computing the distance dist.m1[j]=sum((stat.m1-stat.true)^2) # Laplace model y.m2=mu[j]+sample(c(-1,1),n[i],rep=TRUE)*rexp(n[i],rate=sqrt(2)) stat.m2=c(mean(y.m2),median(y.m2),var(y.m2)) # computing the distance dist.m2[j]=sum((stat.m2-stat.true)^2) } # select epsilon as 1% distance quantile epsilon=quantile(c(dist.m1,dist.m2),prob=0.01) # compute the posterior probability that data come from # the model prob.m2[sims,i]=sum(dist.m2<epsilon)/(2*n.iter*0.01) }} # Visualize the results boxplot(data.frame(prob.m1[,1],prob.m2[,1]), names=c("M1","M2"),main="N=10") boxplot(data.frame(prob.m1[,2],prob.m2[,2]), names=c("M1","M2"),main="N=100") boxplot(data.frame(prob.m1[,3],prob.m2[,3]), names=c("M1","M2"),main="N=1000")
Once again, I had a terrific time teaching this “ABC in Roma” course, and not only for the immense side benefit of enjoy Roma in a particularly pleasant weather (for late February).
Filed under: R, Statistics, University life Tagged: ABC, La Sapienza, PhD course, R, Roma
To leave a comment for the author, please follow the link and comment on their blog: Xi'an's Og » R.
R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.