More pi plus 1 (or plus 0.01) day fun
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
Since I just didn’t get enough this morning, I spent some more time fooling around with estimating pi. Since I was basically counting the number of random x,y pairs inside a quarter circle and computing a sample average for more and more iterations I wondered how sensitive my results were to small or large sample sizes within each iteration. My original code plotted a thousand points each iteration and computed 5000 iterations, probably more than necessary. What if I only plotted 10 points each time? 100? How much faster or slower would my estimate converge on the answer for larger sample sizes? So I took a look:
![]() |
From pi day |
For less than 500 iterations, the lower sample sizes (10 and 100) jump around quite a bit before settling roughly on the answer. A sample size of 1000 gets within 0.001 very rapidly. I also looked at the progression of each estimate. I subtracted pi from each iteration’s estimate and looked at the different between period and
.
![]() |
From pi day |
Just like the picture of overall estimation, the estimate built with a sample size of 10 is much less stable even after hundreds of iterations (where each successive sample has a smaller and smaller impact on the sample average). Code is below. You don’t actually need ggplot2
to compute any of this, just to graph things (you could pretty easily graph this in base R as well).
#Same idea as yesterday, but in one function with two arguments | |
pi.est<- function(iter,n) { | |
pi.ratio<- function(n) { | |
x<- runif(n, min=-1,max=1) | |
y<- runif(n, min=-1,max=1) | |
return(4*sum(x^2 + y^2 <= 1.0)/n) | |
} | |
sum.pi<-stor.rat<-numeric(0) | |
for (i in 1:iter) { | |
stor.rat[i]<- pi.ratio(n) | |
sum.pi[i]<- sum(stor.rat[1:i])/i | |
} | |
return(sum.pi) | |
} | |
#Takes the function pi.est() and runs it for 10,100, and 1000 within group random | |
#variables so we can compare convergence | |
mapply.test<-mapply(pi.est,2000,c(10,100,1000)) | |
colnames(mapply.test)<-c("10","100","1000") | |
convergence<-as.data.frame(mapply.test) | |
convergence$iters<- 1:2000 | |
library(ggplot2) | |
gg.converge<-melt(convergence,id.vars="iters") | |
colnames(gg.converge)<-c("iters","RVs","Estimate") | |
conv.plot<-ggplot(data=gg.converge,aes(x=iters,y=Estimate)) | |
conv.plot + geom_line(aes(colour=RVs)) + opts(title="Estimate of pi with different \n within group sample sizes") | |
#How much does the estimate change over iterations? | |
conv.test<-diff(abs(mapply.test - pi)) | |
diff.conv<-as.data.frame(conv.test) | |
diff.conv$iters<- 1:1999 | |
gg.diff<- melt(diff.conv,id.vars="iters") | |
colnames(gg.diff)<- c("iters","RVs","Diff") | |
diff.plot<- ggplot(data=gg.diff,aes(x=iters,y=Diff)) | |
diff.plot + geom_line(aes(colour=RVs))+ opts(title="Rate of convergence with different \n within group sample sizes") | |
####Animated breakdown of the process above ##### | |
#Nearly empty plot. The only interesting bit is the tacked on y-axis for the running average sparkline | |
base.plot<- function() { | |
plot(-2:2, -2:2, type="n", asp=1, ann=FALSE, axes='FALSE', frame.plot=TRUE) | |
title(main="A peek inside the \n estimation of pi") | |
axis(side=2,at=c(seq(0.25,2,by=0.25)),labels=c(as.character(seq(0.25,2,by=0.25)+2)),par(cex=0.8,las=1)) | |
lines(structure(c(0, 0, 1, 1, 0, 0, 1, 1, 0, 0), .Dim = c(5L, 2L))) | |
symbols (0, 0, circles=1, inches=FALSE, add=TRUE) | |
} | |
#Same as the function above with 0,1 as bounds. I return three objects because I want to get at the | |
#x,y coords, the location and the ratio all at once (outside the for() loop). | |
pi.ratio<- function(n) { | |
x<- runif(n, min=0,max=1) | |
y<- runif(n, min=0,max=1) | |
pi.pts<- cbind(x,y) | |
return(list(est=4*sum(x*x + y*y <= 1.0)/n, ind=x*x + y*y <= 1.0, loc=pi.pts)) | |
} | |
#Nothing too dramatic, though the ",drop=FALSE" argument prevents the red and blue matrices from | |
#accidentally becoming vectors. | |
pi.est<- function(iter,n) { | |
sum.pi<- stor.rat<- numeric(0) | |
x.len<-seq(-2,2,length.out=iter) | |
for (i in 1:iter) { | |
sample.out<- pi.ratio(n) | |
red<- as.matrix(sample.out$loc[sample.out$ind, , drop=FALSE]) | |
blue<- as.matrix(sample.out$loc[!sample.out$ind, , drop=FALSE]) | |
stor.rat[i]<- sample.out$est | |
sum.pi[i]<- sum(stor.rat[1:i])/i | |
base.plot() | |
points(red,col=2,pch=20,cex=0.8) | |
points(blue,col=4,pch=20,cex=0.8) | |
text(-1.5,0.8,labels=sprintf("%f",sum.pi[i]),col=3) | |
lines(x.len[1:i],sum.pi[1:i]-2,col=3) | |
Sys.sleep(0.02); | |
} | |
} | |
library(animation) | |
saveMovie(pi.est(50,20),movie.name = "pianim.gif",outdir = getwd(), width = 600, height = 600); |
R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.