a Simpson paradox of sorts
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
The riddle from The Riddler this week is about finding an undirected graph with N nodes and no isolated node such that the number of nodes with more connections than the average of their neighbours is maximal. A representation of a connected graph is through a matrix X of zeros and ones, on which one can spot the nodes satisfying the above condition as the positive entries of the vector (X1)^2-(X^21), if 1 denotes the vector of ones. I thus wrote an R code aiming at optimising this target
targe <- function(F){ sum(F%*%F%*%rep(1,N)/(F%*%rep(1,N))^2<1)}
by mere simulated annealing:
rate <- function(N){ # generate matrix F # 1. no single F=matrix(0,N,N) F[sample(2:N,1),1]=1 F[1,]=F[,1] for (i in 2:(N-1)){ if (sum(F[,i])==0) F[sample((i+1):N,1),i]=1 F[i,]=F[,i]} if (sum(F[,N])==0) F[sample(1:(N-1),1),N]=1 F[N,]=F[,N] # 2. more connections F[lower.tri(F)]=F[lower.tri(F)]+ sample(0:1,N*(N-1)/2,rep=TRUE,prob=c(N,1)) F[F>1]=1 F[upper.tri(F)]=t(F)[upper.tri(t(F))] #simulated annealing T=1e4 temp=N targo=targe(F) for (t in 1:T){ #1. local proposal nod=sample(1:N,2) prop=F prop[nod[1],nod[2]]=prop[nod[2],nod[1]]= 1-prop[nod[1],nod[2]] while (min(prop%*%rep(1,N))==0){ nod=sample(1:N,2) prop=F prop[nod[1],nod[2]]=prop[nod[2],nod[1]]= 1-prop[nod[1],nod[2]]} target=targe(prop) if (log(runif(1))*temp<target-targo){ F=prop;targo=target} #2. global proposal prop=F prop[lower.tri(prop)]=F[lower.tri(prop)]+ sample(c(0,1),N*(N-1)/2,rep=TRUE,prob=c(N,1)) prop[prop>1]=1 prop[upper.tri(prop)]=t(prop)[upper.tri(t(prop))] target=targe(prop) if (log(runif(1))*temp<target-targo){ F=prop;targo=target} temp=temp*.999 } return(F)}
This code returns quite consistently (modulo the simulated annealing uncertainty, which grows with N) the answer N-2 as the number of entries above average! Which is rather surprising in a Simpson-like manner since all entries but two are above average. (Incidentally, I found out that Edward Simpson recently wrote a paper in Significance about the Simpson-Yule paradox and him being a member of the Bletchley Park Enigma team. I must have missed out the connection with the Simpson paradox when reading the paper in the first place…)
Filed under: Books, Kids, pictures, R Tagged: Bletchley Park, Edward Simpson, Enigma code machine, graph, mathematical puzzle, Significance, Simpson's paradox, simulated annealing, The Riddler, Yule
R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.