[This article was first published on ProbaPerception, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
Have you ever played the board game “Guess who?”. For those who have not experienced childhood (because it might be the only reason to ignore this board game), this is a game consisting in trying to guess who the opponent player is thinking of among a list of characters – we will call the one he chooses the “chosen character”. These characters have several characteristics such as gender, having brown hair or wearing glasses. To find out, you are only allowed to ask questions expecting a yes-no answer.Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
This game has been expanded to a further complexity through the funny and impressive website: Akinator. The software tries to guess who we are thinking of by asking yes-no questions.
With a friend/colleague/classmate of mine, Pierre Cordier, we wondered how it worked and what would be the fastest way to find the answer. In particular, is it better to ask about a very balanced characteristic, such as “Is it a female?”, or a very unbalanced characteristic such as “Is it an alien?”. The first question is very likely to remove fifty percent of the population, while the second question is very likely to eliminate a very small part of the population; on the other hand, if we are lucky, we can have the answer “Yes, it is an alien” and, in this case, we will directly know who is the “chosen character”. In other words, the mantra of the first strategy is “No risk, no reward”, while the second strategy’s one is “High risk, high reward”.
To answer this mysterious question, we had a neural network approach. More especially, we had a bipartite neural network approach, also known as institutional neural network. We use as a reference the book by David Easley and Jon Kleinberg, Networks, crowds and markets: reasoning about a highly connected world, introduced earlier in Spatial segregation in cities – An explanation by a neural network model. The idea is that every character belongs, or not, to different institutions/groups such as “wearing glasses”. In term of mathematical approach, we can represent this neural network as a matrix where every column is an institution and each row is an individual. If the individual i belongs to the institution j, then the intersection of the row i and the column j would be 1, otherwise it would be 0.
We have first randomly generated this matrix, and then computed two strategies. The first one checks for the characteristic with the most balanced distribution of “yes” and “no”. The second strategy finds the characteristic with the nearest distribution to 90% of “yes” for 10% of “no”, or 10% of “yes” for 90% of “no”.
We have done this simulation many times (considering 100 individuals for 10 institutions) to have an estimation of the number of questions needed to find the “chosen character”. We have made two observations. In average, the fifty/fifty strategy is faster than the ninety/ten strategy. Besides, the number of questions needed varies less for the first one. Therefore, the first strategy is more efficient in average, and more reliable (as its variance is lower).
After 50 simulations, the probability distributions of the number of questions give a good insight of the spread between the two strategies.
Probability distributions |
The code (R):
# Who s who
# install.packages(“shape”)
#functions
# creation of matrix
createMatrix = function(myNumberPeople, myNumberInstitution){
mat = matrix(0, nrow = myNumberPeople, ncol = myNumberInstitution)
for (i in 1:(myNumberPeople * myNumberInstitution)){
mat[i]= rbinom(1, 1, prob = 0.5)
}
return(mat)
}
#calculation of proportion
proportion = function(myMatrix){
result = matrix(0, 1,length(myMatrix[1,]))
for(i in 1:length(result)){
result[i] = sum(myMatrix[,i])
}
return(result/nrow(myMatrix))
}
#strategy based on proportion
closest = function(myMatrix, a, p){
vect = proportion(myMatrix)
for(i in 1:length(vect)){
vect[i] = min(abs(p – vect[i]),abs((1-p) – vect[i]))
}
return(a[which(vect[a] == min(vect[a]))[1]])
}
belong = function(myMatrix,myGuy,a , p){
return(myGuy[closest(myMatrix, a, p)]==1)
}
elimination = function(myMatrix,myGuy, a, p){
myMatrix = myMatrix[(belong(myMatrix,myGuy, a, p)==myMatrix[,closest(myMatrix, a, p)]),]
return(myMatrix)
}
strategy = function(myMatrix,myGuy, p){
k = 0
a = 1:numberInstitution
while(length(a) > 0 & if(is.vector(myMatrix) != 1){length(myMatrix[!duplicated(myMatrix),])/numberInstitution != 1}else{FALSE}){
b = a[-which(a==closest(myMatrix, a, p))]
myMatrix = elimination(as.matrix(myMatrix), myGuy, a, p)
a= b
k = k+1
# print(dim(myMatrix))
}
res = list(NumberSteps=k, Candidates=myMatrix)
return(res)
}
#random strategy
createOrder = function(){
return(sample(1:numberInstitution, numberInstitution))
}
belongRandom = function(myMatrix,myGuy, myVariable){
return(myGuy[myVariable]==1)
}
eliminationRandom = function(myMatrix,myGuy,a){
myMatrix = myMatrix[(belongRandom(myMatrix,myGuy, a[1])==myMatrix[,a[1]]),]
return(myMatrix)
}
strategyRandom = function(myMatrix,myGuy){
k = 0
a = createOrder()
while(length(a) > 0 & (if(is.vector(myMatrix) != 1){length(myMatrix[!duplicated(myMatrix),])/numberInstitution != 1}else{FALSE})){
myMatrix = eliminationRandom(as.matrix(myMatrix), myGuy,a)
a = a[-1]
k = k+1
}
res = list(NumberSteps=k, Candidates=myMatrix)
return(res)
}
application = function(myMatrix, myGuy, myStrategy){
# myStrategy is in {50, random, 90}
if(myStrategy == “random”){return(strategyRandom(myMatrix, myGuy))}
}
# initialization
numberPeople = 100
numberInstitution = 10
memory = list(“fifty” = c(), “random” = c(), “ninety” = c())
for(i in 1:50){
mat = createMatrix(numberPeople, numberInstitution)
copyMat = mat
guy = copyMat[sample(1:length(copyMat[,1]),1),]
# copy2 = mat
# copy3 = mat
# guy = mat[sample(1:numberPeople,1),]
# memory$fifty = c(memory$fifty, application(copy1, guy, “50”)$NumberSteps)
# memory$random = c(memory$random, application(copy2, guy, “random”)$NumberSteps)
memory$ninety = c(memory$ninety, strategy(copyMat, guy, 0.1)$NumberSteps)
memory$random = c(memory$random, application(copyMat, guy, “random”)$NumberSteps)
memory$fifty = c(memory$fifty, strategy(copyMat, guy, 0.5)$NumberSteps)
}
#plotting
library(shape)
png(filename=”~/Chr10/post9_figure1.png”, bg=”white”)
col <- shadepalette(9, “cyan”, “blue”)
plot(c(),ylim=c(4.2,10), xlim = c(3, 48), xlab =”Simulationss”, ylab =”Number of questions”)
polygon(c(1:50,50:1), c(memory$fifty,rev(memory$ninety)), col=”#EFDECD”)
a <- c(mean(memory$fifty)-var(memory$fifty),mean(memory$fifty)+var(memory$fifty))
b <- c(mean(memory$ninety)-var(memory$ninety),mean(memory$ninety)+var(memory$ninety))
lines(memory$ninety, type = ‘l’, col=col[3], lwd = 2)
abline(h=mean(memory$fifty), col=col[7], lwd =2)
abline(h=mean(memory$ninety), col=col[3], lwd = 2)
lines(memory$fifty, col = col[7], lwd = 2)
title(“Difference between the two strategies”)
dev.off()
png(filename=”~/Chr10/post9_figure2.png”, bg=”white”)
plot(table(memory$ninety)/50, ylim=c(0,0.7), type=”p”,col = col[3], lwd = 4,
main=”Distribution of the number of questions
for both strategies”,
xlab=”Number of questions”,
ylab=”Probability”)
lines(table(memory$fifty)/50, type=”p”,col = col[7], lwd = 4)
legend(x=4, y= 0.7,legend =c(“50/50”, “90/10”), col=c(col[7],col[3]), pch= 1, pt.lwd = 4)
dev.off()
To leave a comment for the author, please follow the link and comment on their blog: ProbaPerception.
R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.