Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
Alea Jacta Est
This is the very last analysis before the election. A far-right nationalist candidate, Jair Messias Bolsonaro, is leading the polls with about 40% of the intentions, while the runner up candidate, Fernando Haddad, of a leftist coalition has about 25%; all the others have 35% in total.
A very intriguing debate put forward in the press last days was if the next president would be elected right way int he primary election round. So, will the Bandwagon, shy Tory, and something else effect help electing a far-right nationalist candidate by the absolute-majority criterion vote? Bayes says don't worry
about Bolsonaro’s victory by now.
Although polling houses are showing Bolsonaro’s support augmenting systematically over the last weeks, it’s fair to remember that pollsters did a very poor job in fielding the true vote share last elections. For instance, in 2014 the main polling firms mis-predicted both Dilma Rousseff’s and Aecio Neves’ true positions by saying Dilma was to win a majority with a margin, but the decision went to a instant runoff between these two candidates.
The following numbers represent the forecast with polling data made available over the last three days. Since there is a considerable number of swing voters in these polls, I did some math by distributing these undecideds before computing the final likely results. It’s a simplified simulation exercice as I do not account for time trends, house effects etc. I’m only accounting for the sample sizes.
The data
Bolsonaro Haddad Ciro Others Swing Wasting N
Datafolha 0.360 0.220 0.130 0.200 0.040 0.050 19552
Ibope 0.360 0.220 0.110 0.180 0.050 0.080 3010
MDA 0.367 0.240 0.099 0.155 0.060 0.078 2002
Ipespe 0.360 0.220 0.110 0.290 0.020 0.000 2000
VoxPopuli 0.340 0.270 0.110 0.130 0.070 0.080 2000
ParanaPesq 0.349 0.218 0.094 0.171 0.046 0.120 1080
Poll of polls
Here is where the magic begins. I weigh polls so to reflect their sample sizes. The new results are shown in last line (7) of the table.
options(digits=3)
wtd.polls <- rbind(data, c(apply(data[,1:6],2, weighted.mean, data$N), sum(data$N)))
print(wtd.polls)
Bolsonaro Haddad Ciro Others Swing Wasting N
Datafolha 0.360 0.220 0.130 0.200 0.0400 0.0500 19552
Ibope 0.360 0.220 0.110 0.180 0.0500 0.0800 3010
MDA 0.367 0.240 0.099 0.155 0.0600 0.0780 2002
Ipespe 0.360 0.220 0.110 0.290 0.0200 0.0000 2000
VoxPopuli 0.340 0.270 0.110 0.130 0.0700 0.0800 2000
ParanaPesq 0.349 0.218 0.094 0.171 0.0460 0.1200 1080
7 0.359 0.225 0.122 0.195 0.0433 0.0561 29644
Adjusting for the undecideds
Adjusting for swing voters, the new results are now the line (8) of the table.
options(digits=3)
wtd.polls[8,] <- data.frame(wtd.polls[7,1:4] +
wtd.polls[7,1:4] /
sum(wtd.polls[7,1:4]) *
wtd.polls[7,5],
Swing=0,
Wasting=wtd.polls[7,6],
N=wtd.polls[7,7])
print(wtd.polls)
Bolsonaro Haddad Ciro Others Swing Wasting N
Datafolha 0.360 0.220 0.130 0.200 0.0400 0.0500 19552
Ibope 0.360 0.220 0.110 0.180 0.0500 0.0800 3010
MDA 0.367 0.240 0.099 0.155 0.0600 0.0780 2002
Ipespe 0.360 0.220 0.110 0.290 0.0200 0.0000 2000
VoxPopuli 0.340 0.270 0.110 0.130 0.0700 0.0800 2000
ParanaPesq 0.349 0.218 0.094 0.171 0.0460 0.1200 1080
7 0.359 0.225 0.122 0.195 0.0433 0.0561 29644
8 0.376 0.235 0.128 0.205 0.0000 0.0561 29644
Adjusting for the wasting votes
Adjusting for wasting votes, follows the same principle. The last line of the following tbale (9) has the new adjusted preference distribution, with correct sample size.
options(digits=3)
wtd.polls[9,] <- data.frame(wtd.polls[8,1:4] +
wtd.polls[8,1:4] /
sum(wtd.polls[8,1:4]) *
wtd.polls[8,6],
Swing=0,
Wasting=0,
N=(wtd.polls[8,7] - (wtd.polls[8,6] * wtd.polls[8,7])))
print(wtd.polls)
Bolsonaro Haddad Ciro Others Swing Wasting N
Datafolha 0.360 0.220 0.130 0.200 0.0400 0.0500 19552
Ibope 0.360 0.220 0.110 0.180 0.0500 0.0800 3010
MDA 0.367 0.240 0.099 0.155 0.0600 0.0780 2002
Ipespe 0.360 0.220 0.110 0.290 0.0200 0.0000 2000
VoxPopuli 0.340 0.270 0.110 0.130 0.0700 0.0800 2000
ParanaPesq 0.349 0.218 0.094 0.171 0.0460 0.1200 1080
7 0.359 0.225 0.122 0.195 0.0433 0.0561 29644
8 0.376 0.235 0.128 0.205 0.0000 0.0561 29644
9 0.398 0.249 0.135 0.217 0.0000 0.0000 27980
wtd.polls$N[9] *
c(wtd.polls$Bolsonaro[9], (wtd.polls$Haddad[9] + wtd.polls$Ciro[9] + wtd.polls$Others[9]), 1 - wtd.polls$Bolsonaro[9] - (wtd.polls$Haddad[9] - wtd.polls$Ciro[9] - wtd.polls$Others[9]))+1
[1] 11146 16832 19708
Draw 1 million samples
Finally I draw a lot os samples from the posterior distribution using the weighted polls and uninformative priors to keep it simple.
poll <- c(4910, 3011, 1756, 2718)
library(SciencesPo)
library(MCMCpack)
### draw samples from the posterior
set.seed(1234)
MC <- 1000000
### Using uninformative prior (1,1,1,1)
#samples <- getDirichletSamples(MC, alpha = poll + rep(1,4))
row= 9
prob2win = function(row, export=1){
p=rdirichlet(100000,
wtd.polls$N[row] *
c(wtd.polls$Bolsonaro[row], wtd.polls$Haddad[row] + wtd.polls$Ciro[row] + wtd.polls$Others[row], 1 - wtd.polls$Bolsonaro[row] - wtd.polls$Haddad[row] - wtd.polls$Ciro[row] - wtd.polls$Others[row])+1)
if(export==1){
mean(p[,1]<p[,2]) ## No exceeds Yes?
} else {
return(p)
}
}
Here we want to look at the margins of Bolsonaro over the combined opposition candidates. The more candidates contesting for the seat, the greater the probability that the winning candidate will receive only a minority of the votes cast.
We can also use the middle 95% range to represent the uncertainty. The numbers say about 20% of times in 1 million elections Bolsonaro appears ahead the opposition formula. Therefore, it’s very unlikely he could win the election in the primary election round.
samples = prob2win(row= 9, export=0)
combinedOpposition <- (samples[,2])
frontRunner <- (samples[,1])
margin <- (combinedOpposition - frontRunner)
quantile(margin, probs = c(0.025, 0.5, 0.975))
2.5% 50% 97.5%
0.192 0.203 0.215
Finally, we can plot the posterior distribution of simulated elections where Bolsonaro is greater than the combined opposing votes. Based on the polling data at hands, and very little effort, we can believe the far-right nationalist candidate won’t make it this Sunday as press pundits are suggesting.
hist(margin,
col="gray",
prob = FALSE, # posterior distribution
breaks = "FD", xlab = expression(p[Bolsonaro] > p[Opposition]),
main = expression(paste(bold("Posterior Distribution of Elections With "), p[Bolsonaro] > p[Opposition])));
# Bayes estimate (middle 95%)
abline(v=mean(margin), col='red', lwd=3, lty=3);
R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.