President=read.table("http://freakonometrics.blog.free.fr/public/data/us-president-height.csv",skip=3)
Y=as.numeric(substr(as.character(President$V4),1,3))
X=as.numeric(substr(as.character(President$V8),1,3))
plot(X,Y,xlab="loser",ylab="winner")
polygon(c(0,250,0),c(0,250,250),col="light green")
polygon(c(0,250,250),c(0,250,0),col="light blue")
points(X,Y,pch=19,col="red")
First, we plot the height of the winner versus the height of the loser,
where in the green area, the taller wins, and in the blue area, the shorter wins.
So, obviously, it is not that simple…
But we can go one step further: the height of a candidate might only have an
influence if electors actually see the candidates, so perhaps height
has only a recent influence.
Here is the graph of the evolution of the height of the candidates, with a linear trend, a green one for the winner, and a blue one for the loser.
Z=as.numeric(as.character(President$V1))
plot(c(Z,Z),c(X,Y))
abline(lm(Y~Z),col="light green",lwd=2)
abline(lm(X~Z),col="light blue",lwd=2)
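To put a number on the two trend lines, one can read the slope off each fitted model. The sketch below is self-contained, so it runs on a small made-up data set: the vectors year, winner and loser are hypothetical stand-ins for Z, Y and X, not the actual presidential heights, and trend_slope is a helper name introduced here for illustration only.

```r
# Toy data: hypothetical heights in cm, NOT the presidential data set
year   <- c(1900, 1920, 1940, 1960, 1980, 2000)
winner <- c(178, 180, 183, 182, 185, 188)
loser  <- c(179, 178, 180, 181, 180, 182)

# Slope of a simple linear regression of height on year (cm per year)
trend_slope <- function(height, year) {
  unname(coef(lm(height ~ year))[2])
}

slope_winner <- trend_slope(winner, year)
slope_loser  <- trend_slope(loser, year)

# Gap between the two fitted trends, expressed in cm per century
(slope_winner - slope_loser) * 100
```

With the real data, the same quantities would be coef(lm(Y~Z))[2] and coef(lm(X~Z))[2].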
Somehow, the winner is getting taller much faster than the loser (there is an overall increase of the population height over two centuries). Maybe it is time to run some tests, to see if the height can truly be used to predict the winner of US elections,
> Z1=(Y>=X)
> Z2=(Y>X)
> prop.test(sum(Z1,na.rm=TRUE),sum(is.na(Z1)==FALSE),p=1/2,alternative="less")

	1-sample proportions test with continuity correction

data:  sum(Z1, na.rm = TRUE) out of sum(is.na(Z1) == FALSE), null probability 1/2
X-squared = 2.2222, df = 1, p-value = 0.932
alternative hypothesis: true p is less than 0.5
95 percent confidence interval:
 0.0000000 0.7407815
sample estimates:
        p
0.6222222

> prop.test(sum(Z2,na.rm=TRUE),sum(is.na(Z2)==FALSE),p=1/2,alternative="less")

	1-sample proportions test with continuity correction

data:  sum(Z2, na.rm = TRUE) out of sum(is.na(Z2) == FALSE), null probability 1/2
X-squared = 0.0889, df = 1, p-value = 0.6172
alternative hypothesis: true p is less than 0.5
95 percent confidence interval:
 0.0000000 0.6605522
sample estimates:
        p
0.5333333
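Both calls above use alternative="less", so they look for evidence that the taller candidate wins less than half of the time. To ask the question the other way round, the alternative can be flipped to "greater". This is a minimal sketch, assuming the counts implied by the printed estimates (28 and 24 out of 45 elections, i.e. 0.6222 × 45 and 0.5333 × 45) rather than reloading the data; the variable names are illustrative.

```r
# Counts recovered from the printed estimates above (assumption: 45 usable elections)
n_elections     <- 45
n_at_least_tall <- 28   # winner at least as tall as the loser (0.6222 * 45)
n_strictly_tall <- 24   # winner strictly taller than the loser (0.5333 * 45)

# H0: p = 1/2 against H1: p > 1/2 (the taller candidate wins more often than not)
test_geq <- prop.test(n_at_least_tall, n_elections, p = 1/2, alternative = "greater")
test_gt  <- prop.test(n_strictly_tall, n_elections, p = 1/2, alternative = "greater")

# One-sided p-values; each is the complement of the "less" p-value printed above
c(at_least_as_tall = test_geq$p.value, strictly_taller = test_gt$p.value)
```

Since the two one-sided p-values of the same test sum to one, these come out at about 0.07 and 0.38, so neither is significant at the 5% level over the full sample.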
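In only 53% of the elections is the winner strictly taller than the loser (and in 62% of the elections, he is at least as tall). Here, the large p-values mean that we cannot (statistically) reject the hypothesis that the taller candidate wins at least half of the time, which is weaker than proving that the taller wins. The pattern is even stronger if we focus only on the elections held after 1918 (following World War I),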
> I=Z>1918
> Z1=(Y>=X)[I]
> Z2=(Y>X)[I]
> prop.test(sum(Z1,na.rm=TRUE),sum(is.na(Z1)==FALSE),p=1/2,alternative="less")

	1-sample proportions test with continuity correction

data:  sum(Z1, na.rm = TRUE) out of sum(is.na(Z1) == FALSE), null probability 1/2
X-squared = 6.2609, df = 1, p-value = 0.9938
alternative hypothesis: true p is less than 0.5
95 percent confidence interval:
 0.0000000 0.9049412
sample estimates:
        p
0.7826087

> prop.test(sum(Z2,na.rm=TRUE),sum(is.na(Z2)==FALSE),p=1/2,alternative="less")

	1-sample proportions test with continuity correction

data:  sum(Z2, na.rm = TRUE) out of sum(is.na(Z2) == FALSE), null probability 1/2
X-squared = 2.7826, df = 1, p-value = 0.9524
alternative hypothesis: true p is less than 0.5
95 percent confidence interval:
 0.0000000 0.8423696
sample estimates:
        p
0.6956522
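With only 23 elections after 1918, the normal approximation behind prop.test is a little rough, so an exact binomial test is a natural cross-check. This is a sketch and not part of the original analysis: the counts 18 and 16 out of 23 are inferred from the printed estimates (0.7826 × 23 and 0.6957 × 23), and the variable names are illustrative.

```r
# Counts inferred from the printed estimates above (assumption: 23 usable elections)
n_recent <- 23
wins_geq <- 18   # winner at least as tall as the loser (0.7826 * 23)
wins_gt  <- 16   # winner strictly taller than the loser (0.6957 * 23)

# Exact one-sided binomial tests of H0: p = 1/2 against H1: p > 1/2
exact_geq <- binom.test(wins_geq, n_recent, p = 1/2, alternative = "greater")
exact_gt  <- binom.test(wins_gt,  n_recent, p = 1/2, alternative = "greater")

# Exact p-values, i.e. the upper binomial tail probabilities under p = 1/2
c(at_least_as_tall = exact_geq$p.value, strictly_taller = exact_gt$p.value)
```

The exact test avoids the continuity correction altogether, which matters more when the sample is this small.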
In almost 80% of the elections following WWI, the winner was at least as tall as the loser (and strictly taller in about 70% of them). I guess I have here a nice and simple model to predict who will win the elections next year…