Last week I posted some analysis of individual voting behaviour in New Zealand’s 2014 general election. In that post, I used logistic regression in four different models to predict the probability of an individual giving their party vote to each of the four largest parties – National, Labour, Green and New Zealand First. That let the user compare the people voting for each of those parties, one at a time, with the wider population.
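For anyone who wants a concrete picture of that earlier one-party-at-a-time approach, here is a minimal sketch on simulated stand-in data. The data frame and column names below are made up for illustration; the real analysis used the New Zealand Election Study variables.

```r
# Simulated stand-in for the survey data; the real analysis used the
# New Zealand Election Study, not this toy data frame.
set.seed(123)
nzes_toy <- data.frame(
  vote_national = rbinom(500, 1, 0.45),
  age           = sample(18:90, 500, replace = TRUE),
  income        = rlnorm(500, 10, 0.5)
)

# One of the four one-party-at-a-time models: a binary logistic regression
model_nat <- glm(vote_national ~ age + income, family = binomial, data = nzes_toy)

# Predicted probability of a National party vote for each (simulated) respondent
head(predict(model_nat, type = "response"))
```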
A logical extension of this is to model party vote for those four categories, plus “other” and “did not vote”, simultaneously as a multinomial response. I tried this out with several different methods: a deep learning neural network (from H2O), random forest (trying out both the H2O version and ranger, a fast R/C++ implementation), and multinomial log-linear regression (from nnet). The aim was to produce an interactive web tool that lets people see the impact of changing one variable at a time on predicted voting probabilities.
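As a rough, hedged sketch of what two of those model fits look like – again on simulated stand-in data rather than the real survey variables – the multinomial regression and a probability forest can be set up along these lines:

```r
library(nnet)    # multinomial log-linear regression
library(ranger)  # fast C++ random forest

# Simulated stand-in data; the real models were fitted to survey variables
set.seed(42)
parties <- c("National", "Labour", "Green", "NZ First", "Other", "Did not vote")
toy <- data.frame(
  party_vote = factor(sample(parties, 1000, replace = TRUE)),
  age        = sample(18:90, 1000, replace = TRUE),
  income     = rlnorm(1000, 10, 0.5),
  urban      = rbinom(1000, 1, 0.7)
)

# Multinomial log-linear regression over all six outcome categories
fit_multinom <- multinom(party_vote ~ age + income + urban, data = toy, trace = FALSE)

# Random forest grown as a probability forest, so it returns class probabilities
fit_rf <- ranger(party_vote ~ age + income + urban, data = toy, probability = TRUE)
```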
Of the models I tried, tuning hyperparameters with the h2o.grid function, the best performing was the neural network with two hidden layers of 60 neurons each and a high dropout rate between each layer. However, this was a bit slow for the end user when implemented in Shiny for the web app, and I anticipated some further problems in deploying an H2O model to shinyapps.io – problems I’ll address at some point, but not today. So in the end I used an average of the ranger random forest and the nnet::multinom multinomial regression models, which is nice and fast and gives very plausible results.
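The averaging itself is straightforward: both models can return a matrix of predicted probabilities over the six outcome categories, and the ensemble is just their element-wise mean. A hedged sketch, continuing from the toy fits above (the real app uses its own tuned models and explanatory variables):

```r
# A new, hypothetical individual on the same toy variables
new_person <- data.frame(age = 45, income = 30000, urban = 1)

# nnet::multinom returns class probabilities with type = "probs"
p_multinom <- predict(fit_multinom, newdata = new_person, type = "probs")

# A ranger probability forest returns probabilities in $predictions
p_rf <- predict(fit_rf, data = new_person)$predictions[1, ]

# Simple average of the two sets of predicted voting probabilities,
# aligned by outcome name to be safe
round((p_multinom[names(p_rf)] + p_rf) / 2, 3)
```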
See:
- The web tool itself;
- Source code for the preparation for the Shiny app (I always separate out as much prep as possible from a Shiny app deployment, for ease of maintenance as well as faster user experience) and the various experiments in different models;
- Source code for the app itself.