New data and functions in nzelect 0.3.0 R package
Polling data and other goodies ready for download
A new version, 0.3.0, of the nzelect R package is now available on CRAN. It adds:
- historical polling data from 2002 to February 2017, sourced from Wikipedia
- some small functions to help convert voting numbers into seats in a New Zealand or similar proportional representation system; and to weight polling numbers in the way done by two of New Zealand’s polling aggregator websites.
- some small bits of metadata on political parties, most notably named vectors of their colours for use in graphics.
nzelect was originally developed in response to Ari Lamstein’s R election analysis contest, and my series of blog posts drawing on its data won me that competition. The first major version included polling-place election results from the 2014 election. The main purpose of the package remains to make these data available in a tidier, more analysis-ready format than the Electoral Commission’s official election results site.
I still have plans to add the results from earlier elections, but with a New Zealand general election now scheduled for 23 September 2017 I thought I’d prioritise some polling data. I’m hoping to raise the level of sophistication of at least a corner of the debate in the lead-up to the election. I’ve got some blog posts coming up on things like house effects (e.g. historical biases of different polling firms when confronted with actual election results) and probabilistic prediction, and needed a clean and tidy set of historical data to do this.
Historical polling data
The polling data are the main addition to this version of nzelect. The data have been scraped from a range of Wikipedia pages and subjected to some cleaning. With this somewhat sketchy provenance, they can’t be guaranteed but they look very plausible. All the data have been combined in a single data frame, polls, which looks like this:
           Pollster      WikipediaDates  StartDate    EndDate    MidDate         Party VotingIntention   Client ElectionYear
5965 Colmar Brunton 11–15 February 2017 2017-02-11 2017-02-15 2017-02-13 United Future            0.00 One News         2017
5966 Colmar Brunton 11–15 February 2017 2017-02-11 2017-02-15 2017-02-13         Maori            0.01 One News         2017
5967 Colmar Brunton 11–15 February 2017 2017-02-11 2017-02-15 2017-02-13       Destiny              NA One News         2017
5968 Colmar Brunton 11–15 February 2017 2017-02-11 2017-02-15 2017-02-13   Progressive              NA One News         2017
5969 Colmar Brunton 11–15 February 2017 2017-02-11 2017-02-15 2017-02-13          Mana            0.01 One News         2017
5970 Colmar Brunton 11–15 February 2017 2017-02-11 2017-02-15 2017-02-13  Conservative            0.00 One News         2017
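To give a feel for working with data in this shape, here is a minimal base R sketch. It rebuilds only the six rows printed above rather than loading the package, so the object name polls_excerpt is my own invention for illustration, and reading NA as "not reported in this poll" is an assumption on my part rather than something the package documents here.

```r
# Hypothetical stand-in for a slice of the `polls` data frame, typed in from
# the six rows printed above (only the columns needed for the example).
polls_excerpt <- data.frame(
  Pollster        = "Colmar Brunton",
  MidDate         = as.Date("2017-02-13"),
  Party           = c("United Future", "Maori", "Destiny",
                      "Progressive", "Mana", "Conservative"),
  VotingIntention = c(0.00, 0.01, NA, NA, 0.01, 0.00),
  stringsAsFactors = FALSE
)

# VotingIntention is a proportion, not a percentage. Assumption: NA means the
# party was not reported in the poll, which differs from a reported 0.00.
reported <- polls_excerpt[!is.na(polls_excerpt$VotingIntention), ]
reported$Percent <- 100 * reported$VotingIntention

print(reported[, c("Party", "Percent")])
```

Keeping the NA rows distinct from the zero rows matters for anything like averaging across polls, where a missing value and a measured zero should not be treated the same way.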
Election results (actual total party vote by party) are also included for convenience. This doesn’t remove the need to bring in the detailed historical election results by polling place in future versions of nzelect, but gives a start on the historical perspective.
Combining multiple election cycles of polling data together makes it possible to see the longer game in party political change in New Zealand:
Here’s the code behind that graphic.
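What follows is not the original listing, just a rough sketch of how a chart of this kind could be drawn with ggplot2. It assumes the polls data frame has the columns shown earlier (MidDate, Party, VotingIntention); the 5% cut-off for which parties to draw and the loess smoother are arbitrary choices of mine. The code only runs if both packages are installed.

```r
# Sketch only: draw every poll as a point and add a smoothed line per party.
if (requireNamespace("nzelect", quietly = TRUE) &&
    requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)

  polls <- nzelect::polls

  # Drop rows where no voting intention was recorded
  observed <- polls[!is.na(polls$VotingIntention), ]

  # Keep parties that reached 5% in at least one poll (arbitrary cut-off)
  peak  <- tapply(observed$VotingIntention, observed$Party, max)
  major <- names(peak)[which(peak >= 0.05)]

  p <- ggplot(observed[observed$Party %in% major, ],
              aes(x = MidDate, y = VotingIntention, colour = Party)) +
    geom_point(alpha = 0.3) +
    geom_smooth(se = FALSE, method = "loess") +
    labs(x = "Polling date", y = "Voting intention (proportion)")

  print(p)
}
```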
The CRAN version of an R package isn’t the appropriate way to make day-to-day updates available – upgrades on CRAN should only be every three months or so at the most. I will probably keep the GitHub version up to date as more polling data comes in, but I’m not in a position to give a service level commitment on timeliness.
Converting voting results to seats
One thing we need to be able to do efficiently if we’re going to facilitate polling punditry is convert election results – real or hypothetical – into actual seats in Parliament. Since 1996, New Zealand has had 120 or more seats in its single house of parliament, allocated by a system known as “mixed-member proportional”. Each elector has two votes – an electorate vote and a party vote. 71 of the seats (at the time of writing) are “first past the post” electorates. However, these electorate votes have very little (not quite none) impact on the total make-up of Parliament, because the remaining seats are allocated to “lists” provided by the parties in such a way as to be in proportion to the electors’ party votes.
Only parties that receive at least five percent of the party vote, or that win at least one electorate, are counted in the proportional representation part.
The reason why there are 120 “or more” seats is the same as the reason why electorate votes have not quite exactly zero impact on the overall makeup. If a party wins more electorates than they would be entitled to from their party vote, the difference is translated into “overhang seats”. After the 2014 election there was one such seat; the leader of the United Future party won the Ōhariu electorate, but the party received only 0.22% of the party vote (much less than 1/120th), so he holds the 121st seat in Parliament.
The method of allocating the list seats is known as the Sainte-Laguë or Webster/Sainte-Laguë method. It’s well explained on Wikipedia. It’s now available in nzelect via the allocate_seats() function.
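For readers who want to see the mechanics, here is a minimal base R sketch of a Sainte-Laguë allocation with a threshold and overhang seats. It is my own illustration under simplifying assumptions, not the code inside allocate_seats(); the function name sainte_lague_sketch and the vote counts are hypothetical. Each seat in turn goes to the eligible party with the highest quotient votes / (2 * seats_so_far + 1), and a party that won more electorates than its proportional allocation keeps them as extra seats.

```r
# Illustrative sketch only (hypothetical function, not the nzelect source).
sainte_lague_sketch <- function(votes, nseats = 120, threshold = 0.05,
                                electorate = rep(0, length(votes))) {
  share <- votes / sum(votes)

  # A party takes part if it clears the threshold or holds an electorate
  eligible <- share >= threshold | electorate > 0

  seats <- setNames(rep(0, length(votes)), names(votes))
  for (i in seq_len(nseats)) {
    # Sainte-Laguë divisors are 1, 3, 5, ...; ineligible parties never win
    quotient <- ifelse(eligible, votes / (2 * seats + 1), -Inf)
    winner <- which.max(quotient)
    seats[winner] <- seats[winner] + 1
  }

  # Overhang: electorates beyond the proportional allocation are kept
  pmax(seats, electorate)
}

# Made-up example: 10 proportional seats, "Grey" is under 5% but holds
# one electorate, so it ends up with an overhang seat.
votes      <- c(Red = 480, Blue = 300, Green = 150, Yellow = 62, Grey = 8)
electorate <- c(3, 2, 0, 0, 1)
result <- sainte_lague_sketch(votes, nseats = 10, electorate = electorate)
print(result)
print(sum(result))
```

In this toy case the ten proportional seats split 5, 3, 1, 1, 0 and Grey keeps its electorate, so the chamber has 11 members rather than 10.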
allocate_seats defaults to New Zealand settings (5% threshold, 120 seats if no overhang seats) but these can be set to other values for use in other electoral systems or for conducting thought experiments about New Zealand. For example, the 5% threshold acts as a barrier to small parties getting representation in Parliament proportionate to their support. A 2012 review recommended amongst other things reducing this to 4%, although this wasn’t adopted by Parliament. Other countries with similar systems have lower thresholds; for example, Israel has had no fewer than four different threshold figures in the past thirty years (1%, 1.5%, 2%, and the current value of 3.25%).
Here’s how the New Zealand Parliament would have looked with different values of the threshold for getting access to the proportional representation part of the system:
We can see that the Conservative party were the big losers from the 5% threshold rule: they received 3.97% of the party vote and no seats in Parliament under current rules (or indeed under the 4% threshold proposed and rejected in 2012), but would have won five seats under the Israeli threshold.
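The threshold effect can be checked roughly in base R without the package. This is a sketch under stated assumptions: the party vote shares and electorate counts below are approximate 2014 figures typed in by hand (rounded percentages rather than raw vote counts), and seats_at_threshold is a hypothetical helper of my own, not part of nzelect.

```r
# Approximate 2014 party vote shares in percent, entered by hand (assumption);
# smaller parties with no electorate are left out as they never qualify here.
share <- c(National = 47.04, Labour = 25.13, Green = 10.70,
           `NZ First` = 8.66, Conservative = 3.97, `Internet Mana` = 1.42,
           Maori = 1.32, ACT = 0.69, `United Future` = 0.22)
electorates <- c(41, 27, 0, 0, 0, 0, 1, 1, 1)

# Hypothetical helper: seats per party for a threshold given in percent
seats_at_threshold <- function(threshold_pct, nseats = 120) {
  eligible <- share >= threshold_pct | electorates > 0
  v <- share[eligible]

  # Table of Sainte-Laguë quotients: one row per party, divisors 1, 3, 5, ...
  q <- outer(v, seq(1, 2 * nseats - 1, by = 2), "/")

  # The nseats largest quotients each win a seat; count them by row (party)
  top <- order(q, decreasing = TRUE)[seq_len(nseats)]
  seats <- setNames(rep(0, length(share)), names(share))
  seats[eligible] <- tabulate(row(q)[top], nbins = length(v))

  # Electorates won beyond the proportional allocation become overhang seats
  pmax(seats, electorates)
}

scenarios <- sapply(c(5, 4, 3.25), seats_at_threshold)
colnames(scenarios) <- c("5%", "4%", "3.25%")
print(scenarios)
```

With these inputs the 5% and 4% columns are identical, and at 3.25% the Conservatives pick up five seats, which is consistent with the figures quoted above.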
Using electoral results data and functions from the nzelect and other packages, here’s how that analysis was done:
For comparison, here are the same scenarios with the 2011 election results:
Because the full 2011 election results aren’t yet available in nzelect, the code below needs to scrape them from an HTML table on the Electoral Commission’s site, using the very user-friendly rvest R package:
New Zealand has a Westminster-like rather than USA-like relation of parliament to the executive, in that the government needs to command a majority in Parliament (indicated in budget or confidence votes) or resign. In practice, this generally leads to government by a coalition of parties, although this is not inevitable; in the 2014 election the National Party came within a whisker of being able to govern by itself (although political logic suggests they would have kept at least some junior coalition parties anyway).
nzelect is available for installation from CRAN the usual way (install.packages("nzelect")). Please report any bugs, enhancement requests or issues on GitHub. I already know about the wrong vignette title on CRAN :(… will fix it next release.