[This article was first published on free range statistics - R, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
The Australian Election Study is an impressive long term research project that has collected the attitudes and behaviours of a sample of individual voters after each Australian federal election since 1987. All the datasets and documentation are freely available. An individual survey of this sort is a vital complement to the necessarily aggregate results provided by the Australian Electoral Commission, and is essential for coming closer to understanding political and social change. Many countries have a comparable data source.
Many thanks to the original researchers:
McAllister, Ian; Makkai, Toni; Bean, Clive; Gibson, Rachel Kay, 2017, “Australian Election Study, 2016”, doi:10.4225/87/7OZCZA, ADA Dataverse, V2, UNF:6:TNnUHDn0ZNSlIM94TQphWw==; 1. Australian Election Study December 2016 Technical Report.pdf
In this blog post I just scratch the surface of the latest available data, from the time of the 2016 federal election. In later posts I’ll get into modelling it a bit deeper, and hopefully get a chance to look at some changes over time. Some of the code I use in this post for preparing the data might find its way into the ozfedelect R package, but not the data itself.
Getting the data
The data need to be downloaded by hand from the Australian Data Archive via the Dataverse (an international open source research data repository) in order to agree to the terms and conditions of use. The following code requires the SPSS version of the data and the Excel version of the data dictionary to have been saved in an ./aes/ subfolder of the working directory.
The SPSS data has column names truncated at eight characters long, which means that some of the variables in the data don’t match those in the data dictionary. The code below makes the necessary manual adjustments to variable names from the dictionary to deal with this, and re-classes the categorical responses as factors with the correct level labels in the correct order. I do this by importing and joining to the data dictionary rather than extracting the attributes information from the SPSS import, which might have also worked; I prefer my manual approach so I can deal with some anomalies explicitly (for example, many of the values listed as 999 “skipped” in the data dictionary are NA in the SPSS data).
Post continues after code extract
The last few lines of that code also defines some summarised variables that lump various parties or non-votes together and will be useful in later analysis.
Survey design and weights
The AES in 2016 used two separate sampling frames: one provided by the Australian Electoral Commission, and one from the Geo-coded National Address File or GNAF, a large, authoritative open data source of every address in Australia. For inference about Australian adults on the election roll (enrolment is required of all resident adult citizens by legislation), the AES team recommend using a combination of the two samples, and weights are provided in the wt_enrol column to do so.
The technical report describes data collection as mixed-mode (hard copy and online versions), with a solid system of managing contacts, reminders and non-returns.
The AEC sample is stratified by state, and the GNAF sample by Greater Capital City (GCC) statistical areas (which divide states into two eg “Greater Sydney” and “Rest of New South Wales”). As far as I can see no GCC variable is provided for end-use analysts such as myself. Nor are the various postcode variables populated, and I can’t see a variable for electorate/division (presumably for statistical disclosure control purposes) which is a shame as it would be an important variable to use for a variety of multi-level analyses. So we have to use “state” as our best regional variable.
Post-stratification weighting (raking) was used by the AES to match the population proportions by sex, age group, state, and first preference vote in the House of Representatives. This latter calibration is particularly important to ensure that analysis gives appropriate weight to the supporters of the various parties; it is highly likely that non-response is related to political views (or non-views).
The weights have been scaled to add up to the sample size, rather than to population numbers (as would be the approach of eg a national statistical office). This is presumably out of deference for SPSS’ behaviour of treating survey weights as frequencies, which leads to appalling results if they add up to anything other than the sample size (and still sub-optimal results even then).
Table 16 of the technical report is meant to provide the weighting benchmarks for the post-stratification weighting, but the state, age and sex totals given are only for the GNAF sample; and the benchmarks given for which party voted for appear to be simply incorrect in a way I cannot determine. They add up to much more than either the AEC or GNAF sample, but much less than their combined total; and I can’t match their results with any of the plausible combinations of filters I’ve tried. I’d be happy for anyone to point out if I’m doing something wrong, but my working hypothesis is that Table 16 simply contains some mistakes, whereas the weights in the actual data are correct.
The code below explores the three sets of weights provided, and defines a survey design object (using Thomas Lumley’s survey package) that treats state as the strata for sampling, and sex, state, age and House of Representatives vote as post-stratification variables. Although I won’t make much use of the survey package today, I’ll almost certainly want it for more sophisticated analysis in a later post.
Post continues after code extract
Analysis - vote and one question
Now we’re ready for some exploratory analysis. In Australia, the Senate is elected by proportional representation, a wider range of small parties stand candidates, and the first preference vote for that house is less subject to gaming by voters than is the single-transferrable-vote system in the House of Representatives. So I prefer to use Senate vote when examining the range of political opinion, particularly of supporters of the smaller parties. Here is one of the graphics that I love during exploration but which I despair of using for communication purposes - a mosaic plot, which beautifully visualises a contingency table of counts (or, in this case, weighted counts). This particular graphic shows the views of people who voted for different parties in the Senate on the trade off between reducing taxes and spening more on social services:
As any casual observer of Australian politics would have predicted, disproportionately large numbers (coloured blue to show the positive discrepency from the no-interaction null hypothesis) of Greens voters (and to a less extent Labor) are mildly or strongly in favour of spending more on social services. Correspondingly these parties’ supporters have relatively few people counted in the top left corner (coloured red to show a negative discrepancy from the null hypothesis) supporting reducing tax as the priority.
In contrast, there are relatively few Liberal voters in the social services camp, and disproportionately high numbers who favour reducing taxes.
The “other” voters - which includes those who did not vote, could not remember, or did not vote for one of the five most prominent parties - are disproportionately likely to sit on the fence or not have a view on the trade-off between social services and tax cuts, which again is consistent with expectations.
What about a different angle - what influences people when they decide their vote?
We see that Greens voters decided their vote disproportionately on the basis of “the policy issues”. Liberal voters have less people in the “policy” box than would be expected under the no-interaction null hypothesis and were more likely to decide based on either “the party leaders” or “the parties taken as a whole”. National voters stand out as the only party with a noticeable disproportionate count (high, in their case) in the “candidates in your electorate” box, emphasising the known spatially-specific nature of National’s support.
Here’s the code for drawing those mosaic plots, using the vcd R package that implements some useful ways of visualising categorical data.
Post continues after code extract
Analysis - vote and a battery of questions
Like many social sciences surveys, the AES contains a number of batteries of questions with similar response sets. An advantage of this, both for an analyst and the consumer of their results, is that we can set up a common approach to comparing the responses to those questions. Here is a selection of diverging likert plots depicting interesting results from these questions:
Above, we see that Greens voters disproportionately think that Australia hasn’t gone far enough in allowing more migrants into the country, and One Nation voters unsurprisingly tend to think the opposite. Views on change in Australia relating to Aboriginal assistance and land rights follow a similar pattern. Interestingly, there is less partisan difference on some other issues such as “equal opportunities for women”. Overall, the pattern is clear of increasing tendency to think that “things have gone to far” as we move across the spectrum (Greens-Labor-Liberal-National-One Nation).
When it comes to attitudes on the left-right economic issues, we see subtle but expected party differences (chart above). Liberal supporters are particularly inclined to think trade unions have too much power and should be regulated; but to disagree with the statement that income and wealth should be distributed towards ordinary working people.
In the social issues chart above, we see the Greens’ voters strongly disagreeing with turning back boats of asylum seekers, and reintroducing the death penalty - issues where One Nation voters tend to agree with the statement. Again, the clear sequencing of parties’ supporters (Greens-Labor-Liberal-National-One Nation) is most obvious on these social/cultural touchpoint issues, in contrast to the left-right economic debate.
In the above, we see high levels of confidence in the armed forces, police and universities across all parties’ supporters, and distrust in the press, television (what does it even mean to trust the television?) and political parties. Differences by party in trust in institutions varies along predictable party lines eg voters of the Liberal party tend to have less confidence in unions. One Nation voters have particularly low trust in the Australian political system, consistent with the mainstream interpretation in political science of the rise of socially conservative populist parties of this sort.
Attitudes to public spend are fairly consistent across parties, with subtle but unsurprising differences in a few value-based touch points such as defence (Greens favour spending less) and unemployment benefits (Greens favour spending more but the three conservative parties favour spending less; Labor voters are in the middle).
Finally, when it comes to factual knowledge of election law and constitutional issues, there is little to say from a party perspective. Many people don’t know the maximum period between federal elections for the lower house (the correct answer is three); and doubtless niether know nor care about the need to pay a deposit to stand for election. But there is no obvious difference by party voted for.
In summary, on many of the key social variables of the day, we see the expected greatest contrast between supporters of the Greens on one hand and One Nation on the other, with the ALP, Liberal and National voters strung out between. Economic issues indicate another but closely related dimension, with One Nation closer to the Greens than they are to the other socially conservative parties on statements such as “big business in this country has too much power”, “the government should take measures to reduce differences in income levels” and “trade unions in this country have too much power”.
Here’s the code for drawing those diverging-Likert plots. I’m not particularly proud of it, it’s a bit repetitive (I hadn’t realised I’d be doing six of these), but it get’s the job done. There are some specialist R packages for drawing these charts, but I prefer (at this point) build them with the standard grammar of graphics tools in ggplot2 rather learn a new specific API.
Post continues after code extract*
Final word
OK, watch this space for more analysis using the AES (2016 and other years), if and when I get time.
Note - I have no affiliation or contact with the Australian Election Study, and the AES researchers bear no responsibility for my analysis or interpretation of their data. Use of the AES data by me is solely at my risk and I indemnify the Australian Data Archive and its host institution, The Australian National University. The Australian Data Archive and its host institution, The Australian National University, are not responsible for the accuracy and completeness of the material supplied. Similarly, if you use my analysis, it is at your risk. Don’t blame me for anything that goes wrong, even if I made a mistake. But it would be nice if you let me know.