Analyzing voter survey data with R
I love polls. All kinds of polls, but especially political polls. I think I love them because I like politics and I also like to find out what’s going on in people’s heads, which is something that survey data allows one to do.
So I was thrilled to find Anthony Joseph Damico’s Analyze Survey Data for Free website. Specifically, I was interested in working with data from the American National Election Studies (ANES), which is a complex sample survey that collects responses on political belief and behavior from eligible voters in the U.S. Administered by Stanford University and the University of Michigan, and funded by the National Science Foundation, ANES is designed to generalize to all eligible voters in the U.S., so results give us a statistically sound view of what voters really think.
The data can be analyzed using either the survey or srvyr package, and can be downloaded via Damico’s lodown package. The srvyr package is nice, as it’s akin to dplyr in syntax, but it’s limited in what it can do, so I will use both packages.
The first thing to do is get the data and construct the complex sample survey design.
library(lodown)
library(survey)
library(srvyr)

# examine all available ANES microdata files
# (your_email is a placeholder: use the address you registered with ANES)
anes_cat <-
  get_catalog( "anes" ,
    output_dir = file.path( path.expand( "~" ) , "ANES" ) ,
    your_email = "email@address.com" )

# 2016 only
anes_cat <- subset( anes_cat , directory == "2016 Time Series Study" )

# download the microdata to your local computer
anes_cat <- lodown( "anes" , anes_cat , your_email = "email@address.com" )

# construct a complex sample survey design:
# v160202 identifies the clusters, v160201 the strata, v160102 the weights
anes_df <-
  readRDS(
    file.path( path.expand( "~" ) , "ANES" ,
      "2016 Time Series Study/anes_timeseries_2016_.rds" )
  )

anes_design <-
  svydesign(
    ~v160202 ,
    strata = ~v160201 ,
    data = anes_df ,
    weights = ~v160102 ,
    nest = TRUE
  )
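For readers who want to see what the design arguments do before downloading anything, here is a minimal sketch on made-up data. The data frame toy and its columns (stratum, psu, wt, score) are invented for illustration and have nothing to do with the ANES variables; only svydesign() and svymean() come from the survey package.

```r
library(survey)

set.seed(1)

# Hypothetical toy sample: 2 strata, 2 clusters (PSUs) per stratum, 3 people per PSU
toy <- data.frame(
  stratum = rep(1:2, each = 6),
  psu     = rep(rep(1:2, each = 3), times = 2),  # PSU ids restart at 1 in each stratum
  wt      = runif(12, min = 0.5, max = 2),       # made-up sampling weights
  score   = sample(0:100, 12, replace = TRUE)    # made-up thermometer scores
)

# nest = TRUE tells svydesign() that PSU ids are only unique within a stratum
toy_design <- svydesign(
  ids = ~psu,
  strata = ~stratum,
  weights = ~wt,
  data = toy,
  nest = TRUE
)

# Weighted mean with a design-based standard error
toy_mean <- svymean(~score, toy_design)
print(toy_mean)
```

The point estimate is simply the weighted mean of score; what the design object adds is a standard error that accounts for the strata and clusters.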
Now that I have the data, I’m going to recode some of the variables so that their names are more descriptive. I was able to do this using the study’s codebook, which you can find here.
anes_design <-
  update(
    anes_design ,

    one = 1 ,

    # thermometer scores: keep valid 0-100 answers, set everything else to NA
    supreme_court_score = ifelse( v162102 %in% 0:100 , v162102 , NA ) ,
    muslims_score = ifelse( v162106 %in% 0:100 , v162106 , NA ) ,
    police_score = ifelse( v162110 %in% 0:100 , v162110 , NA ) ,
    blm_score = ifelse( v162113 %in% 0:100 , v162113 , NA ) ,

    # codes outside the listed levels become NA
    party_id =
      factor( v161158x , levels = 1:7 , labels =
        c( 'strong democrat' , 'not very strong democrat' ,
          'independent democrat' , 'independent' ,
          'independent republican' , 'not very strong republican' ,
          'strong republican' )
      ) ,

    rich_buy_elections =
      factor( v162220 , levels = 1:5 , labels =
        c( 'rich buy elections - all of the time' ,
          'rich buy elections - most of the time' ,
          'rich buy elections - about half the time' ,
          'rich buy elections - some of the time' ,
          'rich buy elections - never' )
      ) ,

    bible_wordofgod =
      factor( v161243 , levels = 1:3 , labels =
        c( 'bible is word of god, to be taken literally' ,
          'bible is word of god but not everything to be taken literally' ,
          'bible is written by men and is not the word of god' )
      )
  )
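The two recoding idioms used above can be tried on their own in base R. This is only a sketch: the raw values below, including the out-of-range codes, are made up for illustration rather than taken from the ANES file.

```r
# Made-up raw thermometer answers; -9, 998 and -6 stand in for non-response codes
raw_score <- c(85, 50, -9, 998, 0, 100, -6)

# Keep valid 0-100 answers, turn everything else into NA
clean_score <- ifelse(raw_score %in% 0:100, raw_score, NA)
print(clean_score)

# Made-up raw party codes; -8 stands in for a non-response code
raw_party <- c(1, 7, 4, -8)

# factor() maps each listed level to its label; unlisted codes become NA
party <- factor(
  raw_party,
  levels = 1:7,
  labels = c("strong democrat", "not very strong democrat",
             "independent democrat", "independent",
             "independent republican", "not very strong republican",
             "strong republican")
)
print(party)
```

Because the invalid codes end up as NA, they drop out of any later calculation that uses na.rm = TRUE instead of dragging the averages toward nonsense values.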
Now I want to plot a couple of the distributions, just to get some sense of the responses. The output of the survey package is a survey design object, so ggplot2, which is my preference most of the time, won’t work for this. But the survey package includes some functions for basic charting.
svyhist(~v161243, design = anes_design,
        main = "Is the Bible the literal word of God",
        xlim = c(0, 3), ylim = c(0, 0.5), xlab = "",
        labels = c("bible to be taken literally",
                   "bible not taken literally",
                   "bible the work of men"))

svyhist(~v162220, design = anes_design,
        main = "Do the rich buy elections?",
        xlim = c(0, 5), ylim = c(0, 0.5), xlab = "",
        labels = c("all the time", "most of the time",
                   "about half the time", "some of the time", "never"))
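Since these two questions are categorical, another option is to estimate the weighted share in each category and draw those as bars. The sketch below is not what I ran on the ANES data; it uses apistrat, the example dataset bundled with the survey package, so that it runs without any download. On the ANES design the same idea would presumably be applied to one of the recoded factors, such as bible_wordofgod.

```r
library(survey)

# Example stratified sample of California schools that ships with the survey package
data(api, package = "survey")

dstrat <- svydesign(
  ids = ~1,
  strata = ~stype,
  weights = ~pw,
  data = apistrat,
  fpc = ~fpc
)

# For a factor, svymean() returns the weighted proportion in each level
props <- svymean(~stype, dstrat)
print(props)

# The survey package provides a barplot() method for these results
barplot(props, main = "School type (weighted proportions)", ylab = "Proportion")
```

The proportions sum to one, so the bar heights can be read directly as estimated population shares.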
The score variables that I created above offer sentiment indicators based on a 0-100 thermometer that respondents use, where 100 is highly positive, 0 is highly negative, and 50 is neutral. I’m specifically interested in seeing how respondents from different categories assess, for example, the Supreme Court. I will use the srvyr package, but in order to analyze the data with the srvyr package, we need to get it into the proper format.
anes_srvyr_design <- as_survey( anes_design )

# Calculate the mean (average) of a linear variable, overall and by groups:
anes_srvyr_design %>%
  summarize( mean = survey_mean( supreme_court_score , na.rm = TRUE ) )

# A tibble: 1 x 2
   mean mean_se
  <dbl>   <dbl>
1  58.3   0.389
anes_srvyr_design %>%
  group_by( party_id ) %>%
  summarize( mean = survey_mean( supreme_court_score , na.rm = TRUE ) )

# A tibble: 7 x 3
  party_id                    mean mean_se
  <fct>                      <dbl>   <dbl>
1 strong democrat             60.6   0.786
2 not very strong democrat    60.0   1.09
3 independent democrat        58.9   1.39
4 independent                 53.7   1.24
5 independent republican      58.1   1.03
6 not very strong republican  57.9   1.07
7 strong republican           57.8   1.19
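The mean_se column can be turned into rough 95% confidence intervals to judge how far apart the groups really are. The sketch below retypes the numbers printed above into a plain data frame and assumes a normal approximation (mean plus or minus about 1.96 standard errors), which may differ slightly from intervals that srvyr or survey would compute from the design itself.

```r
# Values copied from the tibble printed above
party_means <- data.frame(
  party_id = c("strong democrat", "not very strong democrat",
               "independent democrat", "independent",
               "independent republican", "not very strong republican",
               "strong republican"),
  mean    = c(60.6, 60.0, 58.9, 53.7, 58.1, 57.9, 57.8),
  mean_se = c(0.786, 1.09, 1.39, 1.24, 1.03, 1.07, 1.19)
)

# Normal-approximation 95% interval: mean +/- z * standard error
z <- qnorm(0.975)
party_means$lower <- party_means$mean - z * party_means$mean_se
party_means$upper <- party_means$mean + z * party_means$mean_se

print(party_means, digits = 3)
```

Groups whose intervals overlap heavily are hard to tell apart with this sample; a group whose interval sits clear of the others is more likely to differ.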
So with respect to the Supreme Court, people across the party spectrum generally view it the same: every group mean falls between 53.7 and 60.6, with pure independents the lowest. But what about more hot-button issues?
anes_srvyr_design %>%
  summarize( mean = survey_mean( muslims_score , na.rm = TRUE ) )

# A tibble: 1 x 2
   mean mean_se
  <dbl>   <dbl>
1  54.4   0.656
anes_srvyr_design %>%
  group_by( party_id ) %>%
  summarize( mean = survey_mean( muslims_score , na.rm = TRUE ) )

# A tibble: 7 x 3
  party_id                    mean mean_se
  <fct>                      <dbl>   <dbl>
1 strong democrat             67.1    1.04
2 not very strong democrat    58.8    1.52
3 independent democrat        63.5    1.44
4 independent                 50.9    1.42
5 independent republican      49.4    1.51
6 not very strong republican  45.0    1.59
7 strong republican           40.9    1.34
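A quick back-of-the-envelope check on the gap between the two ends of the table: the sketch below compares strong Democrats and strong Republicans using only the printed means and standard errors. It treats the two estimates as independent, which ignores any covariance introduced by the survey design, so read it as a rough gauge and not as a formal design-based test.

```r
# Values copied from the tibble printed above
strong_dem <- c(mean = 67.1, se = 1.04)
strong_rep <- c(mean = 40.9, se = 1.34)

# Difference in means and its approximate standard error
# (assumes the two group estimates are independent)
gap    <- strong_dem[["mean"]] - strong_rep[["mean"]]
se_gap <- sqrt(strong_dem[["se"]]^2 + strong_rep[["se"]]^2)

# How many standard errors wide is the gap?
z_stat <- gap / se_gap

cat("gap:", gap, " se:", round(se_gap, 2), " z:", round(z_stat, 1), "\n")
```

A gap that is many standard errors wide is very unlikely to be sampling noise, even allowing for the crudeness of the approximation.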
So there is a substantial difference in the survey data in how people across the party spectrum view Muslims, with people on the left viewing them much more favorably than those on the right: strong Democrats average 67.1 on the thermometer, strong Republicans 40.9. Now I want to visualize some of the scores, and since we’re using the srvyr package now, we can use ggplot2 for these.
library(ggplot2)

# mean police thermometer score by party identification
police <- anes_srvyr_design %>%
  group_by(party_id) %>%
  summarize(mean = survey_mean(police_score, na.rm = TRUE))

ggplot(police, aes(party_id, mean, fill = party_id)) +
  geom_bar(stat = "identity") +
  ylim(0, 100) +
  xlab("") +
  scale_fill_brewer(palette = "Set3") +
  theme(legend.position = "none",
        axis.text.x = element_text(angle = -30, hjust = 0, vjust = 1))
Given that 50 indicates a neutral sentiment, it looks like those on the left and right all have a generally favorable view of the police. Let’s see if that same phenomenon is true for Black Lives Matter.
# mean Black Lives Matter thermometer score by party identification
blm <- anes_srvyr_design %>%
  group_by(party_id) %>%
  summarize(mean = survey_mean(blm_score, na.rm = TRUE))

ggplot(blm, aes(party_id, mean, fill = party_id)) +
  geom_bar(stat = "identity") +
  ylim(0, 100) +
  xlab("") +
  scale_fill_brewer(palette = "Set3") +
  theme(legend.position = "none",
        axis.text.x = element_text(angle = -30, hjust = 0, vjust = 1))
There are pretty substantial differences in this survey data in how people from varying ideologies view the Black Lives Matter movement. Those on the left have a fairly positive view of the movement, while those on the right have a decidedly negative view.
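The bar charts above show point estimates only. One way to also show the uncertainty is to add error bars built from the mean_se column. Since the police and Black Lives Matter numbers are not printed in this post, the sketch below retypes the Muslims thermometer table from earlier into a data frame; the 1.96 multiplier is a normal-approximation assumption, and the dashed line at 50 marks the neutral point of the thermometer.

```r
library(ggplot2)

party_levels <- c("strong democrat", "not very strong democrat",
                  "independent democrat", "independent",
                  "independent republican", "not very strong republican",
                  "strong republican")

# Values copied from the Muslims thermometer table printed earlier
muslims <- data.frame(
  party_id = factor(party_levels, levels = party_levels),  # keep left-to-right order
  mean     = c(67.1, 58.8, 63.5, 50.9, 49.4, 45.0, 40.9),
  mean_se  = c(1.04, 1.52, 1.44, 1.42, 1.51, 1.59, 1.34)
)

p <- ggplot(muslims, aes(party_id, mean, fill = party_id)) +
  geom_col() +
  # approximate 95% interval around each group mean
  geom_errorbar(aes(ymin = mean - 1.96 * mean_se,
                    ymax = mean + 1.96 * mean_se),
                width = 0.2) +
  # 50 is the neutral point on the thermometer
  geom_hline(yintercept = 50, linetype = "dashed") +
  ylim(0, 100) +
  xlab("") +
  scale_fill_brewer(palette = "Set3") +
  theme(legend.position = "none",
        axis.text.x = element_text(angle = -30, hjust = 0, vjust = 1))

print(p)
```

With a tibble that comes straight out of survey_mean(), the mean_se column is already there, so the data frame step would not be needed.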
The srvyr package actually does its work on the back of the survey package. It doesn’t have all the functionality of the survey package, but I prefer it when I want to visualize basic descriptive statistics with either ggplot2 or another visualization package.
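To see the relationship between the two packages concretely, the sketch below computes the same mean both ways and compares the results. It uses apistrat, the example dataset bundled with the survey package, instead of the ANES file, so it runs without any download; the variable api00 belongs to that example data.

```r
library(survey)
library(srvyr)

# Example stratified sample that ships with the survey package
data(api, package = "survey")

dstrat <- svydesign(
  ids = ~1,
  strata = ~stype,
  weights = ~pw,
  data = apistrat,
  fpc = ~fpc
)

# survey package: returns a svystat object
survey_est <- svymean(~api00, dstrat)

# srvyr package: wraps the same design and returns a tibble
srvyr_est <- as_survey(dstrat) %>%
  summarize(mean = survey_mean(api00))

print(survey_est)
print(srvyr_est)
```

The estimate and its standard error match, but the srvyr result arrives as a tibble, which is what makes it easy to pass along to ggplot2.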
The post Analyzing voter survey data with R appeared first on my (mis)adventures in R programming.