analyze the european social survey (ess) with r

Anthony Damico

8 years ago

[This article was first published on asdfree by anthony damico, and kindly contributed to R-bloggers.]

with more than a decade of microdata aimed at gauging the political mood across european nations, the european social survey (ess) allows scientists like you to examine socio-demographic shifts among broad groups all the way down to pirate party (piratpartiet) voters in sweden. with much of the same scope as the united states’ general social survey (gss), this biennial survey gives demographers the clearest window into political opinion and behavior across the continent.

run out of the city university london and six other centres, this survey sets its sample universe at all persons aged 15 and over resident within private households, regardless of nationality, citizenship, language or legal status in the participating countries. however, it's smart – dare i say very smart – to check the documentation report (here's round five) and confirm that the statistics you're coming up with actually generalize to the resident populations you think they do.

after enduring a few spammy e-mails from me, daniel oberski agreed to co-author this post and all of the code. dan spent a handful of years in catalonia at upf's ess competence centre, so in addition to being able to disentangle and simplify this survey's tricky methodology for us, he's also provided a wicked starter script on structural equation modeling (sem) with complex sample survey data, using his very own lavaan.survey package. so tell him thanks. this new github repository contains four scripts:

download all microdata.R

after you register for an account, plop `your.email` at the top of this script and let 'er rip
automatically log in and determine which countries and rounds are currently available
for each round available, cycle through each file available, download, unzip, and import it.
save everything on the local disk as a convenient data.frame object

analysis examples.R

load a country-specific data set, merge on the survey design data file, remove unnecessary columns
construct a survey design object producing taylor series linearized standard errors
use that survey design object to run examples of nearly every summary statistical analysis you'll need
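the three steps above can be sketched on simulated data. this is only a rough illustration, not the repository's actual script: the data are fake, and the column names (`idno`, `stratify`, `psu`, `prob`, `agea`, `trust`) are assumptions that should be checked against the country-specific file and sddf you actually download.

```r
# minimal sketch on simulated data; every column name here is an assumption
library(survey)

set.seed(42)
n <- 400

# stand-in for a country-specific main file: one row per respondent
main <- data.frame(
  idno  = seq_len(n),
  agea  = round(runif(n, 15, 90)),
  trust = sample(0:10, n, replace = TRUE),
  junk  = rnorm(n)                      # a column we will not need
)

# stand-in for the sample design data file (sddf):
# 4 strata, 10 clusters (psus) per stratum, and a selection probability
sddf <- data.frame(
  idno     = seq_len(n),
  stratify = rep(1:4, each = 100),
  psu      = rep(1:40, each = 10),
  prob     = runif(n, 0.001, 0.01)
)

# merge the design information onto the main file by respondent id
x <- merge(main, sddf, by = "idno")

# remove unnecessary columns
x <- x[, c("idno", "stratify", "psu", "prob", "agea", "trust")]

# taylor series linearization is the default variance method of svydesign
ess_design <- svydesign(
  ids    = ~psu,
  strata = ~stratify,
  probs  = ~prob,
  data   = x
)

# design-based mean and standard error, overall and by stratum
svymean(~trust, ess_design)
svyby(~trust, ~stratify, ess_design, svymean)
```

once the design object exists, the other `svy*` functions in the survey package take it as their design argument in the same way.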

structural equation modeling examples.R

load, merge, construct a german survey design object producing taylor series linearized standard errors and also a latent variable model (without and then with survey-adjustment) in order to imitate the confirmatory factor analysis model in this multidimensionality of welfare attitudes paper
load, merge, stack, and then construct a german plus spanish tsl design and also a latent variable model (without and then with survey-adjustment) in order to imitate the metric cross-country invariance test in this schwartz human values measurement paper
use the same german plus spanish stacked design to construct an unadjusted and then a survey-adjusted test for the cross-country equality of a relationship between two latent variables in a structural equation model from this support for immigration paper

replication.R

load the all-country round five set to match some rudimentary nesstar output
load a country-specific data set, merge on the survey design data file, construct a tsl design in secret
by hand, start re-constructing some country-specific statistics in the official ess survey design analysis document
in one fell swoop, create the design effect once again, but this time using the survey package
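to give a flavor of the last two steps, here is a hedged sketch that computes a design effect for a mean twice on simulated clustered data: once by hand, as the ratio of the design-based variance to the simple-random-sample variance, and once by asking the survey package for it. the data, the weight of 50, and all object names are made up for illustration, and this is not the calculation from the official ess document.

```r
# minimal sketch on simulated data; nothing here comes from a real ess file
library(survey)

set.seed(1)
n_psu   <- 50                        # number of clusters (psus)
per_psu <- 20                        # respondents per cluster
n       <- n_psu * per_psu
psu     <- rep(seq_len(n_psu), each = per_psu)

# a shared cluster effect makes respondents within a psu resemble each other
y <- 5 + rnorm(n_psu)[psu] + rnorm(n)

# hypothetical equal weights: each respondent stands for 50 people
x <- data.frame(psu = psu, y = y, w = 50)

clustered_design <- svydesign(ids = ~psu, weights = ~w, data = x)

# deff = "replace" compares against simple random sampling with replacement
est <- svymean(~y, clustered_design, deff = "replace")

# by hand: squared ratio of the design-based se to the naive srs se
se_srs       <- sd(x$y) / sqrt(n)
deff_by_hand <- as.numeric((SE(est) / se_srs)^2)

# in one fell swoop: the value the survey package attaches to the estimate
deff_survey <- as.numeric(deff(est))

c(by_hand = deff_by_hand, survey = deff_survey)
```

one detail worth knowing: with `deff = TRUE` the comparison is against sampling without replacement, using the sum of the weights as the population size, so weights that are all equal to one would make the srs variance zero.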

click here to view these four scripts

for more detail about the european social survey (ess), visit:

notes:

some analysts blindly start with the integrated, multi-country data set for each round. that file contains all countries stacked into a single data table and the appropriate within-country weights, so you’ll get the correct point estimates (means, medians, percents). unfortunately, the integrated file does not contain other sample design information such as clusters and strata, which influence standard errors and statistical tests. so it’s generally necessary to use the country-specific files and associated sample design data file (sddf) if you’re itching to calculate a confidence interval, standard error, or any kind of honest statistical test. a classical approximation to correct standard errors is to multiply the standard error you get without accounting for the survey design by the square root of the “design effect”; the norwegian social science data services have created this tutorial on how to calculate design effects for linear functions of the data such as means and totals, but if that’s over your head or you want to estimate something other than means or totals, just use our scripts instead.
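the classical approximation mentioned above is just one multiplication. a tiny sketch in base r, where the estimate, the naive standard error, and the design effect are all hypothetical numbers picked for illustration:

```r
# hypothetical inputs, not taken from any ess file or document
estimate <- 5.3     # point estimate, e.g. a weighted mean
naive_se <- 0.12    # standard error computed as if the data were a simple random sample
deff     <- 1.8     # design effect for that estimate

# corrected standard error = naive standard error * sqrt(design effect)
corrected_se <- naive_se * sqrt(deff)

# approximate 95% confidence interval based on the corrected standard error
ci <- estimate + c(-1, 1) * qnorm(0.975) * corrected_se

corrected_se
ci
```

because the design effect here is above one, the corrected interval is wider than the naive one. the point estimate itself does not move.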

confidential to sas, spss, stata, and sudaan users: unless you’re a paleontologist, forget those fossils and transition to r. 😀
