Using Survey Weight

[This article was first published on R | Fahim Ahmad, and kindly contributed to R-bloggers].

In R, working with survey weights is made possible by the survey package. Let’s use the data frame below as an example:

set.seed(1000)
age <- c(18:100)
age <- sample(age, 100, replace = TRUE)
gender <- c("Male", "Female")
gender <- sample(gender, 100, replace = TRUE)
country <- c("A", "B")
country <- sample(country, 100, replace = TRUE)
df <- data.frame(age, gender, country)
# weight each gender so that its weights sum to 50 (half of the 100 rows)
df$weight[df$gender == "Female"] <- 50 / sum(df$gender == "Female")
df$weight[df$gender == "Male"] <- 50 / sum(df$gender == "Male")
# summary of data
summary(df)
##       age           gender            country              weight
##  Min.   :18.00   Length:100         Length:100         Min.   :0.8929
##  1st Qu.:38.75   Class :character   Class :character   1st Qu.:0.8929
##  Median :54.50   Mode  :character   Mode  :character   Median :0.8929
##  Mean   :55.79                                         Mean   :1.0000
##  3rd Qu.:73.25                                         3rd Qu.:1.1364
##  Max.   :97.00                                         Max.   :1.1364

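The most important variable here is the weight variable, which is constructed to balance the sex ratio: the weights of each gender sum to 50, so after weighting, females and males each account for half of the sample.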
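To see why this construction balances the two groups, it can help to check the weight totals on a smaller case. The following is a minimal sketch in base R; the `toy` data frame and the target of 5 per group are made up for illustration and are not the data used in this post.

```r
# Toy data, made up for illustration: 3 females and 7 males
toy <- data.frame(gender = c(rep("Female", 3), rep("Male", 7)))

# Same construction as in the post, with a target of 5 per group instead of 50
toy$weight <- NA_real_
toy$weight[toy$gender == "Female"] <- 5 / sum(toy$gender == "Female")
toy$weight[toy$gender == "Male"] <- 5 / sum(toy$gender == "Male")

# Sum of the weights within each gender
weight_totals <- tapply(toy$weight, toy$gender, sum)
print(weight_totals)

# Weighted share of each gender
weighted_share <- weight_totals / sum(weight_totals)
print(weighted_share)
```

Because each group's weight is the target divided by the group size, the totals come out equal regardless of how unbalanced the raw counts are.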

The survey package provides the svydesign() function, which links a data frame to its weight variable.

# install.packages("survey")
library(survey)
df.w <- svydesign(ids = ~1, data = df, weights = ~weight)
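As a quick check that the weights were picked up, the design can be asked for them again. This is a small sketch on a made-up data frame (`toy` and `toy.w` are hypothetical names, not objects from this post). It assumes the survey package is installed and uses its `weights()` method for design objects.

```r
library(survey)

# Toy data, made up for illustration
toy <- data.frame(
  age    = c(20, 30, 40, 50),
  weight = c(2, 2, 1, 1)
)

# ids = ~1 declares that there are no clusters;
# weights = ~weight names the column that holds the weights
toy.w <- svydesign(ids = ~1, data = toy, weights = ~weight)

# weights() returns the sampling weights stored in the design
design_weights <- weights(toy.w)
print(design_weights)
print(sum(design_weights))
```

Note that the arguments are given as one-sided formulas (`~weight`), so the column is looked up inside `data` rather than in the workspace.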

The resulting object is no longer a data frame but a list of components, whose names can be seen using the attributes() function.

attributes(df.w)
## $names
## [1] "cluster" "strata" "has.strata" "prob" "allprob"
## [6] "call" "variables" "fpc" "pps"
##
## $class
## [1] "survey.design2" "survey.design"
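The original data are not lost inside the design object. As a rough sketch, again on a made-up data frame (`toy` and `toy.w` are hypothetical names), the `variables` component listed above can be pulled out like any other list element. This assumes the survey package is installed.

```r
library(survey)

# Toy data, made up for illustration
toy <- data.frame(age = c(20, 30, 40, 50), weight = c(2, 2, 1, 1))
toy.w <- svydesign(ids = ~1, data = toy, weights = ~weight)

# The design object carries survey classes and is not a data frame
is_df <- is.data.frame(toy.w)
print(is_df)
print(class(toy.w))

# The data passed to svydesign() are kept in the "variables" component
stored <- toy.w$variables
print(stored)
```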

Because the design object is not a data frame, we need to use the survey package’s own analytical functions on it. For example, here is a comparison of the unweighted and weighted sex ratios.

# the %>% pipe comes from the magrittr package
library(magrittr)
# unweighted
df %>%
{table(.$gender)} %>%
prop.table()
##
## Female Male
## 0.44 0.56
# weighted
df.w %>%
svytable(~gender, .) %>%
prop.table()
## gender
## Female Male
## 0.5 0.5
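A weighted frequency table is, in essence, the sum of the weights within each category rather than a count of rows. The sketch below reproduces that idea in base R with `xtabs()` on a made-up data frame (`toy` is hypothetical). It is meant to illustrate the arithmetic, not to replace svytable().

```r
# Toy data, made up for illustration: 2 females and 4 males,
# weighted so that each gender's weights sum to 3
toy <- data.frame(
  gender = c("Female", "Female", "Male", "Male", "Male", "Male"),
  weight = c(1.5, 1.5, 0.75, 0.75, 0.75, 0.75)
)

# Unweighted proportions: each row counts once
unweighted <- prop.table(table(toy$gender))
print(unweighted)

# Weighted proportions: each row counts as much as its weight
weighted <- prop.table(xtabs(weight ~ gender, data = toy))
print(weighted)
```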

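svytable() can also be used to create two-way or higher-order frequency/percentage tables. For example, let’s create a contingency table of gender and country, with proportions computed within each country (the 2 in prop.table(2) selects column proportions):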

df.w %>%
svytable(~gender+country, .) %>%
prop.table(2)
## country
## gender A B
## Female 0.5600000 0.4329897
## Male 0.4400000 0.5670103
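The margin argument of prop.table() decides which totals the proportions are taken against. Here is a short base R sketch on a made-up table of counts (the `counts` matrix is hypothetical) comparing the three options.

```r
# Toy counts, made up for illustration (matrix() fills column by column)
counts <- matrix(
  c(10, 30, 20, 40),
  nrow = 2,
  dimnames = list(gender = c("Female", "Male"), country = c("A", "B"))
)

# margin = 2: proportions within each column (country)
by_country <- prop.table(counts, 2)
print(by_country)

# margin = 1: proportions within each row (gender)
by_gender <- prop.table(counts, 1)
print(by_gender)

# no margin: proportions of the grand total
overall <- prop.table(counts)
print(overall)
```

With margin 2 every column sums to 1, which matches the gender by country table above.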

Below are other useful functions of the survey package:

# to compute weighted mean
svymean(~age, df.w)
# to compute weighted quantiles
svyquantile(~age, df.w, c(.25, .50, .75))
# to compute weighted variance
svyvar(~age, df.w)
# to perform a t-test comparing mean age between genders
svyttest(age~gender, df.w)
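The point estimate behind a weighted mean is simple arithmetic: the sum of weight times value, divided by the sum of the weights. The sketch below works this out in base R on a made-up data frame (`toy` is hypothetical) and compares it with `weighted.mean()`. It covers only the point estimate, not the standard error that svymean() also reports.

```r
# Toy data, made up for illustration
toy <- data.frame(age = c(20, 30, 40, 50), weight = c(2, 2, 1, 1))

# Weighted mean by hand: sum(w * x) / sum(w)
by_hand <- sum(toy$weight * toy$age) / sum(toy$weight)
print(by_hand)

# The same quantity with base R's weighted.mean()
built_in <- weighted.mean(toy$age, toy$weight)
print(built_in)

# Unweighted mean, for comparison
unweighted <- mean(toy$age)
print(unweighted)
```

Here the younger respondents carry larger weights, so the weighted mean falls below the unweighted one.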