Using Survey Weight
In R, the survey package provides tools for working with survey weights. Let's use the data frame below as an example:
    set.seed(1000)

    age <- c(18:100)
    age <- sample(age, 100, replace = TRUE)

    gender <- c("Male", "Female")
    gender <- sample(gender, 100, replace = TRUE)

    country <- c("A", "B")
    country <- sample(country, 100, replace = TRUE)

    df <- data.frame(age, gender, country)

    df$weight[df$gender=="Female"] <- 50/sum(df$gender=="Female")
    df$weight[df$gender=="Male"] <- 50/sum(df$gender=="Male")

    # summary of data
    summary(df)
    ##       age           gender            country              weight
    ##  Min.   :18.00   Length:100         Length:100         Min.   :0.8929
    ##  1st Qu.:38.75   Class :character   Class :character   1st Qu.:0.8929
    ##  Median :54.50   Mode  :character   Mode  :character   Median :0.8929
    ##  Mean   :55.79                                         Mean   :1.0000
    ##  3rd Qu.:73.25                                         3rd Qu.:1.1364
    ##  Max.   :97.00                                         Max.   :1.1364
The most important variable here is weight, which is constructed to balance the sex ratio: the weights within each gender sum to 50, so females and males each account for half of the weighted sample.
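As a quick check on this kind of construction, the sketch below builds a small hypothetical data frame (toy, which is not the data used in this post) and confirms that the weights within each gender sum to the same target. The names toy, target, and group_size are illustrative assumptions.

```r
# Toy data (hypothetical): 3 females and 7 males
toy <- data.frame(gender = c(rep("Female", 3), rep("Male", 7)))

# Each gender should contribute the same weighted total
target <- nrow(toy) / 2

# Weight = target total / group size, looked up for every row
group_size <- table(toy$gender)
toy$weight <- target / as.numeric(group_size[toy$gender])

# Weighted totals per gender; both should equal `target`
totals <- tapply(toy$weight, toy$gender, sum)
print(totals)
```

Because the two group totals are equal, the weighted sex ratio is 50/50 regardless of how unbalanced the raw sample is.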
The survey package provides the svydesign() function, which links a data frame to its weights.
    # install.packages("survey")
    library(survey)

    df.w <- svydesign(ids = ~1, data = df, weights = ~weight)
The resulting object is no longer a data frame but a list of components, whose names can be inspected with the attributes() function.
    attributes(df.w)
    ## $names
    ## [1] "cluster"    "strata"     "has.strata" "prob"       "allprob"
    ## [6] "call"       "variables"  "fpc"        "pps"
    ##
    ## $class
    ## [1] "survey.design2" "survey.design"
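To make the structure of a design object more concrete, here is a minimal sketch on a hypothetical four-row data frame (toy and toy.w are illustrative names, not objects from this post). It assumes the survey package is installed and shows that the original data sit in the variables component, while weights() returns the weights attached to the design.

```r
library(survey)

# Hypothetical toy data with an explicit weight column
toy <- data.frame(age = c(20, 30, 40, 50), weight = c(2, 2, 1, 1))
toy.w <- svydesign(ids = ~1, data = toy, weights = ~weight)

# The original data frame is stored in the `variables` component
toy.w$variables

# weights() returns the sampling weights attached to the design
weights(toy.w)
```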
Because the design object is not a data frame, we need to use survey's own analytical functions. For example, here is a comparison of the unweighted and weighted sex ratios.
    library(magrittr)  # provides the %>% pipe used below

    # unweighted
    df %>% {table(.$gender)} %>% prop.table()
    ##
    ## Female   Male
    ##   0.44   0.56

    # weighted
    df.w %>% svytable(~gender, .) %>% prop.table()
    ## gender
    ## Female   Male
    ##    0.5    0.5
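One way to see what the weighted table is doing: for a one-way table, the weighted counts are the sums of the weights in each group. The sketch below checks this on hypothetical toy data (toy, toy.w, w_tab, and x_tab are illustrative names) by comparing svytable() with base R's xtabs(), assuming the survey package is installed.

```r
library(survey)

# Hypothetical toy data: 1 female, 3 males, weights balancing the two groups
toy <- data.frame(
  gender = c("Female", "Male", "Male", "Male"),
  weight = c(2, 2/3, 2/3, 2/3)
)
toy.w <- svydesign(ids = ~1, data = toy, weights = ~weight)

# Weighted counts from the design object
w_tab <- svytable(~gender, toy.w)

# The same totals obtained by summing the weights per group
x_tab <- xtabs(weight ~ gender, data = toy)

# Both give equal weighted shares for the two genders
prop.table(w_tab)
prop.table(x_tab)
```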
svytable() can also create two-way or higher-dimensional frequency and percentage tables. For example, let's create a contingency table of gender and country, with proportions computed within each country:
    df.w %>% svytable(~gender+country, .) %>% prop.table(2)
    ##         country
    ## gender           A         B
    ##   Female 0.5600000 0.4329897
    ##   Male   0.4400000 0.5670103
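The second argument of prop.table() controls which margin the proportions are computed over. As a small illustration in base R, using a hypothetical 2x2 table of counts (tab is an illustrative name, and the numbers are made up for the example):

```r
# Hypothetical counts, for illustration only
tab <- matrix(
  c(10, 30, 20, 40), nrow = 2,
  dimnames = list(gender = c("Female", "Male"), country = c("A", "B"))
)

prop.table(tab)      # cell proportions: the whole table sums to 1
prop.table(tab, 1)   # row proportions: each gender sums to 1
prop.table(tab, 2)   # column proportions: each country sums to 1
```

With margin 2, as in the weighted table above, each column (country) sums to 1.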
Below are some other useful functions from the survey package:
    # to compute weighted mean
    svymean(~age, df.w)

    # to compute weighted quantiles
    svyquantile(~age, df.w, c(.25, .50, .75))

    # to compute weighted variance
    svyvar(~age, df.w)

    # to perform t-test
    svyttest(age~gender, df.w)
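As a sanity check on these functions, the point estimate from svymean() should agree with base R's weighted.mean(); what the survey package adds is a design-based standard error. The sketch below uses hypothetical toy data (toy, toy.w, and est are illustrative names), assumes the survey package is installed, and also shows svyby() for computing a weighted statistic within groups.

```r
library(survey)

# Hypothetical toy data, for illustration only
toy <- data.frame(
  age    = c(20, 30, 40, 50),
  gender = c("Female", "Female", "Male", "Male"),
  weight = c(3, 1, 1, 1)
)
toy.w <- svydesign(ids = ~1, data = toy, weights = ~weight)

# Point estimate from survey and the base R equivalent
est <- svymean(~age, toy.w)
coef(est)
weighted.mean(toy$age, toy$weight)

# Standard error of the weighted mean
SE(est)

# Weighted mean of age within each gender
svyby(~age, ~gender, toy.w, svymean)
```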