New version of WEC: focus on interactions

[This article was first published on Rense Nieuwenhuis » R-Project, and kindly contributed to R-bloggers.]

We have uploaded a new version of WEC, an R package to apply ‘weighted effect coding’ to your dummy variables. With weighted effect coding, your dummy variables represent the deviation of their respective category from the sample mean, rather than the deviation from a reference category. Particularly with observational data, which are often unbalanced, this can have attractive interpretations. We recently published two articles in which we discuss some of the advantages:

Grotenhuis, M., Pelzer, B., Eisinga, R., Nieuwenhuis, R., Schmidt-Catran, A., & Konig, R. (2016b). When size matters: advantages of weighted effect coding in observational studies. International Journal of Public Health, 1–5. http://doi.org/10.1007/s00038-016-0901-1

Grotenhuis, M., Pelzer, B., Eisinga, R., Nieuwenhuis, R., Schmidt-Catran, A., & Konig, R. (2016a). A novel method for modelling interaction between categorical variables. International Journal of Public Health, 1–5. http://doi.org/10.1007/s00038-016-0902-0
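
To make the idea of "deviation from the sample mean" concrete, here is a minimal sketch in base R that builds a weighted effect coding matrix by hand, without the wec package. The data are simulated and the names (g, y, W, ref) are purely illustrative assumptions, not part of the package. Each non-omitted category gets a 1 in its own column, and the omitted category gets minus the ratio of the category size to the omitted category's size.

```r
# Simulated, unbalanced data (illustrative only; not the PUMS data)
set.seed(1)
g <- factor(rep(c("a", "b", "c"), times = c(50, 30, 20)))
group_means <- c(a = 10, b = 12, c = 15)
y <- rnorm(length(g), mean = group_means[as.character(g)])

# Category sizes as a named vector
n <- setNames(as.vector(table(g)), levels(g))
ref <- "a"                               # omitted category
others <- setdiff(levels(g), ref)

# Start from indicator (dummy) columns for the non-omitted categories,
# then replace the omitted category's row with -n_j / n_ref
W <- diag(nlevels(g))
dimnames(W) <- list(levels(g), levels(g))
W <- W[, others, drop = FALSE]
W[ref, ] <- -n[others] / n[[ref]]

contrasts(g) <- W
fit <- lm(y ~ g)

# Compare the fitted coefficients with the sample mean and the
# deviations of the category means from the sample mean
rbind(model   = coef(fit),
      by_hand = c(mean(y), tapply(y, g, mean)[others] - mean(y)))
```

In this one-factor model, the two rows should agree: the intercept is the sample mean of y, and each coefficient is that category's mean minus the sample mean, even though the categories differ in size.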

As some of the real advantages of weighted effect coding come into play when using interactions, that is what we focused on in the current update to our ‘wec’ package (version 0.4). The package now supports interactions between a weighted effect coded factor variable and an interval variable, and the calculation of interactions between two weighted effect coded factor variables has been much improved. An example is given below (with more to follow, hopefully soon).

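library(wec)
data(PUMS)

# Copy race into a factor and apply weighted effect coding, omitting "White"
PUMS$race.wec <- factor(PUMS$race)
contrasts(PUMS$race.wec) <- contr.wec(PUMS$race.wec, "White")

# Interaction between the weighted effect coded factor and the interval variable
PUMS$race.educint <- wec.interact(PUMS$race.wec, PUMS$education.int)

m.wec.educ <- lm(wage ~ race.wec + education.int + race.educint, data = PUMS)
summary(m.wec.educ)$coefficients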

The code above fits a regression model (shown below) in which the main effect of education (9048) is the same whether or not the interaction terms are included (you can try this yourself). The interaction terms therefore represent how much the education effect in each race category deviates from the average education effect.

                            Estimate Std. Error t value Pr(>|t|)
(Intercept)                     52320        559    93.5  0.0e+00
race.wecHispanic                -4955       1736    -2.9  4.3e-03
race.wecBlack                  -11276       1817    -6.2  5.7e-10
race.wecAsian                    5151       2381     2.2  3.1e-02
education.int                    9048        287    31.6 2.3e-208
race.educintinteractHispanic    -3266        977    -3.3  8.3e-04
race.educintinteractBlack       -3293        990    -3.3  8.8e-04
race.educintinteractAsian        3575       1217     2.9  3.3e-03
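
If you want to try the comparison yourself, the following sketch refits the model with and without the interaction terms and puts the two education coefficients side by side. It reuses only the calls from the example above; the object names (m.main, m.int, educ, slopes) are illustrative choices, and it assumes the wec package (version 0.4 or later) is installed.

```r
library(wec)
data(PUMS)

# Same setup as in the example above
PUMS$race.wec <- factor(PUMS$race)
contrasts(PUMS$race.wec) <- contr.wec(PUMS$race.wec, "White")
PUMS$race.educint <- wec.interact(PUMS$race.wec, PUMS$education.int)

# Model without and with the interaction terms
m.main <- lm(wage ~ race.wec + education.int, data = PUMS)
m.int  <- lm(wage ~ race.wec + education.int + race.educint, data = PUMS)

# Education coefficient in both models, side by side
educ <- c(without = coef(m.main)[["education.int"]],
          with    = coef(m.int)[["education.int"]])
educ

# Education effect per race category: average effect plus its deviation
int_terms <- grep("^race\\.educint", names(coef(m.int)), value = TRUE)
slopes <- coef(m.int)[["education.int"]] + coef(m.int)[int_terms]
slopes
```

Reading the table above in the same way, the education effect is 9048 - 3266 = 5782 for Hispanic, 9048 - 3293 = 5755 for Black, and 9048 + 3575 = 12623 for Asian respondents.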
