Weighted Effect Coding: New publication in the R Journal

[This article was first published on Rense Nieuwenhuis » R-Project, and kindly contributed to R-bloggers.]

Weighted effect coding is a technique for dummy coding that can have attractive properties, particularly when analysing observational data. In a new publication in the R Journal we explain the rationale of weighted effect coding, introduce the ‘wec’ package, and provide examples that include interactions.

The attractive property of applying weighted effect coding to categorical (‘factor’) variables is that the estimate for each category represents the deviation of that category from the sample mean. This is unlike the more commonly used treatment coding, where a specific category has to be selected as a reference and every estimate is relative to that category. Weighted effect coding is a generalized form of effect coding that applies to both balanced and unbalanced data.
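To make this property concrete, here is a minimal base R sketch that builds a weighted effect contrast matrix by hand for a single factor. It is an illustration under stated assumptions, not the implementation used in the ‘wec’ package: the data are simulated, and the names `group`, `omitted`, `kept` and `wec_matrix` are made up for this example. Each kept category gets an ordinary indicator column, and the omitted category is coded as minus the ratio of group sizes.

```r
# Simulated, unbalanced data (all names and values are illustrative)
set.seed(1)
group <- factor(rep(c("a", "b", "c"), times = c(50, 30, 20)))
y <- rnorm(100, mean = c(a = 10, b = 12, c = 15)[as.character(group)])

# Group sizes as a plain named vector
n <- setNames(tabulate(group), levels(group))

# Choose the category that gets no coefficient of its own
omitted <- "a"
kept <- setdiff(levels(group), omitted)

# Start from indicator columns for the kept categories ...
wec_matrix <- contr.treatment(levels(group),
                              base = which(levels(group) == omitted))
# ... and code the omitted category as -n_j / n_omitted in column j
wec_matrix[omitted, ] <- -n[kept] / n[omitted]

contrasts(group) <- wec_matrix
fit <- lm(y ~ group)

# The intercept equals the sample mean of y
c(intercept = unname(coef(fit)[1]), sample_mean = mean(y))

# Each coefficient equals its category mean minus the sample mean
cbind(coefficient = unname(coef(fit)[-1]),
      deviation   = as.vector(tapply(y, group, mean)[kept]) - mean(y))
```

With treatment coding, the intercept would instead be the mean of the reference category. Here the omitted category is only left without its own coefficient; its deviation from the sample mean can be recovered by refitting with a different category omitted.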

A form of weighted effect coding was already formulated in 1972 by Sweeney and Ulveling, but it seems never to have found its place in statistical repertoires, and it was not implemented in mainstream statistical software. In an ongoing project, we have now further developed weighted effect coding to also apply to interactions (with both categorical and continuous variables), and we provide procedures for mainstream statistical software. For R, we developed the ‘wec’ package, and procedures for Stata and SPSS are available as well.

A key innovation in our article in the R Journal is the formulation of interactions between a categorical variable and a continuous variable. This is visualised in the Figure above. The benefit of estimating such an interaction with weighted effect coding is that, upon entering the interaction terms, the estimate for the continuous variable (as well as the ‘main effects’ for the categorical variable) does not change. The ‘main’ continuous term reflects the average effect in the sample, and the interaction terms represent the deviation of the effect size for each category from that average.
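The following base R sketch shows one construction that yields this "estimates do not change" property. It is an assumption-laden illustration rather than the code from the ‘wec’ package or the exact formulation in the article: the continuous variable is centred within each category, and the omitted category is weighted by a ratio of within-category sums of squares so that the interaction columns are orthogonal to the main-effect terms. All data are simulated and all object names (`x_within`, `ss`, `inter`) are hypothetical.

```r
# Simulated data with a different slope of x in each category (illustrative)
set.seed(2)
group <- factor(rep(c("a", "b", "c"), times = c(50, 30, 20)))
x <- rnorm(100, mean = c(a = 0, b = 1, c = 2)[as.character(group)])
slope <- c(a = 1, b = 2, c = 0.5)[as.character(group)]
y <- 5 + slope * x + rnorm(100)

# Weighted effect coding for the factor, omitting category "a"
omitted <- "a"
kept <- setdiff(levels(group), omitted)
n <- setNames(tabulate(group), levels(group))
wec_matrix <- contr.treatment(levels(group),
                              base = which(levels(group) == omitted))
wec_matrix[omitted, ] <- -n[kept] / n[omitted]
contrasts(group) <- wec_matrix

# Centre x within each category; sum of squares of centred x per category
x_within <- x - ave(x, group)
ss <- sapply(split(x_within^2, group), sum)

# Interaction column for category j (assumed construction):
#   centred x for observations in j,
#   -SS_j / SS_omitted * centred x for observations in the omitted category,
#   0 for all other observations
inter <- sapply(kept, function(j) {
  ifelse(group == j, x_within,
         ifelse(group == omitted, -ss[[j]] / ss[[omitted]] * x_within, 0))
})
colnames(inter) <- paste0("x_by_", kept)

fit_main <- lm(y ~ group + x)
fit_int  <- lm(y ~ group + x + inter)

# Coefficients shared by both models, side by side
shared <- names(coef(fit_main))
cbind(main_only = coef(fit_main), with_interaction = coef(fit_int)[shared])
```

In this sketch the coefficient for `x` is the pooled within-category slope in both models, and each interaction coefficient is the slope in that category minus the pooled slope.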

References

Te Grotenhuis, M., Pelzer, B., Eisinga, R., Nieuwenhuis, R., Schmidt-Catran, A., & Konig, R. (2017b). A novel method for modelling interaction between categorical variables. International Journal of Public Health, 62(3), 427–431. (open access!)

Te Grotenhuis, M., Pelzer, B., Eisinga, R., Nieuwenhuis, R., Schmidt-Catran, A., & Konig, R. (2017a). When size matters: advantages of weighted effect coding in observational studies. International Journal of Public Health, 62, 163–167. (open access!)

Nieuwenhuis, R., Te Grotenhuis, M., & Pelzer, B. (2017). Weighted Effect Coding for Observational Data with wec. R Journal, 9(1), 477–485. (open access!)

Sweeney, R. E., & Ulveling, E. F. (1972). A transformation for simplifying the interpretation of coefficients of binary variables in regression analysis. The American Statistician.
