R Tutorial Series: Regression With Interaction Variables
[This article was first published on R Tutorial Series, and kindly contributed to R-bloggers.]
Interaction variables introduce an additional level of regression analysis by allowing researchers to explore the synergistic effects of combined predictors. This tutorial will explore how interaction models can be created in R.
Tutorial Files
Before we begin, you may want to download the sample data (.csv) used in this tutorial. Be sure to right-click and save the file to your R working directory. This dataset contains variables for the following information related to ice cream consumption.
- DATE: Time period (1-30)
- CONSUME: Ice cream consumption in pints per capita
- PRICE: Per pint price of ice cream in dollars
- INC: Weekly family income in dollars
- TEMP: Mean temperature in degrees F
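As a minimal sketch of getting data with these columns into R, the snippet below reads a CSV into a data frame. The file name of the tutorial's download is not given here, so the example writes a few made-up rows to a temporary file as a stand-in; `csvPath` and the values are assumptions for illustration only. The name `datavar` matches the placeholder used later in the tutorial.

```r
# Hypothetical stand-in for the downloaded file: a few made-up rows written
# to a temporary CSV so that this sketch runs on its own.
csvPath <- tempfile(fileext = ".csv")
writeLines(c(
  "DATE,CONSUME,PRICE,INC,TEMP",
  "1,0.40,0.26,80,45",
  "2,0.35,0.29,82,50",
  "3,0.45,0.27,85,70"
), csvPath)

# Read the file into a data frame (replace csvPath with the path to the real file)
datavar <- read.csv(csvPath)

# Inspect the column names and types
str(datavar)
```

The listings in this tutorial refer to columns such as PRICE directly, which appears to assume the columns are available by name, for example after `attach(datavar)`. Writing `datavar$PRICE` is an alternative that avoids attaching.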
Planning The Model
Suppose that our research question is “how much of the variance in ice cream consumption can be predicted by per pint price, weekly family income, mean temperature, and the interaction between per pint price and weekly family income?” The interaction term, the last item in that list, is the new addition to our typical multiple regression modeling procedure. This variable is relatively simple to incorporate, but it does require a few preparations.
Creating The Interaction Variable
A two-step process can be followed to create an interaction variable in R. First, the input variables must be centered to mitigate multicollinearity. Second, these variables must be multiplied to create the interaction variable.
Step 1: Centering
To center a variable, simply subtract its mean from each data point and save the result into a new R variable, as demonstrated below.
- > #center the input variables
- > PRICEc <- PRICE - mean(PRICE)
- > INCc <- INC - mean(INC)
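The centering step can be checked with a short sketch. The values below are made up for illustration and are not the tutorial's dataset. The sketch confirms that a centered variable has a mean of (numerically) zero and an unchanged spread, and that base R's `scale()` with `scale = FALSE` gives the same result as subtracting the mean by hand.

```r
# Made-up values for illustration (not the tutorial's data)
PRICE <- c(0.26, 0.27, 0.28, 0.29, 0.30)
INC <- c(76, 79, 82, 86, 92)

# Center by subtracting the mean, as in the tutorial
PRICEc <- PRICE - mean(PRICE)
INCc <- INC - mean(INC)

# The centered variables have a mean of zero, up to floating-point error
mean(PRICEc)
mean(INCc)

# Centering shifts the values but leaves their spread unchanged
all.equal(sd(INCc), sd(INC))

# scale() with scale = FALSE also centers; as.numeric() drops the matrix attributes
PRICEc2 <- as.numeric(scale(PRICE, center = TRUE, scale = FALSE))
all.equal(PRICEc, PRICEc2)
```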
Step 2: Multiplication
Once the input variables have been centered, the interaction term can be created. Since an interaction is formed by the product of two or more predictors, we can simply multiply our centered terms from step one and save the result into a new R variable, as demonstrated below.
- > #create the interaction variable
- > PRICEINCi <- PRICEc * INCc
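To illustrate why centering comes before multiplication, the sketch below compares how strongly a raw product and a centered product correlate with one of their components. It uses a made-up, perfectly balanced grid of 30 price and income combinations (an assumption for illustration, not the tutorial's data). On such a grid the centered product is uncorrelated with its components by construction; with real data the reduction is usually only partial.

```r
# Made-up balanced grid: every price paired with every income (5 x 6 = 30 rows)
grid <- expand.grid(
  PRICE = c(0.26, 0.27, 0.28, 0.29, 0.30),
  INC = c(76, 80, 84, 88, 92, 96)
)
PRICE <- grid$PRICE
INC <- grid$INC

# Product of the uncentered variables
rawProduct <- PRICE * INC

# Product of the centered variables, as in the tutorial
PRICEc <- PRICE - mean(PRICE)
INCc <- INC - mean(INC)
PRICEINCi <- PRICEc * INCc

# Correlation of each product with PRICE
corRaw <- cor(PRICE, rawProduct)
corCentered <- cor(PRICE, PRICEINCi)
corRaw
corCentered
```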
Creating The Model
Now we have all of the pieces necessary to assemble our complete interaction model. The commands below fit the model and display its summary. Naturally, if this were a full research analysis, we would likely compare this model to others and assess the value of each predictor. For information on comparing models, see the tutorial on hierarchical linear modeling.
- > #create the interaction model using lm(FORMULA, DATAVAR)
- > #predict ice cream consumption by its per pint price, weekly family income, mean temperature, and the interaction between per pint price and weekly family income
- > interactionModel <- lm(CONSUME ~ PRICE + INC + TEMP + PRICEINCi, datavar)
- > #display summary information about the model
- > summary(interactionModel)
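As a cross-check on the manual approach, the following sketch fits the same kind of model in two ways: with a hand-built product variable, as in this tutorial, and with R's formula operator `:`, which builds the product term inside `lm()`. The data are simulated under made-up assumptions (the seed, ranges, and the formula generating CONSUME are all hypothetical), so the numbers carry no meaning beyond showing that the two fits agree.

```r
# Simulated stand-in data (hypothetical values, not the tutorial's dataset)
set.seed(42)
n <- 30
dat <- data.frame(
  PRICE = runif(n, 0.25, 0.30),
  INC = runif(n, 75, 95),
  TEMP = runif(n, 25, 75)
)
dat$CONSUME <- 0.2 + 0.003 * dat$TEMP + rnorm(n, sd = 0.03)

# Center the inputs and build the product by hand, as in the tutorial
dat$PRICEc <- dat$PRICE - mean(dat$PRICE)
dat$INCc <- dat$INC - mean(dat$INC)
dat$PRICEINCi <- dat$PRICEc * dat$INCc

# Manual product variable alongside the uncentered main effects
manualModel <- lm(CONSUME ~ PRICE + INC + TEMP + PRICEINCi, data = dat)

# Same interaction expressed with the formula operator ":"
formulaModel <- lm(CONSUME ~ PRICEc + INCc + TEMP + PRICEc:INCc, data = dat)

# The interaction coefficients and the fitted values should agree
coef(manualModel)["PRICEINCi"]
coef(formulaModel)["PRICEc:INCc"]
all.equal(fitted(manualModel), fitted(formulaModel))
```

Whether the main effects enter centered or uncentered changes only the intercept here, because centering shifts each predictor by a constant; the interaction coefficient and the fitted values are unaffected.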
Complete Interaction Model Example
To see a complete example of how an interaction model can be created in R, please download the interaction model example (.txt) file.
References
Kadiyala, K. (1970). Ice Cream [Data File]. Retrieved December 14, 2009 from http://lib.stat.cmu.edu/DASL/Datafiles/IceCream.html