How to create a loop to run multiple regression models
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
A friend asked me whether I can create a loop which will run multiple regression models. She wanted to evaluate the association between 100 dependent variables (outcome) and 100 independent variable (exposure), which means 10,000 regression models. Regression models with multiple dependent (outcome) and independent (exposure) variables are common in genetics.
So models will be something like this: (dx is dependent and ix is independent variable, v are other variables)
dx1 = ix1 + v1 + v2 + v3 dx1 = ix2 + v1 + v2 + v3 dx1 = ix3 + v1 + v2 + v3 ... dx1 = ix100 + v1 + v2 + v3 dx2 = ix1 + v1 + v2 + v3 ... dx100 = ix100 + v1 + v2 + v3
The output should be a data frame with 5 columns, including dependent variable, independent variable, beta estimate, standard error and the p-value.
Something like this (those numbers are just for illustration purposes):
d i beta se pvalue 1 dx1 ix1 0.1 0.002 0.950 2 dx2 ix2 0.2 0.002 0.826 3 dx3 ix3 0.3 0.005 0.123
OK, now lets begin: the dataset that I received had all the variables in columns and observations in rows (the data is not real, just random numbers for illustration purposes):
id dx1 dx2 ... dx100 ix1 ... 1x100 v1 v2 v3 10 324 124 ... 214 32 ... 32 ax b4 c3 11 431 982 ... 114 12 ... 77 ce b2 c5 12 545 123 ... 104 34 ... 11 ar c2 a5 ....
Position of variables
Create vectors for the position of the dependent and independent variables in your dataset.
# outcome out_start=2 out_end= 101 out_nvar=out_end-out_start+1 out_variable=rep(NA, out_nvar) out_beta=rep(NA, out_nvar) out_se = rep(NA, out_nvar) out_pvalue=rep(NA, out_nvar) # exposure exp_start=102 exp_end=203 exp_nvar=exp_end-exp_start+1 exp_variable=rep(NA, exp_nvar) exp_beta=rep(NA, exp_nvar) exp_se = rep(NA, out_nvar) exp_pvalue=rep(NA, exp_nvar) number=1
For Loop
I used linear mixed effect model and therefore I loaded the lme4
library. The loop should work with other regression analysis (i.e. linear regression), if you modify it according to your regression model. If you don’t know which part to modify, leave a comment below and I will try to help.
As other loops, this call variables of interest one by one and for each of them extract and store the betas, standard error and p value. Remember, this code is specific for linear mixed effect models.
library(lme4) for (i in out_start:out_end){ outcome = colnames(dat)[i] for (j in exp_start:exp_end){ exposure = colnames(dat)[j] model <- lmer(get(outcome) ~ get(exposure) + v1 + (1|v2) + (1|v3), na.action = na.exclude, data=dat) Vcov <- vcov(model, useScale = FALSE) beta <- fixef(model) se <- sqrt(diag(Vcov)) zval <- beta / se pval <- 2 * pnorm(abs(zval), lower.tail = FALSE) out_beta[number] = as.numeric(beta[2]) out_se[number] = as.numeric(se[2]) out_pvalue[number] = as.numeric(pval[2]) out_variable[number] = outcome number = number + 1 exp_beta[number] = as.numeric(beta[2]) exp_se[number] = as.numeric(se[2]) exp_pvalue[number] = as.numeric(pval[2]) exp_variable[number] = exposure number = number + 1 } }
Create a dataframe with results:
outcome = data.frame(out_variable, out_beta, out_se, out_pvalue) exposure = data.frame(exp_variable, exp_beta, exp_se, exp_pvalue)
Management of the dataframe
We have 2 different dataframes with our results and we need to combine in one. With the help of tidyverse
package this is a simple task. Basically we rename variables by giving the same name and after we merge both dataframes together.
library(tidyverse) outcome = outcome %>% rename( variable = out_variable, beta = out_beta, se = out_se, pvalue = out_pvalue, obs = out_nobs ) exposure = exposure %>% rename( variable = exp_variable, beta = exp_beta, se = exp_se, pvalue = exp_pvalue, obs = exp_nobs ) all = rbind(outcome, exposure) all = na.omit(all) head(all) variable beta se pvalue 1 dx1 0.1 0.002 0.950 3 dx2 0.2 0.002 0.826 ........ 2 ix1 0.1 0.002 0.950 4 ix2 0.2 0.002 0.826 ........
Yet, this is not a dataframe that we are looking for. We need a dataframe to have both dependent and independent variables in one row. Therefore, we do the final transformation as follows:
data = all %>% mutate( type = substr(variable, 1, 2) ) %>% spread(type, variable) %>% rename( d = dx, i = ix ) %>% mutate ( beta = round(beta, 5), se = round(se, 5), pvalue = round(pvalue, 5) ) %>% select(d, i, beta, se, pvalue) head(data) d i beta se pvalue 1 dx1 ix1 0.1 0.002 0.950 2 dx2 ix2 0.2 0.002 0.826 3 dx3 ix3 0.3 0.005 0.123
I hope you find this post useful for your research and data analysis!
Related Post
- Regression model with auto correlated errors – Part 3, some astrology
- Regression model with auto correlated errors – Part 2, the models
- Regression model with auto correlated errors – Part 1, the data
- Linear Regression from Scratch in R
- R for Publication by Page Piccinini: Lesson 5 – Analysis of Variance (ANOVA)
R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.