Structural Equation Modelling in R (Part 2)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
Brief explanation
This is the second part in a series on three articles about Structural Equation Modelling (SEM). This time I am glad to announce Jodie Burchell as a co-writer!
In Structural Equation Modelling in R (Part 1) I explained the basics of CFA. SEM was explained as a general case of CFA that was going be explained later, so here we go.
Structural Equation Modeling
SEM models or causal models arise from confirmatory models. A confimatory model becomes a causal model if all latent variables are defined by observed variables.
Bollen (1989) proposed a SEM model to study an aspect of democracy. In his work he analyzes the relationship between freedom of the press, freedom of political opposition and fairness of elections among other variables such as industrialization (details here).
We can represent Bollen’s model in a diagram:
help("PoliticalDemocracy")
provides a description of the variables within the dataset but it is recommended to read Political Democracy and the Timing of Development by K.A. Bollen.
In this model, \(\newcommand{\R}{\mathbb{R}} x_2\), \(x_2\) and \(x_3\) are the gross national product (GNP) per capita in 1960, the inanimate energy consumption per capita in 1960, and the percentage of the labor force in industry in 1960 respectively. Variables \(y_1\) to \(y_8\) represent:
- Expert ratings of the freedom of the press in 1960
- The freedom of political opposition in 1960
- The fairness of elections in 1960
- The effectiveness of the elected legislature in 1960
- Expert ratings of the freedom of the press in 1965
- The freedom of political opposition in 1965
- The fairness of elections in 1965
- The effectiveness of the elected legislature in 1965
Finally, \(z_1\), \(z_2\) and \(z_3\) represent the degree of industrialization in 1960, political democracy in 1960, and political democracy in 1965 respectively.
As in the case of CFA double-headed arrows represent an association, but in this case between the \(y_i\)‘s single-headed arrows represent direct effects, and \(\varepsilon_i\), \(\delta_i\), \(\zeta_i\) represent error terms.
Here the error terms help us to understand an SEM model as a net of regressions. In R notation \(ind60 \sim x_1 + x_2 + x_3\) would be a short form of writing \(ind60 = \lambda_1 x_1 + \lambda_2 x_2 + \lambda_3 x_3 + \zeta_1\) where each \(\lambda_i\) is a regression coefficient that can be estimated using ML, OLS or a variety of methods.
Using Lavaan to obtain parameters estimates
First we load the needed packages to perform the analysis:
# Needed packages to perform the analysis load <- function(pkg){ new.pkg <- pkg[!(pkg %in% installed.packages()[, "Package"])] if (length(new.pkg)) install.packages(new.pkg, dependencies = TRUE) sapply(pkg, require, character.only = TRUE) } packages <- c("lavaan","semPlot","semTools","nonnest2","htmlTable") load(packages)
Then we fit the model using the sem()
function:
# 1.2. Specify the model Bollen.model <- '# measurement model ind60 =~ x1 + x2 + x3 dem60 =~ y1 + y2 + y3 + y4 dem65 =~ y5 + y6 + y7 + y8 # regressions dem60 ~ ind60 dem65 ~ ind60 + dem60 # residual correlations y1 ~~ y5 y2 ~~ y4 + y6 y3 ~~ y7 y4 ~~ y8 y6 ~~ y8' # 1.3. Fit the model fit <- sem(Bollen.model, data = PoliticalDemocracy)
Then we obtain a model summary, confidence intervals and goodness of fit indicators:
# 2.1. Display summary output summary(fit, fit.measures=FALSE) lavaan (0.5-22) converged normally after 68 iterations Number of observations 75 Estimator ML Minimum Function Test Statistic 38.125 Degrees of freedom 35 P-value (Chi-square) 0.329 Parameter Estimates: Information Expected Standard Errors Standard Latent Variables: Estimate Std.Err z-value P(>|z|) ind60 =~ x1 1.000 x2 2.180 0.139 15.742 0.000 x3 1.819 0.152 11.967 0.000 dem60 =~ y1 1.000 y2 1.257 0.182 6.889 0.000 y3 1.058 0.151 6.987 0.000 y4 1.265 0.145 8.722 0.000 dem65 =~ y5 1.000 y6 1.186 0.169 7.024 0.000 y7 1.280 0.160 8.002 0.000 y8 1.266 0.158 8.007 0.000 Regressions: Estimate Std.Err z-value P(>|z|) dem60 ~ ind60 1.483 0.399 3.715 0.000 dem65 ~ ind60 0.572 0.221 2.586 0.010 dem60 0.837 0.098 8.514 0.000 Covariances: Estimate Std.Err z-value P(>|z|) .y1 ~~ .y5 0.624 0.358 1.741 0.082 .y2 ~~ .y4 1.313 0.702 1.871 0.061 .y6 2.153 0.734 2.934 0.003 .y3 ~~ .y7 0.795 0.608 1.308 0.191 .y4 ~~ .y8 0.348 0.442 0.787 0.431 .y6 ~~ .y8 1.356 0.568 2.386 0.017 Variances: Estimate Std.Err z-value P(>|z|) .x1 0.082 0.019 4.184 0.000 .x2 0.120 0.070 1.718 0.086 .x3 0.467 0.090 5.177 0.000 .y1 1.891 0.444 4.256 0.000 .y2 7.373 1.374 5.366 0.000 .y3 5.067 0.952 5.324 0.000 .y4 3.148 0.739 4.261 0.000 .y5 2.351 0.480 4.895 0.000 .y6 4.954 0.914 5.419 0.000 .y7 3.431 0.713 4.814 0.000 .y8 3.254 0.695 4.685 0.000 ind60 0.448 0.087 5.173 0.000 .dem60 3.956 0.921 4.295 0.000 .dem65 0.172 0.215 0.803 0.422 # 2.2. Obtain confidence intervals for the estimated coefficients parameterEstimates(fit, ci = TRUE, level = 0.95) lhs op rhs est se z pvalue ci.lower ci.upper 1 ind60 =~ x1 1.000 0.000 NA NA 1.000 1.000 2 ind60 =~ x2 2.180 0.139 15.742 0.000 1.909 2.452 3 ind60 =~ x3 1.819 0.152 11.967 0.000 1.521 2.116 4 dem60 =~ y1 1.000 0.000 NA NA 1.000 1.000 5 dem60 =~ y2 1.257 0.182 6.889 0.000 0.899 1.614 6 dem60 =~ y3 1.058 0.151 6.987 0.000 0.761 1.354 7 dem60 =~ y4 1.265 0.145 8.722 0.000 0.981 1.549 8 dem65 =~ y5 1.000 0.000 NA NA 1.000 1.000 9 dem65 =~ y6 1.186 0.169 7.024 0.000 0.855 1.517 10 dem65 =~ y7 1.280 0.160 8.002 0.000 0.966 1.593 11 dem65 =~ y8 1.266 0.158 8.007 0.000 0.956 1.576 12 dem60 ~ ind60 1.483 0.399 3.715 0.000 0.701 2.265 13 dem65 ~ ind60 0.572 0.221 2.586 0.010 0.139 1.006 14 dem65 ~ dem60 0.837 0.098 8.514 0.000 0.645 1.030 15 y1 ~~ y5 0.624 0.358 1.741 0.082 -0.079 1.326 16 y2 ~~ y4 1.313 0.702 1.871 0.061 -0.063 2.689 17 y2 ~~ y6 2.153 0.734 2.934 0.003 0.715 3.591 18 y3 ~~ y7 0.795 0.608 1.308 0.191 -0.396 1.986 19 y4 ~~ y8 0.348 0.442 0.787 0.431 -0.519 1.215 20 y6 ~~ y8 1.356 0.568 2.386 0.017 0.242 2.470 21 x1 ~~ x1 0.082 0.019 4.184 0.000 0.043 0.120 22 x2 ~~ x2 0.120 0.070 1.718 0.086 -0.017 0.256 23 x3 ~~ x3 0.467 0.090 5.177 0.000 0.290 0.643 24 y1 ~~ y1 1.891 0.444 4.256 0.000 1.020 2.762 25 y2 ~~ y2 7.373 1.374 5.366 0.000 4.680 10.066 26 y3 ~~ y3 5.067 0.952 5.324 0.000 3.202 6.933 27 y4 ~~ y4 3.148 0.739 4.261 0.000 1.700 4.596 28 y5 ~~ y5 2.351 0.480 4.895 0.000 1.410 3.292 29 y6 ~~ y6 4.954 0.914 5.419 0.000 3.162 6.746 30 y7 ~~ y7 3.431 0.713 4.814 0.000 2.034 4.829 31 y8 ~~ y8 3.254 0.695 4.685 0.000 1.893 4.615 32 ind60 ~~ ind60 0.448 0.087 5.173 0.000 0.279 0.618 33 dem60 ~~ dem60 3.956 0.921 4.295 0.000 2.151 5.762 34 dem65 ~~ dem65 0.172 0.215 0.803 0.422 -0.249 0.593 # 2.3 Obtain goodness of fit indicators of the model fitMeasures(fit, c("chisq", "rmsea", "srmr", "gfi", "ecvi")) chisq rmsea srmr gfi ecvi 38.125 0.035 0.044 0.923 1.335 reliability(fit) ind60 dem60 dem65 total alpha 0.9023348 0.8587945 0.8827394 0.9149416 omega 0.9437375 0.8375193 0.8556193 0.9134989 omega2 0.9437375 0.8375193 0.8556193 0.9134989 omega3 0.9436896 0.8411801 0.8575540 0.9192058 avevar 0.8588015 0.5996707 0.6408647 0.6320779
Exporting lavaan output to tables
Lavaan produces a lot of potential pieces of output, but I suggest that at minimum that you export the above model summary, confidence intervals and goodness of fit indicators as part of your final tables:
htmlTable(txtRound(parameterEstimates(fit, ci = TRUE, level = 0.95), digits=3, excl.cols=c(1,3)), align="l|r|r|r|r|r|r|r|r|r")
lhs | op | rhs | est | se | z | pvalue | ci.lower | ci.upper | |
---|---|---|---|---|---|---|---|---|---|
1 | ind60 | =~ | x1 | 1.000 | 0.000 | 1.000 | 1.000 | ||
2 | ind60 | =~ | x2 | 2.180 | 0.139 | 15.742 | 0.000 | 1.909 | 2.452 |
3 | ind60 | =~ | x3 | 1.819 | 0.152 | 11.967 | 0.000 | 1.521 | 2.116 |
4 | dem60 | =~ | y1 | 1.000 | 0.000 | 1.000 | 1.000 | ||
5 | dem60 | =~ | y2 | 1.257 | 0.182 | 6.889 | 5.636 | 0.899 | 1.614 |
6 | dem60 | =~ | y3 | 1.058 | 0.151 | 6.987 | 2.808 | 0.761 | 1.354 |
7 | dem60 | =~ | y4 | 1.265 | 0.145 | 8.722 | 0.000 | 0.981 | 1.549 |
8 | dem65 | =~ | y5 | 1.000 | 0.000 | 1.000 | 1.000 | ||
9 | dem65 | =~ | y6 | 1.186 | 0.169 | 7.024 | 2.159 | 0.855 | 1.517 |
10 | dem65 | =~ | y7 | 1.280 | 0.160 | 8.002 | 1.332 | 0.966 | 1.593 |
11 | dem65 | =~ | y8 | 1.266 | 0.158 | 8.007 | 1.110 | 0.956 | 1.576 |
12 | dem60 | ~ | ind60 | 1.483 | 0.399 | 3.715 | 2.029 | 0.701 | 2.265 |
13 | dem65 | ~ | ind60 | 0.572 | 0.221 | 2.586 | 9.707 | 0.139 | 1.006 |
14 | dem65 | ~ | dem60 | 0.837 | 0.098 | 8.514 | 0.000 | 0.645 | 1.030 |
15 | y1 | ~~ | y5 | 0.624 | 0.358 | 1.741 | 8.176 | -0.079 | 1.326 |
16 | y2 | ~~ | y4 | 1.313 | 0.702 | 1.871 | 6.140 | -0.063 | 2.689 |
17 | y2 | ~~ | y6 | 2.153 | 0.734 | 2.934 | 3.347 | 0.715 | 3.591 |
18 | y3 | ~~ | y7 | 0.795 | 0.608 | 1.308 | 1.908 | -0.396 | 1.986 |
19 | y4 | ~~ | y8 | 0.348 | 0.442 | 0.787 | 4.310 | -0.519 | 1.215 |
20 | y6 | ~~ | y8 | 1.356 | 0.568 | 2.386 | 1.701 | 0.242 | 2.470 |
21 | x1 | ~~ | x1 | 0.082 | 0.019 | 4.184 | 2.861 | 0.043 | 0.120 |
22 | x2 | ~~ | x2 | 0.120 | 0.070 | 1.718 | 8.573 | -0.017 | 0.256 |
23 | x3 | ~~ | x3 | 0.467 | 0.090 | 5.177 | 2.260 | 0.290 | 0.643 |
24 | y1 | ~~ | y1 | 1.891 | 0.444 | 4.256 | 2.083 | 1.020 | 2.762 |
25 | y2 | ~~ | y2 | 7.373 | 1.374 | 5.366 | 8.032 | 4.680 | 10.066 |
26 | y3 | ~~ | y3 | 5.067 | 0.952 | 5.324 | 1.012 | 3.202 | 6.933 |
27 | y4 | ~~ | y4 | 3.148 | 0.739 | 4.261 | 2.036 | 1.700 | 4.596 |
28 | y5 | ~~ | y5 | 2.351 | 0.480 | 4.895 | 9.810 | 1.410 | 3.292 |
29 | y6 | ~~ | y6 | 4.954 | 0.914 | 5.419 | 6.006 | 3.162 | 6.746 |
30 | y7 | ~~ | y7 | 3.431 | 0.713 | 4.814 | 1.482 | 2.034 | 4.829 |
31 | y8 | ~~ | y8 | 3.254 | 0.695 | 4.685 | 2.802 | 1.893 | 4.615 |
32 | ind60 | ~~ | ind60 | 0.448 | 0.087 | 5.173 | 2.306 | 0.279 | 0.618 |
33 | dem60 | ~~ | dem60 | 3.956 | 0.921 | 4.295 | 1.751 | 2.151 | 5.762 |
34 | dem65 | ~~ | dem65 | 0.172 | 0.215 | 0.803 | 4.220 | -0.249 | 0.593 |
htmlTable(txtRound(reliability(fit),3), align="l|r|r|r|r")
ind60 | dem60 | dem65 | total | |
---|---|---|---|---|
alpha | 0.902 | 0.859 | 0.883 | 0.915 |
omega | 0.944 | 0.838 | 0.856 | 0.913 |
omega2 | 0.944 | 0.838 | 0.856 | 0.913 |
omega3 | 0.944 | 0.841 | 0.858 | 0.919 |
avevar | 0.859 | 0.600 | 0.641 | 0.632 |
There are other packages to convert R output to tables such as knitr
and xtables
that can also export to \(\rm\LaTeX\).
Creating a diagram for the model in R
Now let's get on to producing the exciting part of an SEM analysis - the model diagram. This code might not be good-looking but the result is!
svg("sem_r/sem_example_2.svg") semPaths(fit, style="lisrel", whatLabels = "std", edge.label.cex = .6, node.label.cex = .6, label.prop=0.9, edge.label.color = "black", rotation = 4, equalizeManifests = FALSE, optimizeLatRes = TRUE, node.width = 1.5, edge.width = 0.5, shapeMan = "rectangle", shapeLat = "ellipse", shapeInt = "triangle", sizeMan = 4, sizeInt = 2, sizeLat = 4, curve=2, unCol = "#070b8c") dev.off()
Appendix
This is the TikZ (LaTeX) code to obtain the first diagram.
\documentclass[border=3mm]{standalone} \usepackage[applemac]{inputenc} \usepackage[protrusion=true,expansion=true]{microtype} \usepackage[bb=lucida,bbscaled=1,cal=boondoxo]{mathalfa} \usepackage[stdmathitalics=true,math-style=iso,lucidasmallscale=true,romanfamily=bright]{lucimatx} \usepackage{tikz} \usetikzlibrary{arrows,positioning,calc} \newcommand{\at}{\makeatletter @\makeatother} \begin{document} \begin{tikzpicture}[auto,node distance=.5cm, latent/.style={fill=red!20,circle,draw, thick,inner sep=0pt,minimum size=8mm,align=center}, observed/.style={fill=blue!20,rectangle,draw, thick,inner sep=0pt,minimum width=8mm,minimum height=8mm,align=center}, error/.style={fill=yellow!20,circle,draw, thick,inner sep=0pt,minimum width=8mm,minimum height=8mm,align=center}, paths/.style={->, thick, >=stealth'}, paths2/.style={<-, thick, >=stealth'}, twopaths/.style={<->, thick, >=stealth'} ] %Define observed variables \node [latent] (z2) at (0,0) {$z_2$}; \node [latent] (z3) [below=7cm of z2] {$z_3$}; \node [error] (ez2) [above=0.5cm of z2] {$\zeta_2$}; \node [error] (ez3) [below=0.5cm of z3] {$\zeta_3$}; %%% \node [observed] (y2) [above left=0.2cm and 1.8cm of z2] {$y_2$}; \node [observed] (y1) [above=1cm of y2] {$y_1$}; \node [observed] (y3) [below=1cm of y2] {$y_3$}; \node [observed] (y4) [below=1cm of y3] {$y_4$}; \node [error] (ey1) [left=0.5cm of y1] {$\varepsilon_1$}; \node [error] (ey2) [left=0.5cm of y2] {$\varepsilon_2$}; \node [error] (ey3) [left=0.5cm of y3] {$\varepsilon_3$}; \node [error] (ey4) [left=0.5cm of y4] {$\varepsilon_4$}; %%% \node [observed] (y6) [above left=0.2cm and 1.8cm of z3] {$y_6$}; \node [observed] (y5) [above=1cm of y6] {$y_5$}; \node [observed] (y7) [below=1cm of y6] {$y_7$}; \node [observed] (y8) [below=1cm of y7] {$y_8$}; \node [error] (ey5) [left=0.5cm of y5] {$\varepsilon_5$}; \node [error] (ey6) [left=0.5cm of y6] {$\varepsilon_6$}; \node [error] (ey7) [left=0.5cm of y7] {$\varepsilon_7$}; \node [error] (ey8) [left=0.5cm of y8] {$\varepsilon_8$}; %%% \node [latent] (z1) [right=3cm of z2] {$z_1$}; \node [error] (ez1) [above=0.5cm of z1] {$\zeta_1$}; %%% \node [observed] (x2) [right=1.8cm of z1] {$x_2$}; \node [observed] (x1) [above=of x2] {$x_1$}; \node [observed] (x3) [below=of x2] {$x_3$}; \node [error] (ex1) [right=0.5cm of x1] {$\delta_1$}; \node [error] (ex2) [right=0.5cm of x2] {$\delta_2$}; \node [error] (ex3) [right=0.5cm of x3] {$\delta_3$}; % Draw paths form latent to observed variables \foreach \all in {y1,y2,y3,y4}{ \draw [paths] (z2.west) to node {} (\all.east); } \draw [paths] (y1.west) to (ey1.east); \draw [paths] (y2.west) to (ey2.east); \draw [paths] (y3.west) to (ey3.east); \draw [paths] (y4.west) to (ey4.east); %%% \foreach \all in {y5,y6,y7,y8}{ \draw [paths] (z3.west) to node {} (\all.east); } \draw [paths] (y5.west) to (ey5.east); \draw [paths] (y6.west) to (ey6.east); \draw [paths] (y7.west) to (ey7.east); \draw [paths] (y8.west) to (ey8.east); %%% \foreach \all in {x1,x2,x3}{ \draw [paths] (z1.east) to node {} (\all.west); } \draw [paths] (x1.east) to (ex1.west); \draw [paths] (x2.east) to (ex2.west); \draw [paths] (x3.east) to (ex3.west); %%% \draw [paths] (z1.north) to (ez1.south); \draw [paths] (z2.north) to (ez2.south); \draw [paths] (z3.south) to (ez3.north); %%% \draw [paths] (z2.south) to (z3.north); \draw [paths] (z1.west) to (z2.east); \draw [paths] (z1.west) to (z3.north); %%% \draw [twopaths] (y1.south west) to [bend right=90] (y5.north west); \draw [twopaths] (y2.south west) to [bend right=90] (y6.north west); \draw [twopaths] (y3.south west) to [bend right=90] (y7.north west); \draw [twopaths] (y4.south west) to [bend right=90] (y8.north west); \draw [twopaths] (y2.south west) to [out=180,in=90] ($(ey3)+(-1,0)$) to [out=-90, in=-180] (y4.north west); \draw [twopaths] (y6.south west) to [out=180,in=90] ($(ey7)+(-1,0)$) to [out=-90, in=-180] (y8.north west); \end{tikzpicture} \end{document}
and then I ran pdf2svg sem_example.pdf sem_example.svg
.
R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.