A Structural Model of the World Happiness Report
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
— Albert Einstein
Introduction
Structural equation modelling is a statistical technique used mostly in behavioral and cognitive sciences to see how some selected factors affect a latent variable. A latent variable is a variable that can’t be observed but can be inferred from other variables. This post is going to use confirmatory factor analysis (CFA); a sub term in SEM to see if happiness can be predicted from the world happiness index variables.
The World Happiness Report data
The data for the world happiness report was sourced from kaggle and can be downloaded here. The data consists of 150 countries and variables: happiness score, GDP, social support, life expectancy, freedom, generosity and corruption. All variables are continuous variables and there is absence of missing values in the dataset.
Load the dataset
#load the libraries library(tidyverse) whi_data <- readr::read_csv("C:/Users/Adejumo/Downloads/2021.csv") %>% #select required columns select(c(1,3,7:12)) #change variable names whi_data <- whi_data %>% mutate(score = `Ladder score`, GDP = `Logged GDP per capita`, support = `Social support`, Life_exp = `Healthy life expectancy`, freedom = `Freedom to make life choices`, corruption = `Perceptions of corruption`, .keep = "unused") #top 6 rows head(whi_data) ## # A tibble: 6 x 8 ## `Country name` Generosity score GDP support Life_exp freedom corruption ## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> ## 1 Finland -0.098 7.84 10.8 0.954 72 0.949 0.186 ## 2 Denmark 0.03 7.62 10.9 0.954 72.7 0.946 0.179 ## 3 Switzerland 0.025 7.57 11.1 0.942 74.4 0.919 0.292 ## 4 Iceland 0.16 7.55 10.9 0.983 73 0.955 0.673 ## 5 Netherlands 0.175 7.46 10.9 0.942 72.4 0.913 0.338 ## 6 Norway 0.093 7.39 11.1 0.954 73.3 0.96 0.27
Measurement Model
The measurement model enables us to know factors that will give us a good fit. This allows us to know the factors that have a large amount of effect on happiness before fitting the SEM. In the measurement model, a rule of thumb is that factors with factor loading less than 0.5 have a weak effect on the variable and hence they should be dropped. To perform a structural equation modelling in R we need the “lavaan” package and the “semptools” package for plotting the Directed Acyclic Graph (DAG) showing the relationship paths.
#load libraries library(lavaan) library(semPlot) #constructing the measurement model mea_model <- "Happiness = ~score + Generosity + GDP + support + Life_exp + freedom + corruption" #fitting the measurement model mea_fit <- sem(mea_model, data = whi_data) #summary statistics of the measurement model summary(mea_fit, fit.measures = TRUE) ## lavaan 0.6-9 ended normally after 59 iterations ## ## Estimator ML ## Optimization method NLMINB ## Number of model parameters 14 ## ## Number of observations 149 ## ## Model Test User Model: ## ## Test statistic 80.437 ## Degrees of freedom 14 ## P-value (Chi-square) 0.000 ## ## Model Test Baseline Model: ## ## Test statistic 667.346 ## Degrees of freedom 21 ## P-value 0.000 ## ## User Model versus Baseline Model: ## ## Comparative Fit Index (CFI) 0.897 ## Tucker-Lewis Index (TLI) 0.846 ## ## Loglikelihood and Information Criteria: ## ## Loglikelihood user model (H0) -315.326 ## Loglikelihood unrestricted model (H1) -275.107 ## ## Akaike (AIC) 658.652 ## Bayesian (BIC) 700.707 ## Sample-size adjusted Bayesian (BIC) 656.401 ## ## Root Mean Square Error of Approximation: ## ## RMSEA 0.178 ## 90 Percent confidence interval - lower 0.142 ## 90 Percent confidence interval - upper 0.217 ## P-value RMSEA <= 0.05 0.000 ## ## Standardized Root Mean Square Residual: ## ## SRMR 0.087 ## ## Parameter Estimates: ## ## Standard errors Standard ## Information Expected ## Information saturated (h1) model Structured ## ## Latent Variables: ## Estimate Std.Err z-value P(>|z|) ## Happiness =~ ## score 1.000 ## Generosity -0.022 0.014 -1.659 0.097 ## GDP 1.154 0.069 16.618 0.000 ## support 0.103 0.008 13.455 0.000 ## Life_exp 6.522 0.418 15.613 0.000 ## freedom 0.066 0.009 7.157 0.000 ## corruption -0.075 0.015 -4.869 0.000 ## ## Variances: ## Estimate Std.Err z-value P(>|z|) ## .score 0.279 0.040 6.950 0.000 ## .Generosity 0.022 0.003 8.621 0.000 ## .GDP 0.179 0.035 5.130 0.000 ## .support 0.004 0.001 7.409 0.000 ## .Life_exp 8.534 1.366 6.247 0.000 ## .freedom 0.009 0.001 8.417 0.000 ## .corruption 0.027 0.003 8.539 0.000 ## Happiness 0.867 0.131 6.623 0.000
From the summary statistics above, we are able to see that the model is close almost a good fit from the model fit indices: CFI = 0.897, TLI = 0.846, RMSEA = 0.178 and SRMEA = 0.087. A model is said to be a good fit if CFI and TLI are greater than 0.95 and SRMR and RMSEA are in the range 0.05-1.00. The DAG diagram below will let us know which factors to remove to ensure a good model fit.
semPaths(mea_fit, "par", edge.label.cex = 1.2, fade = FALSE, edge.label.postition = 1, edge.label.font = 70, edge.label.by = TRUE, layout = "tree", sizeMan = 10)
From the diagram above, support and freedom have a factor loading less than 0.5 signifying that they have less impact on happiness. To fit the structural equation model, the variables support and freedom will have to be removed. We leave the variables corruption and generosity due to their negative relationship.
The Structural Equation Model
Now that we know the variables that will contribute more effect to our model, we can now fit the SEM model.
#constructing the measurement model sem_model <- "Happiness = ~score + Generosity + GDP + Life_exp + corruption" #fitting the measurement model sem_fit <- sem(sem_model, data = whi_data) #summary statistics of the measurement model summary(sem_fit, fit.measures = TRUE) ## lavaan 0.6-9 ended normally after 30 iterations ## ## Estimator ML ## Optimization method NLMINB ## Number of model parameters 10 ## ## Number of observations 149 ## ## Model Test User Model: ## ## Test statistic 23.651 ## Degrees of freedom 5 ## P-value (Chi-square) 0.000 ## ## Model Test Baseline Model: ## ## Test statistic 409.546 ## Degrees of freedom 10 ## P-value 0.000 ## ## User Model versus Baseline Model: ## ## Comparative Fit Index (CFI) 0.953 ## Tucker-Lewis Index (TLI) 0.907 ## ## Loglikelihood and Information Criteria: ## ## Loglikelihood user model (H0) -640.835 ## Loglikelihood unrestricted model (H1) -629.010 ## ## Akaike (AIC) 1301.671 ## Bayesian (BIC) 1331.710 ## Sample-size adjusted Bayesian (BIC) 1300.063 ## ## Root Mean Square Error of Approximation: ## ## RMSEA 0.158 ## 90 Percent confidence interval - lower 0.098 ## 90 Percent confidence interval - upper 0.225 ## P-value RMSEA <= 0.05 0.003 ## ## Standardized Root Mean Square Residual: ## ## SRMR 0.072 ## ## Parameter Estimates: ## ## Standard errors Standard ## Information Expected ## Information saturated (h1) model Structured ## ## Latent Variables: ## Estimate Std.Err z-value P(>|z|) ## Happiness =~ ## score 1.000 ## Generosity -0.027 0.014 -1.921 0.055 ## GDP 1.200 0.079 15.189 0.000 ## Life_exp 6.844 0.462 14.799 0.000 ## corruption -0.078 0.016 -4.910 0.000 ## ## Variances: ## Estimate Std.Err z-value P(>|z|) ## .score 0.332 0.047 7.114 0.000 ## .Generosity 0.022 0.003 8.616 0.000 ## .GDP 0.162 0.042 3.896 0.000 ## .Life_exp 7.316 1.479 4.948 0.000 ## .corruption 0.027 0.003 8.524 0.000 ## Happiness 0.813 0.130 6.270 0.000
Though not all model fit parameters were met, but we can not ignore the fact that the CFI = 0.953 and the SRMR = 0.072. The final structural equation DAG diagram is given below:
semPaths(sem_fit, "par", edge.label.cex = 1.2, fade = FALSE, edge.label.postition = 1, edge.label.font = 70, edge.label.by = TRUE, layout = "tree", sizeMan = 10)
Conclusion
From the analysis above, we can now see that countries with high life expectancy, GDP and happiness score are more likely to be happy. Although I did not expect generosity to be negatively related with happiness but we are in a era where people take advantage of generous people and trust are broken, this might be a possible explanation for such relationship between generosity and happiness. Also as expected a country that is corrupt will not have happy individuals.
Thanks for your time as you read this analysis, though there is more to SEM than just this few paragraphs I have written in this post.
R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.