[This article was first published on pacha.dev/blog, and kindly contributed to R-bloggers].
The book can be downloaded for free, but you will need a Leanpub account whether you download it for free or buy it.
The Hitchhiker’s Guide to Linear Models is finally complete. It took me a while to finish, but I’m happy with the result. I hope you enjoy reading it as much as I enjoyed writing it.
The GitHub repository contains the code for the book, so readers can avoid copying and pasting from the PDF.
Table of contents:
Preface | i |
|
|
1 R Setup | 1 |
1.1 R and RStudio | 1 |
1.2 Installing R | 1 |
1.2.1 Windows and Mac | 1 |
1.2.2 Linux | 1 |
1.3 Installing RStudio | 2 |
1.3.1 Windows and Mac | 2 |
1.3.2 Linux | 2 |
1.4 Installing R Packages | 2 |
1.5 Changing RStudio colors and | 4 |
1.6 Installing Quarto | 4 |
1.6.1 Windows and Mac | 4 |
1.6.2 Linux | 4 |
|
|
2 Linear algebra review | 5 |
2.1 Using R as a calculator | 5 |
2.2 System of linear equations | 5 |
2.3 Matrix | 5 |
2.4 Transpose matrix | 6 |
2.5 Matrix multiplication | 6 |
2.6 Matrix representation of a system of linear equations | 6 |
2.7 Identity matrix | 7 |
2.8 Inverse matrix | 7 |
2.9 Solving systems of linear equations | 7 |
|
|
3 Statistics review |
3.1 Using R as a calculator | 11 |
3.2 Data and dataset | 11 |
3.3 Summation | 11 |
3.4 Probability | 11 |
3.5 Descriptive statistics | 13 |
3.5.1 Mean | 13 |
3.5.2 Variance | 13 |
3.5.3 Standard deviation | 14 |
3.5.4 Covariance | 15 |
3.5.5 Correlation | 16 |
3.6 Distributions | 20 |
3.6.1 Normal distribution | 20 |
3.6.2 Poisson distribution | 22 |
3.6.3 Student’s t-distribution | 23 |
3.6.4 Computing probabilities with the normal distribution | 24 |
3.6.5 Computing probabilities with the Poisson distribution | 27 |
3.6.6 Computing probabilities with the t-distribution | 28 |
3.7 Sample size | 29 |
|
|
4 Recommended workflow | 30 |
4.1 Creating projects | 30 |
4.2 Creating scripts | 30 |
4.3 Creating notebooks | 32 |
4.4 Organizing code sections | 33 |
4.5 Customizing notebooks’ output | 34 |
|
|
5 Read, Manipulate, and Plot Data | 35 |
5.1 The datasauRus dataset in R format | 35 |
5.2 The Quality of Government dataset in CSV format | 40 |
5.3 The Quality of Government dataset in SAV (SPSS) format | 44 |
5.4 The Quality of Government dataset in DTA (Stata) format | 48 |
5.5 The Freedom House dataset in XLSX (Excel) format | 50 |
|
|
6 Linear Model with One Explanatory Variable | 60 |
6.1 Model specification | 60 |
6.2 The Galton dataset | 64 |
6.3 A word of caution about Galton’s work | 64 |
6.4 Loading the Galton dataset | 65 |
6.5 Estimating linear models’ coefficients | 66 |
6.5.1 Linear model as correlation | 66 |
6.5.2 Linear model as matrix multiplication | 67 |
6.5.3 Relation between correlation and matrix multiplication | 71 |
6.5.4 Computational note | 75 |
6.6 Logarithmic transformations | 75 |
6.7 Plotting model results | 76 |
6.8 Linear model does not equal straight line | 81 |
6.9 Transforming variables | 85 |
6.10 Regression with weights | 89 |
|
|
7 Linear Model with Multiple Explanatory Variables | 91 |
7.1 Model specification | 91 |
7.2 Life expectancy, GDP and well-being in the Quality of Government dataset | 94 |
7.3 Estimating linear models’ coefficients | 96 |
7.4 Model accuracy | 103 |
7.4.1 Root Mean Squared Error and Mean Absolute Error | 103 |
7.4.2 RMSE and MAE interpretation | 104 |
7.5 Model summary | 107 |
7.5.1 Coefficient’s standard error | 107 |
7.5.2 Coefficient’s t-statistic | 108 |
7.5.3 Coefficient’s p-value | 108 |
7.5.4 Residual standard error | 109 |
7.5.5 Model’s multiple R-squared (or unadjusted R-squared) | 109 |
7.5.6 Model’s adjusted R-squared | 110 |
7.5.7 Model’s F-statistic | 111 |
7.6 Error’s assumptions | 111 |
7.6.1 Error’s normality | 112 |
7.6.2 Error’s homoscedasticity (homogeneous variance) | 113 |
|
|
8 Linear Model with Binary and Categorical Explanatory Variables | 114 |
8.1 Model specification with binary variables | 114 |
8.1.1 ANOVA is a particular case of a linear model with binary variables | 114 |
8.1.2 Corruption and popular vote in the Quality of Government dataset | 114 |
8.1.3 Estimating a linear model and ANOVA with one predictor and two categories | 116 |
8.1.4 Corruption and regime type in the Quality of Government dataset | 118 |
8.1.5 Estimating a linear model and ANOVA with one predictor and multiple categories | 120 |
8.1.6 Estimating a linear model with continuous and categorical predictors | 126 |
8.2 Model specification with binary interactions | 128 |
8.2.1 Corruption and interaction variables in the Quality of Government dataset | 128 |
8.2.2 Estimating a linear model with binary interactions | 131 |
8.2.3 Confidence intervals with binary interactions | 133 |
8.3 Model specification with categorical interactions | 136 |
8.3.1 Estimating a linear model with categorical interactions | 136 |
8.3.2 Confidence intervals with categorical interactions | 137 |
|
|
9 Linear Model with Fixed Effects | 140 |
9.1 Year fixed effects | 140 |
9.1.1 Model specification | 140 |
9.1.2 Corruption and popular vote in the Quality of Government dataset | 140 |
9.1.3 Estimating year fixed effects’ coefficients | 142 |
9.2 Country fixed effects | 145 |
9.2.1 Model specification | 145 |
9.2.2 Corruption and popular vote in the Quality of Government dataset | 145 |
9.2.3 Estimating country-time fixed effects’ coefficients | 145 |
9.3 Country-year fixed effects | 148 |
9.3.1 Model specification | 148 |
9.3.2 Corruption and popular vote in the Quality of Government dataset | 149 |
9.3.3 Estimating country-time fixed effects’ coefficients | 149 |
|
|
10 Generalized Linear Model with One Explanatory Variable | 152 |
10.1 Model specification | 152 |
10.2 Model families | 152 |
10.2.1 Gaussian model | 153 |
10.2.2 Poisson model | 153 |
10.2.3 Quasi-Poisson model | 154 |
10.2.4 Binomial model (or logit model) | 157 |
|
|
11 Generalized Linear Model with Multiple Explanatory Variables | 165 |
11.1 Obtaining the original codes and data | 165 |
11.2 Loading the original data | 165 |
11.3 Ordinary Least Squares | 166 |
11.4 Poisson Pseudo Maximum Likelihood | 167 |
11.5 Tobit | 169 |
11.6 Reporting multiple models | 170 |
|
|
References | 172 |
Don’t panic!