Source: http://archive.ics.uci.edu/ml/datasets/Student+Performance
This project is based on two datasets on the academic performance of Portuguese students in two different classes: Math and Portuguese. Initially, I show how simple it is to predict student performance using linear regression. Later, I show that it is still possible, though more difficult, to predict the final grade without the Period 1 and Period 2 grades, and that what we learn from those predictions provides much deeper insight. I ask deeper questions about the mathematical structure of student performance and about potential indicators that can be used for early support and intervention.
Preparation
Load R and packages.
%load_ext rpy2.ipython

%%R
suppressPackageStartupMessages(library(ggplot2))
suppressPackageStartupMessages(library(dplyr))
suppressPackageStartupMessages(library(caret))
suppressPackageStartupMessages(library(gridExtra))
suppressPackageStartupMessages(library(MASS))
suppressPackageStartupMessages(library(leaps))
suppressPackageStartupMessages(library(relaimpo))
suppressPackageStartupMessages(library(mgcv))
Read in data.
%%R
student.mat <- read.csv("student-mat.csv", sep = ";")
student.por <- read.csv("student-por.csv", sep = ";")
head(student.mat)
  school sex age address famsize Pstatus Medu Fedu     Mjob     Fjob     reason
1     GP   F  18       U     GT3       A    4    4  at_home  teacher     course
2     GP   F  17       U     GT3       T    1    1  at_home    other     course
3     GP   F  15       U     LE3       T    1    1  at_home    other      other
4     GP   F  15       U     GT3       T    4    2   health services       home
5     GP   F  16       U     GT3       T    3    3    other    other       home
6     GP   M  16       U     LE3       T    4    3 services    other reputation
  guardian traveltime studytime failures schoolsup famsup paid activities
1   mother          2         2        0       yes     no   no         no
2   father          1         2        0        no    yes   no         no
3   mother          1         2        3       yes     no  yes         no
4   mother          1         3        0        no    yes  yes        yes
5   father          1         2        0        no    yes  yes         no
6   mother          1         2        0        no    yes  yes        yes
  nursery higher internet romantic famrel freetime goout Dalc Walc health
1     yes    yes       no       no      4        3     4    1    1      3
2      no    yes      yes       no      5        3     3    1    1      3
3     yes    yes      yes       no      4        3     2    2    3      3
4     yes    yes      yes      yes      3        2     2    1    1      5
5     yes    yes       no       no      4        3     2    1    2      5
6     yes    yes      yes       no      5        4     2    1    2      5
  absences G1 G2 G3
1        6  5  6  6
2        4  5  5  6
3       10  7  8 10
4        2 15 14 15
5        4  6 10 10
6       10 15 15 15
Linear Model
For determining the best linear model, we will use student.mat as a training set and student.por as a test set.
%%R
train <- student.mat
test <- student.por
Saturated Model
Let’s fit a linear model to all of the variables. The saturated model will overfit the data, but it will provide a control that can be used to test against.
%%R
fit <- lm(G3 ~ ., train)
Compare Adjusted R2, BIC, and Mallows' Cp With Best Subsets
Five variables give the lowest BIC and Mallows' Cp while providing an optimal adjusted R2.
%%R
subs <- regsubsets(G3 ~ ., data = train)
# regsubsets() evaluates subset sizes 1 to 8 by default (nvmax = 8),
# so each of the three criteria contributes 8 values: 24 rows in total.
df <- data.frame(est = c(summary(subs)$adjr2, summary(subs)$cp, summary(subs)$bic),
                 x = rep(1:8, 3),
                 type = rep(c("adjr2", "cp", "bic"), each = 8))
qplot(x, est, data = df, geom = "line") +
  theme_bw() +
  facet_grid(type ~ ., scales = "free_y")
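To make the three criteria concrete, here is a minimal base-R sketch that computes adjusted R2, Mallows' Cp, and BIC by hand for a few nested candidate models. It is only an illustration: it uses the built-in `mtcars` data instead of the student data, and the candidate formulas are arbitrary assumptions, not the subsets chosen by `regsubsets`.

```r
# Illustrative only: mtcars stands in for the student data.
full <- lm(mpg ~ wt + hp + qsec + drat, data = mtcars)
sigma2_full <- summary(full)$sigma^2   # error variance estimate from the largest model
n <- nrow(mtcars)

# Hypothetical candidate models of increasing size
candidates <- list(
  mpg ~ wt,
  mpg ~ wt + hp,
  mpg ~ wt + hp + qsec,
  mpg ~ wt + hp + qsec + drat
)

criteria <- do.call(rbind, lapply(candidates, function(f) {
  m   <- lm(f, data = mtcars)
  p   <- length(coef(m))               # number of coefficients, intercept included
  rss <- sum(residuals(m)^2)
  data.frame(
    size  = p - 1,                     # number of predictors
    adjr2 = summary(m)$adj.r.squared,
    cp    = rss / sigma2_full - n + 2 * p,  # Mallows' Cp
    bic   = BIC(m)
  )
}))
print(criteria)
```

For the largest model, Cp equals its number of coefficients by construction, which is a quick sanity check on the formula. Smaller values of Cp and BIC and larger values of adjusted R2 point to the preferred subset size.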
From the summary, we need to pick the top 5 variables. G1, G2, absences, and famrel will be the first four and the fifth will either be age or activities.
%%R
fit <- lm(formula = G3 ~ ., data = train)
summary(fit)
Call:
lm(formula = G3 ~ ., data = train)

Residuals:
    Min      1Q  Median      3Q     Max
-7.9339 -0.5532  0.2680  0.9689  4.6461

Coefficients:
                  Estimate Std. Error t value Pr(>|t|)
(Intercept)      -1.115488   2.116958  -0.527 0.598573
schoolMS          0.480742   0.366512   1.312 0.190485
sexM              0.174396   0.233588   0.747 0.455805
age              -0.173302   0.100780  -1.720 0.086380 .
addressU          0.104455   0.270791   0.386 0.699922
famsizeLE3        0.036512   0.226680   0.161 0.872128
PstatusT         -0.127673   0.335626  -0.380 0.703875
Medu              0.129685   0.149999   0.865 0.387859
Fedu             -0.133940   0.128768  -1.040 0.298974
Mjobhealth       -0.146426   0.518491  -0.282 0.777796
Mjobother         0.074088   0.332044   0.223 0.823565
Mjobservices      0.046956   0.369587   0.127 0.898973
Mjobteacher      -0.026276   0.481632  -0.055 0.956522
Fjobhealth        0.330948   0.666601   0.496 0.619871
Fjobother        -0.083582   0.476796  -0.175 0.860945
Fjobservices     -0.322142   0.493265  -0.653 0.514130
Fjobteacher      -0.112364   0.601448  -0.187 0.851907
reasonhome       -0.209183   0.256392  -0.816 0.415123
reasonother       0.307554   0.380214   0.809 0.419120
reasonreputation  0.129106   0.267254   0.483 0.629335
guardianmother    0.195741   0.252672   0.775 0.439046
guardianother     0.006565   0.463650   0.014 0.988710
traveltime        0.096994   0.157800   0.615 0.539170
studytime        -0.104754   0.134814  -0.777 0.437667
failures         -0.160539   0.161006  -0.997 0.319399
schoolsupyes      0.456448   0.319538   1.428 0.154043
famsupyes         0.176870   0.224204   0.789 0.430710
paidyes           0.075764   0.222100   0.341 0.733211
activitiesyes    -0.346047   0.205938  -1.680 0.093774 .
nurseryyes       -0.222716   0.254184  -0.876 0.381518
higheryes         0.225921   0.500398   0.451 0.651919
internetyes      -0.144462   0.287528  -0.502 0.615679
romanticyes      -0.272008   0.219732  -1.238 0.216572
famrel            0.356876   0.114124   3.127 0.001912 **
freetime          0.047002   0.110209   0.426 0.670021
goout             0.012007   0.105230   0.114 0.909224
Dalc             -0.185019   0.153124  -1.208 0.227741
Walc              0.176772   0.114943   1.538 0.124966
health            0.062995   0.074800   0.842 0.400259
absences          0.045879   0.013412   3.421 0.000698 ***
G1                0.188847   0.062373   3.028 0.002645 **
G2                0.957330   0.053460  17.907  < 2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 1.901 on 353 degrees of freedom
Multiple R-squared:  0.8458, Adjusted R-squared:  0.8279
F-statistic: 47.21 on 41 and 353 DF,  p-value: < 2.2e-16
ANOVA
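The ANOVA table shows that cutting the saturated model down to five predictors does not significantly worsen the fit (F = 0.7733, p = 0.8248). The two five-variable models are not nested and have the same residual degrees of freedom, so no F-test is reported between them. The model with age has the lower residual sum of squares (1376.1 versus 1391.4), so it is the one we keep.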
%%R
model1 <- lm(G3 ~ G1 + G2 + absences + famrel + age, data = train)
model2 <- lm(G3 ~ G1 + G2 + absences + famrel + activities, data = train)
anova(fit, model1, model2)
Analysis of Variance Table

Model 1: G3 ~ school + sex + age + address + famsize + Pstatus + Medu + Fedu +
    Mjob + Fjob + reason + guardian + traveltime + studytime +
    failures + schoolsup + famsup + paid + activities + nursery +
    higher + internet + romantic + famrel + freetime + goout +
    Dalc + Walc + health + absences + G1 + G2
Model 2: G3 ~ G1 + G2 + absences + famrel + age
Model 3: G3 ~ G1 + G2 + absences + famrel + activities
  Res.Df    RSS  Df Sum of Sq      F Pr(>F)
1    353 1275.5
2    389 1376.1 -36  -100.589 0.7733 0.8248
3    389 1391.4   0   -15.309
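The table has no F statistic on its last row because an F-test between two models needs one to be nested in the other. As a hedged sketch of how such comparisons can be organized, the example below uses the built-in `mtcars` data as a stand-in (the model names and formulas are assumptions for illustration): each reduced model is tested against the full model, and the two non-nested models are compared on RSS, AIC, and BIC.

```r
# Illustrative only: mtcars stands in for the student data.
full <- lm(mpg ~ wt + hp + qsec + drat, data = mtcars)
m_a  <- lm(mpg ~ wt + hp + qsec, data = mtcars)  # nested in `full`
m_b  <- lm(mpg ~ wt + hp + drat, data = mtcars)  # nested in `full`, but not in m_a

# Each reduced model is nested in the full model, so these F-tests are meaningful.
print(anova(m_a, full))
print(anova(m_b, full))

# m_a and m_b have the same number of coefficients and neither contains the other,
# so compare them on residual sum of squares or an information criterion instead.
comparison <- data.frame(
  model = c("m_a", "m_b"),
  rss   = c(deviance(m_a), deviance(m_b)),  # deviance() of an lm is its RSS
  aic   = c(AIC(m_a), AIC(m_b)),
  bic   = c(BIC(m_a), BIC(m_b))
)
print(comparison)
```

When two models have the same number of coefficients, as here, ranking by RSS, AIC, or BIC gives the same winner, since the penalty terms are identical.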
Test Set
Very quickly, we have an accurate model that did a great job predicting our test set. Notice the darker alpha areas snugly against the line.
We can visually compare the success of the final model versus the saturated model by graphing the predicted values versus the actual values. The line represents a perfect model.
Please note the outliers around the actual values of 0. I will go into more detail about this group later in this project.
%%R
# Saturated model
control.model <- lm(G3 ~ ., data = test)
control.graph <- qplot(G3, predict(control.model), data = test, geom = "point",
                       position = "jitter", alpha = .5, main = "Saturated Model") +
  geom_abline(intercept = 0, slope = 1) +
  theme(legend.position = "none")

# Final model
final.model <- lm(G3 ~ G1 + G2 + absences + famrel + age, data = test)
final.graph <- qplot(G3, predict(final.model), data = test, geom = "point",
                     position = "jitter", alpha = .5, main = "Final Model", guide = FALSE) +
  geom_abline(intercept = 0, slope = 1) +
  theme(legend.position = "none")

grid.arrange(control.graph, final.graph, nrow = 2)
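Note that both models plotted above are fitted with `data = test`, so the plots show how well a model refitted on the test set describes that same set. A stricter check fits the model on the training rows only and predicts rows it has not seen. The following is a minimal sketch of that pattern on synthetic data; the simulated grades, the 75/25 split, and the names `train_syn` and `test_syn` are assumptions made for the example.

```r
set.seed(1)

# Synthetic stand-in for the grade data (assumption: G3 tracks G2, which tracks G1).
n  <- 200
G1 <- round(runif(n, 5, 18))
G2 <- pmin(20, pmax(0, G1 + round(rnorm(n, 0, 1.5))))
G3 <- pmin(20, pmax(0, G2 + round(rnorm(n, 0, 1.5))))
grades <- data.frame(G1, G2, G3)

# Hold out 25% of the rows.
idx       <- sample(n, size = 0.75 * n)
train_syn <- grades[idx, ]
test_syn  <- grades[-idx, ]

model <- lm(G3 ~ G1 + G2, data = train_syn)   # fit on training rows only
pred  <- predict(model, newdata = test_syn)   # predict rows the model has not seen

# Root mean squared error on the held-out rows
rmse <- sqrt(mean((test_syn$G3 - pred)^2))
cat("Held-out RMSE:", round(rmse, 2), "\n")
```

With the student data, the same idea would amount to calling `predict()` with `newdata = test` on a model fitted to `train`, provided the factor levels in the two files match.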
Diagnostics
Overall, our model looks pretty good. The main issue with our model is the cluster when G3 is 0.
It affects the residuals at the lower end of our distribution.
%%R
plot(final.model)
The 0-Cluster
Upon further inspection of the data, this cluster most likely consists of students who dropped the course.
- They have G1 and/or G2 grades but final grades of 0.
- There are no G1s of 0 but there are G2s with 0 value.
- The exploratory model predicts these students as scoring between 0 and 10 which would constitute failing grades.
As a result, we should drop these data points before continuing our analysis since they will not be useful for the question we are researching.
%%R
score0 <- subset(student.por, G3 == 0)
score0
school sex age address famsize Pstatus Medu Fedu Mjob Fjob 164 GP M 18 U LE3 T 1 1 other other 441 MS M 16 U GT3 T 1 1 at_home services 520 MS M 16 R GT3 T 2 1 other services 564 MS M 17 U GT3 T 2 2 other other 568 MS M 18 R GT3 T 3 2 services other 584 MS F 18 R GT3 T 2 2 other other 587 MS F 17 U GT3 T 4 2 teacher services 598 MS F 18 R GT3 T 2 2 at_home other 604 MS F 18 R LE3 A 4 2 teacher other 606 MS F 19 U GT3 T 1 1 at_home services 611 MS F 19 R GT3 A 1 1 at_home at_home 627 MS F 18 R GT3 T 4 4 other teacher 638 MS M 18 R GT3 T 2 1 other other 640 MS M 19 R GT3 T 1 1 other services 641 MS M 18 R GT3 T 4 2 other other reason guardian traveltime studytime failures schoolsup famsup fatherd 164 course mother 1 1 2 no no no 441 home mother 2 2 0 no yes no 520 reputation mother 2 2 0 no no no 564 course mother 1 1 1 no no no 568 course mother 1 1 1 no no no 584 other mother 2 1 1 no no no 587 home mother 1 2 0 yes yes no 598 course mother 3 2 1 no no no 604 reputation mother 1 2 0 no no no 606 other father 2 1 1 no no no 611 course other 2 2 3 no yes no 627 other father 3 2 0 no yes no 638 other mother 2 1 0 no no no 640 other mother 2 1 1 no no no 641 home father 2 1 1 no no yes activities nursery higher internet romantic famrel freetime goout Dalc Walc 164 no yes no yes yes 2 3 5 2 5 441 yes yes yes no yes 5 4 5 4 5 520 yes yes yes yes no 5 2 1 1 1 564 yes yes yes no yes 1 2 1 2 3 568 no yes no yes no 2 3 1 2 2 584 no yes no yes yes 5 5 5 1 1 587 yes yes yes yes no 5 5 5 1 3 598 yes yes yes no yes 4 3 3 1 1 604 yes yes yes yes yes 5 3 1 1 1 606 no yes no no no 5 5 5 2 3 611 yes yes no no yes 3 5 4 1 4 627 no no yes yes yes 3 2 2 4 2 638 yes no yes yes yes 4 4 3 1 3 640 no yes yes no no 4 3 2 1 3 641 no yes yes no no 5 4 3 4 3 health absences G1 G2 G3 164 4 0 11 9 0 441 3 0 7 0 0 520 2 0 8 7 0 564 5 0 7 0 0 568 5 0 4 0 0 584 3 0 8 6 0 587 5 0 8 8 0 598 4 0 9 0 0 604 5 0 5 0 0 606 2 0 5 0 0 611 1 0 8 0 0 627 5 0 7 5 0 638 5 0 7 7 0 640 5 0 5 8 0 641 3 0 7 7 0
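The observations in the bullet list can also be checked programmatically rather than by reading the listing. Below is a small sketch on a hypothetical toy data frame that has only the three grade columns; the values are made up for illustration.

```r
# Hypothetical mini data set with the same grade columns as the student data.
toy <- data.frame(
  G1 = c(11, 7, 8, 12, 15, 9),
  G2 = c(9, 0, 7, 13, 14, 10),
  G3 = c(0, 0, 0, 13, 15, 10)
)

# Students whose final grade is 0
zero_final <- subset(toy, G3 == 0)

# Does anyone in this group have a first-period grade of 0?
any(zero_final$G1 == 0)

# How many already show a 0 in the second period?
sum(zero_final$G2 == 0)

# Rows kept for the rest of the analysis
finished <- subset(toy, G3 != 0)
nrow(finished)
```

Applied to the real data, the same three checks would use `student.por` (or `student.mat`) in place of `toy`.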
Final Model
Here is the final model for students who finish the course.
%%R
# Final model, restricted to students with a nonzero final grade.
# Note: this reassigns `test` to the rows of `train` where G3 is not 0.
test <- subset(train, G3 != 0)
final.model.no0 <- lm(G3 ~ G1 + G2 + absences + famrel + age, data = test)
qplot(G3, predict(final.model.no0), data = test, geom = "point",
      position = "jitter", alpha = .5, main = "Final Model", guide = FALSE) +
  geom_abline(intercept = 0, slope = 1) +
  theme(legend.position = "none")
Deeper Questions and Analysis
Our model does a great job at predicting student success; however, there are deeper questions that this model doesn’t address. In particular, it doesn’t show how we can identify early on which students are most likely to fail, before the best predictors in this model are available.
As we’ve seen, the best predictors of success are current grades within the course (G1 and G2), age, quality of family relationships, and absences.
However, current grades only become available once a problem already exists.
Let’s try to see if we can determine what factors can be more useful at preventing student failure and promoting academic success.
Let’s start by looking at all the variables within a linear model, but remove our strongest indicators, G1 and G2, which overshadow other potential factors.
%%R
fit <- lm(G3 ~ . - G1 - G2, student.mat)
Our predictions stop at 15, but actual scores go up to 20. Without G1 and G2, our model is unable to make predictions that are any higher.
A score of 15 shows a clear dividing line where the "potential" futures merge into current academic success. This line is important because it can help us determine what deeper differences successful students have from their peers, and it also allows us to create a definition of a "successful" student that we can use.
For this section, it becomes clear that two models will need to be analyzed: one for grades below 15 and another for grades of 15 and above.
%%R
qplot(G3, predict(fit), data = student.mat, geom = "point",
      position = "jitter", alpha = .8) +
  geom_abline(intercept = 0, slope = 1) +
  theme(legend.position = "none")
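One general reason predictions from a weak model top out well below the highest actual scores: in least squares with an intercept, the variance of the fitted values equals R2 times the variance of the response, so a low R2 compresses the predictions toward the mean. The simulation below is a sketch with made-up data and a deliberately weak predictor. It is not a statement about the student data itself.

```r
set.seed(42)

# Simulated response with a weak signal relative to the noise
n <- 400
x <- rnorm(n)
y <- 11 + 1.0 * x + rnorm(n, sd = 3)

weak <- lm(y ~ x)
r2   <- summary(weak)$r.squared

range(y)              # spread of the actual values
range(fitted(weak))   # noticeably narrower spread of the predictions

# Identity for least squares with an intercept: var(fitted) = R^2 * var(y)
all.equal(var(fitted(weak)), r2 * var(y))
```

The narrower the range of the fitted values relative to the actual values, the less the plot of predicted versus actual can follow the diagonal at the extremes.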
Breaking Up the Analysis
So far, the data has shown that it should be broken into three parts in order to analyze deeper predictors of future success.
Students who drop
1. The first part isolates students who drop a course. Their final outcome is 0 even though they should have a higher predicted outcome. These students have predicted scores below 10.
Students who finish
2. For final grades below 15, one set of predictors (one model) will be used to predict student outcomes.
3. For final grades from 15 to 20, a different set of predictors (a different model) will be used.
%%R
# Prep data
score0 <- subset(student.mat, G3 == 0)
score.no0 <- subset(student.mat, G3 != 0)
score14 <- subset(score.no0, G3 < 15)
score15 <- subset(score.no0, G3 > 14)
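When splitting data by thresholds like this, it is worth confirming that the groups do not overlap and together cover every row. Here is a small sketch on a hypothetical vector of final grades. The `cut()` call is an alternative way to label the same three groups and is an addition for illustration, not part of the original analysis.

```r
# Hypothetical final grades covering all three groups
toy <- data.frame(G3 = c(0, 0, 6, 10, 14, 15, 18, 20))

dropped    <- subset(toy, G3 == 0)
finished   <- subset(toy, G3 != 0)
below15    <- subset(finished, G3 < 15)
at_least15 <- subset(finished, G3 > 14)

# The three groups should account for every row exactly once.
stopifnot(nrow(dropped) + nrow(below15) + nrow(at_least15) == nrow(toy))

# Equivalent labelling in a single step; intervals are (-Inf, 0], (0, 14], (14, 20].
toy$group <- cut(toy$G3,
                 breaks = c(-Inf, 0, 14, 20),
                 labels = c("dropped", "below 15", "15 and above"))
table(toy$group)
```

Because the grades are whole numbers, `G3 > 14` and `G3 >= 15` select the same rows.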
Students at 15 and Above
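Students in this group have three things that stand out:
1. Most of them have parents who live together (Pstatus = T), although several rows in the listing below show Pstatus = A.
2. Almost none of them have had past class failures; rows 199 and 377 in the listing below are the only exceptions.
3. All of them plan on seeking higher education.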
%%R
score15
school sex age address famsize Pstatus Medu Fedu Mjob Fjob 4 GP F 15 U GT3 T 4 2 health services 6 GP M 16 U LE3 T 4 3 services other 9 GP M 15 U LE3 A 3 2 services other 10 GP M 15 U GT3 T 3 4 other other 15 GP M 15 U GT3 A 2 2 other other 21 GP M 15 U GT3 T 4 3 teacher other 22 GP M 15 U GT3 T 4 4 health health 23 GP M 16 U LE3 T 4 2 teacher other 28 GP M 15 U GT3 T 4 2 health services 32 GP M 15 U GT3 T 4 4 services services 33 GP M 15 R GT3 T 4 3 teacher at_home 35 GP M 16 U GT3 T 3 2 other other 37 GP M 15 U LE3 T 4 3 teacher services 38 GP M 16 R GT3 A 4 4 other teacher 43 GP M 15 U GT3 T 4 4 services teacher 48 GP M 16 U GT3 T 4 3 health services 57 GP F 15 U GT3 A 4 3 services services 58 GP M 15 U GT3 T 4 4 teacher health 60 GP F 16 U GT3 T 4 2 services other 66 GP F 16 U LE3 T 4 3 teacher services 70 GP F 15 R LE3 T 3 1 other other 71 GP M 16 U GT3 T 3 1 other other 84 GP M 15 U LE3 T 2 2 services services 92 GP F 15 U GT3 T 4 3 services other 97 GP M 16 R GT3 T 4 3 services other 102 GP M 16 U GT3 T 4 4 services teacher 105 GP M 15 U GT3 A 3 4 services other 108 GP M 16 U GT3 T 3 3 services other 110 GP F 16 U LE3 T 4 4 health health 111 GP M 15 U LE3 A 4 4 teacher teacher 114 GP M 15 U LE3 T 4 2 teacher other 116 GP M 16 U GT3 T 4 4 teacher teacher 121 GP F 15 U GT3 T 1 2 at_home services 122 GP M 15 U GT3 T 2 2 services services 130 GP M 16 R GT3 T 4 4 teacher teacher 140 GP F 15 U GT3 T 4 4 teacher teacher 159 GP M 16 R GT3 T 2 2 at_home other 168 GP F 16 U GT3 T 4 2 health services 172 GP M 16 U GT3 T 1 0 other other 183 GP F 17 U GT3 T 2 4 services services 188 GP M 16 U LE3 T 2 1 other other 196 GP F 17 U LE3 T 2 4 services services 197 GP M 17 U GT3 T 4 4 services teacher 199 GP F 17 U GT3 T 4 4 services teacher 201 GP F 16 U GT3 T 4 3 health other 216 GP F 17 U LE3 T 3 2 other other 223 GP F 16 U GT3 T 2 3 services teacher 227 GP F 17 U GT3 T 3 2 other other 246 GP M 16 U GT3 T 2 1 other other 250 GP M 16 U GT3 T 0 2 other other 261 GP F 18 U GT3 
T 4 3 services other 266 GP M 18 R LE3 A 3 4 other other 287 GP F 18 U GT3 T 2 2 at_home at_home 290 GP M 18 U LE3 A 4 4 teacher teacher 292 GP F 17 U GT3 T 4 3 health services 294 GP F 17 R LE3 T 3 1 services other 300 GP M 18 U LE3 T 4 4 teacher teacher 304 GP F 17 U GT3 T 3 2 health health 307 GP M 20 U GT3 A 3 2 services other 324 GP F 17 U GT3 T 3 1 services services 325 GP F 17 U LE3 T 0 2 at_home at_home 327 GP M 17 U GT3 T 3 3 other services 336 GP F 17 U GT3 T 3 4 services other 339 GP F 18 U LE3 T 3 3 services services 343 GP M 18 U LE3 T 3 4 services other 347 GP M 18 R GT3 T 4 3 teacher services 349 GP F 17 U GT3 T 4 3 health other 360 MS F 18 U LE3 T 1 1 at_home services 364 MS F 17 U LE3 T 4 4 at_home at_home 375 MS F 18 R LE3 T 4 4 other other 377 MS F 20 U GT3 T 4 2 health other 379 MS F 18 U GT3 T 3 3 other other 392 MS M 17 U LE3 T 3 1 services services reason guardian traveltime studytime failures schoolsup famsup paid 4 home mother 1 3 0 no yes yes 6 reputation mother 1 2 0 no yes yes 9 home mother 1 2 0 no yes yes 10 home mother 1 2 0 no yes yes 15 home other 1 3 0 no yes no 21 reputation mother 1 2 0 no no no 22 other father 1 1 0 no yes yes 23 course mother 1 2 0 no no no 28 other mother 1 1 0 no no yes 32 reputation mother 2 2 0 no yes no 33 course mother 1 2 0 no yes no 35 home mother 1 1 0 no yes yes 37 home mother 1 3 0 no yes no 38 reputation mother 2 3 0 no yes no 43 course father 1 2 0 no yes no 48 reputation mother 1 4 0 no no no 57 reputation mother 1 2 0 no yes yes 58 reputation mother 1 2 0 no yes no 60 course mother 1 2 0 no yes no 66 course mother 3 2 0 no yes no 70 reputation father 2 4 0 no yes no 71 reputation father 2 4 0 no yes yes 84 home mother 2 2 0 no no yes 92 reputation mother 1 1 0 no no yes 97 reputation mother 2 1 0 yes yes no 102 other father 1 3 0 no yes no 105 course mother 1 2 0 no yes yes 108 home father 1 3 0 no yes no 110 other mother 1 3 0 no yes yes 111 course mother 1 1 0 no no no 114 course mother 1 1 0 
no no no 116 course father 1 2 0 no yes no 121 course mother 1 2 0 no no no 122 home father 1 4 0 no yes yes 130 course mother 1 1 0 no no yes 140 course mother 2 1 0 no no no 159 course mother 3 1 0 no no no 168 home father 1 2 0 no no yes 172 reputation mother 2 2 0 no yes yes 183 reputation father 1 2 0 no yes no 188 course mother 1 2 0 no no yes 196 course father 1 2 0 no no no 197 home mother 1 1 0 no no no 199 home mother 2 1 1 no yes no 201 home mother 1 2 0 no yes no 216 reputation mother 2 2 0 no no yes 223 other mother 1 2 0 yes no no 227 course mother 1 2 0 no no no 246 course mother 3 1 0 no no no 250 other mother 1 1 0 no no yes 261 home father 1 2 0 no yes yes 266 reputation mother 2 2 0 no yes yes 287 other mother 1 3 0 no yes yes 290 reputation mother 1 2 0 no yes yes 292 reputation mother 1 3 0 no yes yes 294 reputation mother 2 4 0 no yes yes 300 home mother 1 1 0 no yes yes 304 reputation father 1 4 0 no yes yes 307 course other 1 1 0 no no no 324 course father 1 3 0 no yes no 325 home father 2 3 0 no no no 327 reputation mother 1 1 0 no no no 336 course mother 1 3 0 no no no 339 home mother 1 4 0 no yes no 343 home mother 1 2 0 no no no 347 course mother 1 3 0 no no no 349 reputation mother 1 3 0 no yes yes 360 course father 2 3 0 no no no 364 course mother 1 2 0 no yes yes 375 reputation mother 2 3 0 no no no 377 course other 2 3 2 no yes yes 379 home mother 1 2 0 no no yes 392 course mother 2 1 0 no no no activities nursery higher internet romantic famrel freetime goout Dalc Walc 4 yes yes yes yes yes 3 2 2 1 1 6 yes yes yes yes no 5 4 2 1 2 9 no yes yes yes no 4 2 2 1 1 10 yes yes yes yes no 5 5 1 1 1 15 no yes yes yes yes 4 5 2 1 1 21 no yes yes yes no 4 4 1 1 1 22 no yes yes yes no 5 4 2 1 1 23 yes yes yes yes no 4 5 1 1 3 28 no yes yes yes no 2 2 4 2 4 32 yes yes yes yes no 4 3 1 1 1 33 yes yes yes yes yes 4 5 2 1 1 35 no no yes yes no 5 4 3 1 1 37 yes yes yes yes no 5 4 3 1 1 38 yes yes yes yes yes 2 4 3 1 1 43 yes yes yes yes no 4 3 3 1 
1 48 yes yes yes yes no 4 2 2 1 1 57 yes yes yes yes no 4 3 2 1 1 58 yes yes yes no no 3 2 2 1 1 60 no yes yes yes no 4 2 3 1 1 66 yes yes yes yes no 5 4 3 1 2 70 no no yes yes no 4 4 2 2 3 71 no yes yes yes no 4 3 2 1 1 84 yes yes yes yes no 5 3 3 1 3 92 yes yes yes yes no 4 5 5 1 3 97 yes no yes yes no 3 3 3 1 1 102 yes yes yes yes yes 4 4 3 1 1 105 yes yes yes yes no 5 4 4 1 1 108 yes yes yes yes no 5 3 3 1 1 110 yes yes yes yes yes 5 4 5 1 1 111 yes yes yes yes no 5 5 3 1 1 114 no yes yes yes no 3 5 2 1 1 116 yes yes yes yes no 5 4 4 1 2 121 no no yes yes no 3 2 3 1 2 122 yes yes yes yes no 5 5 4 1 2 130 yes yes yes yes no 3 5 5 2 5 140 yes yes yes yes no 4 3 2 1 1 159 no no yes no no 4 2 2 1 2 168 no yes yes yes yes 4 2 3 1 1 172 yes yes yes yes yes 4 3 2 1 1 183 yes yes yes no no 5 4 2 2 3 188 yes yes yes yes yes 4 2 3 1 2 196 yes yes yes yes yes 4 3 2 1 1 197 no yes yes yes no 5 2 3 1 2 199 no yes yes yes no 4 2 4 2 3 201 yes yes yes yes no 4 3 5 1 5 216 no yes yes yes no 4 4 4 1 3 223 no yes yes yes no 2 3 1 1 1 227 yes no yes yes no 5 3 4 1 3 246 no yes yes yes no 4 3 3 1 1 250 no no yes yes no 4 3 2 2 4 261 no yes yes yes yes 3 1 2 1 3 266 yes yes yes yes no 4 2 5 3 4 287 no yes yes yes no 4 3 3 1 2 290 yes yes yes yes no 5 4 3 1 1 292 no yes yes yes no 4 2 2 1 2 294 no yes yes no no 3 1 2 1 1 300 no yes yes yes yes 1 4 2 2 2 304 yes no yes yes no 5 2 2 1 2 307 yes yes yes no no 5 5 3 1 1 324 no no yes yes no 3 4 3 2 3 325 no yes yes yes no 3 3 3 2 3 327 yes no yes yes no 4 3 5 3 5 336 no yes yes yes no 4 4 5 1 3 339 no yes yes yes no 5 3 3 1 1 343 yes yes yes yes yes 4 3 3 1 3 347 no yes yes yes yes 5 3 2 1 2 349 yes yes yes yes yes 4 4 3 1 3 360 no yes yes yes no 5 3 2 1 1 364 yes yes yes yes yes 2 3 4 1 1 375 no yes yes yes no 5 4 4 1 1 377 no no yes yes yes 5 4 3 1 1 379 no yes yes yes yes 4 1 3 1 2 392 no no yes yes no 2 4 5 3 4 health absences G1 G2 G3 4 5 2 15 14 15 6 5 10 15 15 15 9 1 0 16 18 19 10 5 0 14 15 15 15 3 0 14 16 16 21 1 0 13 14 15 22 5 
0 12 15 15 23 5 2 15 15 16 28 1 4 15 16 15 32 5 0 17 16 17 33 5 0 17 16 16 35 5 0 12 14 15 37 4 2 15 16 18 38 5 7 15 16 15 43 5 2 19 18 18 48 2 4 19 19 20 57 1 0 14 15 15 58 5 4 14 15 15 60 5 2 15 16 16 66 1 2 16 15 15 70 3 12 16 16 16 71 5 0 13 15 15 84 4 4 15 15 15 92 1 4 16 17 18 97 4 2 11 15 15 102 4 0 16 17 17 105 1 0 16 18 18 108 5 2 16 18 18 110 4 4 14 15 16 111 4 6 18 19 19 114 3 10 18 19 19 116 5 2 15 15 16 121 1 2 16 15 15 122 5 6 16 14 15 130 4 8 18 18 18 140 5 0 16 16 15 159 3 2 17 15 15 168 3 0 14 15 16 172 3 2 13 15 16 183 5 0 16 17 17 188 5 0 15 15 15 196 5 0 14 15 15 197 5 4 17 15 16 199 2 24 18 18 18 201 2 2 16 16 16 216 1 2 14 15 15 223 3 2 16 16 17 227 3 10 16 15 15 246 4 6 18 18 18 250 5 0 13 15 15 261 2 21 17 18 18 266 1 13 17 17 17 287 2 5 18 18 19 290 2 9 15 13 15 292 3 0 15 15 15 294 3 6 18 18 18 300 1 5 16 15 16 304 5 0 17 17 18 307 5 0 17 18 18 324 5 1 12 14 15 325 2 0 16 15 15 327 5 3 14 15 16 336 5 16 16 15 15 339 1 7 16 15 17 343 5 11 16 15 15 347 4 9 16 15 16 349 4 0 13 15 15 360 4 0 18 16 16 364 1 0 16 15 15 375 1 0 19 18 19 377 3 4 15 14 15 379 1 0 15 15 15 392 2 3 14 16 16
Students Below 15
Create a training and test set for this group.
%%R
set.seed(123)
inTraining <- createDataPartition(score14$G3, p = .75, list = FALSE)
training <- score14[ inTraining, ]
testing <- score14[-inTraining, ]
Saturated Model
Below is a general model with all of our variables using the training set. This can help determine which predictors are statistically significant.
%%R
saturated14 <- lm(G3 ~ . - G1 - G2, data = training)
summary(saturated14)
Call:
lm(formula = G3 ~ . - G1 - G2, data = training)

Residuals:
    Min      1Q  Median      3Q     Max
-5.0532 -1.2297 -0.0758  1.5029  5.1073

Coefficients:
                  Estimate Std. Error t value Pr(>|t|)
(Intercept)      10.740459   3.406452   3.153  0.00190 **
schoolMS         -0.669689   0.564845  -1.186  0.23739
sexM              0.511533   0.416269   1.229  0.22079
age               0.029858   0.168227   0.177  0.85933
addressU          0.735560   0.458102   1.606  0.11016
famsizeLE3        0.013457   0.379180   0.035  0.97173
PstatusT          0.188910   0.558990   0.338  0.73581
Medu             -0.008148   0.254834  -0.032  0.97453
Fedu              0.006450   0.222098   0.029  0.97686
Mjobhealth        1.584729   0.895228   1.770  0.07845 .
Mjobother        -0.029289   0.528770  -0.055  0.95589
Mjobservices      0.545055   0.619720   0.880  0.38033
Mjobteacher      -0.175333   0.787911  -0.223  0.82416
Fjobhealth       -0.772045   1.062648  -0.727  0.46849
Fjobother        -0.539129   0.757634  -0.712  0.47767
Fjobservices     -0.479938   0.785370  -0.611  0.54193
Fjobteacher      -0.079528   1.048411  -0.076  0.93962
reasonhome        0.688622   0.431180   1.597  0.11207
reasonother       0.321321   0.607912   0.529  0.59778
reasonreputation  0.435443   0.437953   0.994  0.32147
guardianmother    0.217571   0.432852   0.503  0.61585
guardianother     0.250323   0.754461   0.332  0.74045
traveltime        0.092044   0.260529   0.353  0.72429
studytime         0.411308   0.227115   1.811  0.07186 .
failures         -0.806048   0.250153  -3.222  0.00152 **
schoolsupyes     -0.491972   0.479693  -1.026  0.30650
famsupyes        -0.298508   0.366403  -0.815  0.41636
paidyes           0.214191   0.373014   0.574  0.56656
activitiesyes     0.388324   0.358694   1.083  0.28048
nurseryyes       -0.407032   0.407825  -0.998  0.31964
higheryes        -0.816840   0.799668  -1.021  0.30845
internetyes      -0.134818   0.453433  -0.297  0.76657
romanticyes       0.101779   0.360754   0.282  0.77818
famrel           -0.016745   0.185782  -0.090  0.92829
freetime          0.050437   0.184857   0.273  0.78530
goout            -0.462603   0.179052  -2.584  0.01060 *
Dalc              0.290830   0.240016   1.212  0.22727
Walc             -0.025814   0.188709  -0.137  0.89135
health           -0.056258   0.120458  -0.467  0.64106
absences         -0.054226   0.020125  -2.694  0.00774 **
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 2.238 on 174 degrees of freedom
Multiple R-squared:  0.2508, Adjusted R-squared:  0.08291
F-statistic: 1.494 on 39 and 174 DF,  p-value: 0.04302
Let’s use the step function to find a cut-down version of Model 1 that drops unnecessary predictors.
<span class="o">%%</span>R
step<span class="p">(</span>saturated14<span class="p">)</span>
Start: AIC=380.6 G3 ~ (school + sex + age + address + famsize + Pstatus + Medu + Fedu + Mjob + Fjob + reason + guardian + traveltime + studytime + failures + schoolsup + famsup + paid + activities + nursery + higher + internet + romantic + famrel + freetime + goout + Dalc + Walc + health + absences + G1 + G2) - G1 - G2 Df Sum of Sq RSS AIC - Fjob 4 5.065 876.93 373.84 - guardian 2 1.346 873.21 376.93 - reason 3 13.248 885.11 377.82 - Fedu 1 0.004 871.87 378.60 - Medu 1 0.005 871.87 378.60 - famsize 1 0.006 871.87 378.60 - famrel 1 0.041 871.91 378.61 - Walc 1 0.094 871.96 378.62 - age 1 0.158 872.02 378.64 - freetime 1 0.373 872.24 378.69 - romantic 1 0.399 872.26 378.69 - internet 1 0.443 872.31 378.71 - Pstatus 1 0.572 872.44 378.74 - traveltime 1 0.625 872.49 378.75 - health 1 1.093 872.96 378.86 - paid 1 1.652 873.52 379.00 - famsup 1 3.326 875.19 379.41 - nursery 1 4.991 876.86 379.82 - higher 1 5.228 877.09 379.88 - schoolsup 1 5.271 877.14 379.89 - activities 1 5.873 877.74 380.03 - school 1 7.043 878.91 380.32 - Dalc 1 7.357 879.22 380.40 - Mjob 4 32.450 904.32 380.42 - sex 1 7.567 879.43 380.45 <none> 871.86 380.60 - address 1 12.919 884.78 381.74 - studytime 1 16.434 888.30 382.59 - goout 1 33.447 905.31 386.65 - absences 1 36.379 908.24 387.34 - failures 1 52.025 923.89 391.00 Step: AIC=373.84 G3 ~ school + sex + age + address + famsize + Pstatus + Medu + Fedu + Mjob + reason + guardian + traveltime + studytime + failures + schoolsup + famsup + paid + activities + nursery + higher + internet + romantic + famrel + freetime + goout + Dalc + Walc + health + absences Df Sum of Sq RSS AIC - guardian 2 0.947 877.88 370.07 - reason 3 13.191 890.12 371.03 - famsize 1 0.002 876.93 371.84 - Medu 1 0.014 876.94 371.84 - Fedu 1 0.027 876.96 371.84 - famrel 1 0.110 877.04 371.86 - Walc 1 0.378 877.31 371.93 - age 1 0.383 877.31 371.93 - freetime 1 0.400 877.33 371.93 - romantic 1 0.484 877.41 371.95 - traveltime 1 0.554 877.48 371.97 - Pstatus 1 0.674 877.60 372.00 - 
internet 1 0.705 877.63 372.01 - health 1 1.025 877.95 372.09 - paid 1 1.608 878.54 372.23 - famsup 1 3.436 880.37 372.67 - schoolsup 1 4.248 881.18 372.87 - higher 1 4.945 881.87 373.04 - nursery 1 5.473 882.40 373.17 - Mjob 4 30.591 907.52 373.17 - activities 1 6.427 883.36 373.40 - sex 1 6.927 883.86 373.52 - school 1 7.118 884.05 373.57 <none> 876.93 373.84 - Dalc 1 9.398 886.33 374.12 - address 1 12.765 889.69 374.93 - studytime 1 15.487 892.42 375.58 - goout 1 31.904 908.83 379.48 - absences 1 36.836 913.77 380.64 - failures 1 54.906 931.84 384.83 Step: AIC=370.07 G3 ~ school + sex + age + address + famsize + Pstatus + Medu + Fedu + Mjob + reason + traveltime + studytime + failures + schoolsup + famsup + paid + activities + nursery + higher + internet + romantic + famrel + freetime + goout + Dalc + Walc + health + absences Df Sum of Sq RSS AIC - reason 3 13.466 891.34 367.33 - Medu 1 0.001 877.88 368.07 - Fedu 1 0.002 877.88 368.07 - famsize 1 0.003 877.88 368.07 - famrel 1 0.099 877.98 368.09 - Walc 1 0.382 878.26 368.16 - freetime 1 0.520 878.40 368.19 - traveltime 1 0.521 878.40 368.19 - Pstatus 1 0.541 878.42 368.20 - romantic 1 0.548 878.42 368.20 - age 1 0.660 878.54 368.23 - internet 1 0.818 878.69 368.27 - health 1 1.034 878.91 368.32 - paid 1 1.537 879.41 368.44 - famsup 1 3.107 880.98 368.82 - schoolsup 1 4.288 882.16 369.11 - higher 1 4.696 882.57 369.21 - Mjob 4 29.905 907.78 369.24 - nursery 1 5.667 883.54 369.44 - activities 1 6.202 884.08 369.57 - sex 1 6.609 884.49 369.67 - school 1 7.736 885.61 369.94 <none> 877.88 370.07 - Dalc 1 9.348 887.22 370.33 - address 1 13.101 890.98 371.24 - studytime 1 17.000 894.88 372.17 - goout 1 32.920 910.80 375.95 - absences 1 35.981 913.86 376.66 - failures 1 57.841 935.72 381.72 Step: AIC=367.33 G3 ~ school + sex + age + address + famsize + Pstatus + Medu + Fedu + Mjob + traveltime + studytime + failures + schoolsup + famsup + paid + activities + nursery + higher + internet + romantic + famrel + freetime + 
goout + Dalc + Walc + health + absences Df Sum of Sq RSS AIC - Fedu 1 0.001 891.34 365.33 - Medu 1 0.012 891.35 365.33 - famsize 1 0.034 891.38 365.33 - famrel 1 0.165 891.51 365.36 - Pstatus 1 0.222 891.56 365.38 - freetime 1 0.312 891.65 365.40 - romantic 1 0.765 892.11 365.51 - Walc 1 0.804 892.15 365.52 - age 1 0.852 892.19 365.53 - internet 1 0.989 892.33 365.56 - health 1 1.151 892.49 365.60 - traveltime 1 1.173 892.52 365.61 - schoolsup 1 3.406 894.75 366.14 - higher 1 3.426 894.77 366.15 - famsup 1 3.993 895.34 366.28 - paid 1 4.108 895.45 366.31 - nursery 1 5.282 896.62 366.59 - Mjob 4 31.118 922.46 366.67 - activities 1 7.662 899.00 367.16 <none> 891.34 367.33 - school 1 8.631 899.97 367.39 - sex 1 8.987 900.33 367.47 - Dalc 1 9.983 901.33 367.71 - address 1 14.591 905.93 368.80 - studytime 1 17.603 908.95 369.51 - absences 1 28.273 919.62 372.01 - goout 1 31.410 922.75 372.74 - failures 1 59.037 950.38 379.05 Step: AIC=365.33 G3 ~ school + sex + age + address + famsize + Pstatus + Medu + Mjob + traveltime + studytime + failures + schoolsup + famsup + paid + activities + nursery + higher + internet + romantic + famrel + freetime + goout + Dalc + Walc + health + absences Df Sum of Sq RSS AIC - Medu 1 0.014 891.36 363.33 - famsize 1 0.034 891.38 363.33 - famrel 1 0.167 891.51 363.37 - Pstatus 1 0.224 891.57 363.38 - freetime 1 0.313 891.66 363.40 - romantic 1 0.770 892.11 363.51 - Walc 1 0.808 892.15 363.52 - age 1 0.852 892.20 363.53 - internet 1 0.994 892.34 363.56 - health 1 1.155 892.50 363.60 - traveltime 1 1.174 892.52 363.61 - higher 1 3.434 894.78 364.15 - schoolsup 1 3.479 894.82 364.16 - famsup 1 4.105 895.45 364.31 - paid 1 4.110 895.45 364.31 - nursery 1 5.289 896.63 364.59 - Mjob 4 31.118 922.46 364.67 - activities 1 7.692 899.03 365.16 <none> 891.34 365.33 - school 1 8.660 900.00 365.39 - sex 1 8.986 900.33 365.47 - Dalc 1 9.983 901.33 365.71 - address 1 14.827 906.17 366.86 - studytime 1 17.708 909.05 367.53 - absences 1 28.845 920.19 370.14 
- goout 1 31.423 922.77 370.74 - failures 1 60.614 951.96 377.40 Step: AIC=363.33 G3 ~ school + sex + age + address + famsize + Pstatus + Mjob + traveltime + studytime + failures + schoolsup + famsup + paid + activities + nursery + higher + internet + romantic + famrel + freetime + goout + Dalc + Walc + health + absences Df Sum of Sq RSS AIC - famsize 1 0.027 891.38 361.34 - famrel 1 0.168 891.52 361.37 - Pstatus 1 0.233 891.59 361.38 - freetime 1 0.343 891.70 361.41 - romantic 1 0.761 892.12 361.51 - Walc 1 0.797 892.15 361.52 - age 1 0.892 892.25 361.54 - internet 1 0.982 892.34 361.56 - health 1 1.142 892.50 361.60 - traveltime 1 1.170 892.53 361.61 - schoolsup 1 3.478 894.83 362.16 - higher 1 3.506 894.86 362.17 - paid 1 4.104 895.46 362.31 - famsup 1 4.164 895.52 362.33 - nursery 1 5.298 896.65 362.60 - Mjob 4 31.633 922.99 362.79 - activities 1 7.679 899.04 363.16 <none> 891.36 363.33 - school 1 8.652 900.01 363.40 - sex 1 8.973 900.33 363.47 - Dalc 1 10.176 901.53 363.76 - address 1 14.821 906.18 364.86 - studytime 1 17.697 909.05 365.54 - absences 1 29.530 920.89 368.30 - goout 1 32.214 923.57 368.93 - failures 1 61.869 953.23 375.69 Step: AIC=361.34 G3 ~ school + sex + age + address + Pstatus + Mjob + traveltime + studytime + failures + schoolsup + famsup + paid + activities + nursery + higher + internet + romantic + famrel + freetime + goout + Dalc + Walc + health + absences Df Sum of Sq RSS AIC - famrel 1 0.173 891.56 359.38 - Pstatus 1 0.261 891.65 359.40 - freetime 1 0.360 891.74 359.42 - romantic 1 0.754 892.14 359.52 - Walc 1 0.792 892.18 359.53 - age 1 0.890 892.27 359.55 - internet 1 0.972 892.36 359.57 - traveltime 1 1.147 892.53 359.61 - health 1 1.159 892.54 359.61 - schoolsup 1 3.481 894.86 360.17 - higher 1 3.512 894.90 360.18 - paid 1 4.133 895.52 360.33 - famsup 1 4.145 895.53 360.33 - nursery 1 5.441 896.83 360.64 - Mjob 4 31.713 923.10 360.82 - activities 1 7.657 899.04 361.17 <none> 891.38 361.34 - school 1 8.746 900.13 361.42 - sex 1 
8.948 900.33 361.47 - Dalc 1 10.150 901.53 361.76 - address 1 14.989 906.37 362.90 - studytime 1 17.738 909.12 363.55 - absences 1 29.551 920.93 366.31 - goout 1 32.349 923.73 366.96 - failures 1 61.924 953.31 373.71 Step: AIC=359.38 G3 ~ school + sex + age + address + Pstatus + Mjob + traveltime + studytime + failures + schoolsup + famsup + paid + activities + nursery + higher + internet + romantic + freetime + goout + Dalc + Walc + health + absences Df Sum of Sq RSS AIC - Pstatus 1 0.237 891.79 357.43 - freetime 1 0.320 891.88 357.45 - Walc 1 0.721 892.28 357.55 - romantic 1 0.759 892.32 357.56 - age 1 0.783 892.34 357.56 - internet 1 0.967 892.52 357.61 - traveltime 1 1.180 892.74 357.66 - health 1 1.302 892.86 357.69 - higher 1 3.509 895.07 358.22 - schoolsup 1 3.730 895.29 358.27 - famsup 1 4.043 895.60 358.34 - paid 1 4.088 895.64 358.36 - nursery 1 5.517 897.07 358.70 - Mjob 4 32.533 924.09 359.05 - activities 1 7.833 899.39 359.25 <none> 891.56 359.38 - school 1 8.591 900.15 359.43 - sex 1 8.785 900.34 359.47 - Dalc 1 10.553 902.11 359.89 - address 1 15.168 906.72 360.99 - studytime 1 17.751 909.31 361.60 - absences 1 29.446 921.00 364.33 - goout 1 33.672 925.23 365.31 - failures 1 61.777 953.33 371.71 Step: AIC=357.43 G3 ~ school + sex + age + address + Mjob + traveltime + studytime + failures + schoolsup + famsup + paid + activities + nursery + higher + internet + romantic + freetime + goout + Dalc + Walc + health + absences Df Sum of Sq RSS AIC - freetime 1 0.393 892.19 355.53 - romantic 1 0.734 892.53 355.61 - Walc 1 0.790 892.58 355.62 - internet 1 0.856 892.65 355.64 - age 1 0.865 892.66 355.64 - traveltime 1 1.167 892.96 355.71 - health 1 1.328 893.12 355.75 - higher 1 3.510 895.30 356.27 - schoolsup 1 3.851 895.64 356.36 - famsup 1 3.904 895.70 356.37 - paid 1 4.337 896.13 356.47 - nursery 1 5.610 897.40 356.78 - Mjob 4 32.387 924.18 357.07 <none> 891.79 357.43 - activities 1 8.471 900.26 357.46 - school 1 8.555 900.35 357.48 - sex 1 9.120 900.91 
357.61 - Dalc 1 10.434 902.23 357.92 - address 1 15.119 906.91 359.03 - studytime 1 17.625 909.42 359.62 - absences 1 31.137 922.93 362.78 - goout 1 33.963 925.76 363.43 - failures 1 61.749 953.54 369.76 Step: AIC=355.53 G3 ~ school + sex + age + address + Mjob + traveltime + studytime + failures + schoolsup + famsup + paid + activities + nursery + higher + internet + romantic + goout + Dalc + Walc + health + absences Df Sum of Sq RSS AIC - romantic 1 0.743 892.93 353.71 - internet 1 0.873 893.06 353.74 - age 1 0.878 893.06 353.74 - Walc 1 0.943 893.13 353.75 - traveltime 1 1.099 893.28 353.79 - health 1 1.328 893.51 353.85 - higher 1 3.576 895.76 354.38 - famsup 1 3.807 895.99 354.44 - schoolsup 1 3.955 896.14 354.47 - paid 1 4.371 896.56 354.57 - nursery 1 5.809 897.99 354.92 - Mjob 4 32.890 925.08 355.27 <none> 892.19 355.53 - school 1 8.466 900.65 355.55 - activities 1 8.650 900.84 355.59 - sex 1 9.728 901.91 355.85 - Dalc 1 11.010 903.20 356.15 - address 1 14.924 907.11 357.08 - studytime 1 17.236 909.42 357.62 - absences 1 31.582 923.77 360.97 - goout 1 34.992 927.18 361.76 - failures 1 61.373 953.56 367.76 Step: AIC=353.71 G3 ~ school + sex + age + address + Mjob + traveltime + studytime + failures + schoolsup + famsup + paid + activities + nursery + higher + internet + goout + Dalc + Walc + health + absences Df Sum of Sq RSS AIC - Walc 1 0.768 893.70 351.89 - internet 1 0.822 893.75 351.90 - age 1 1.116 894.04 351.97 - traveltime 1 1.200 894.13 351.99 - health 1 1.440 894.37 352.05 - higher 1 3.931 896.86 352.65 - famsup 1 3.974 896.90 352.66 - schoolsup 1 4.136 897.06 352.69 - paid 1 4.382 897.31 352.75 - nursery 1 5.476 898.40 353.01 - school 1 8.246 901.17 353.67 <none> 892.93 353.71 - Mjob 4 34.369 927.30 353.79 - activities 1 9.014 901.94 353.86 - sex 1 9.236 902.16 353.91 - Dalc 1 10.926 903.86 354.31 - address 1 15.568 908.50 355.40 - studytime 1 17.880 910.81 355.95 - absences 1 30.856 923.78 358.98 - goout 1 35.702 928.63 360.10 - failures 1 62.751 
955.68 366.24 Step: AIC=351.89 G3 ~ school + sex + age + address + Mjob + traveltime + studytime + failures + schoolsup + famsup + paid + activities + nursery + higher + internet + goout + Dalc + health + absences Df Sum of Sq RSS AIC - internet 1 0.832 894.53 350.09 - traveltime 1 1.178 894.87 350.17 - age 1 1.280 894.98 350.20 - health 1 1.674 895.37 350.29 - schoolsup 1 3.747 897.44 350.78 - famsup 1 3.907 897.60 350.82 - higher 1 4.045 897.74 350.86 - paid 1 4.107 897.80 350.87 - nursery 1 5.185 898.88 351.13 - school 1 8.093 901.79 351.82 <none> 893.70 351.89 - sex 1 8.529 902.23 351.92 - Mjob 4 34.632 928.33 352.03 - activities 1 9.147 902.84 352.07 - Dalc 1 10.831 904.53 352.47 - address 1 16.038 909.73 353.70 - studytime 1 18.926 912.62 354.37 - absences 1 33.120 926.82 357.68 - goout 1 44.792 938.49 360.36 - failures 1 64.389 958.09 364.78 Step: AIC=350.09 G3 ~ school + sex + age + address + Mjob + traveltime + studytime + failures + schoolsup + famsup + paid + activities + nursery + higher + goout + Dalc + health + absences Df Sum of Sq RSS AIC - traveltime 1 1.205 895.73 348.38 - age 1 1.372 895.90 348.42 - health 1 1.457 895.99 348.44 - paid 1 3.700 898.23 348.97 - famsup 1 3.862 898.39 349.01 - schoolsup 1 3.869 898.40 349.01 - higher 1 4.046 898.58 349.05 - nursery 1 5.122 899.65 349.31 - school 1 8.066 902.60 350.01 - sex 1 8.380 902.91 350.08 <none> 894.53 350.09 - Mjob 4 34.276 928.81 350.14 - activities 1 9.092 903.62 350.25 - Dalc 1 10.697 905.23 350.63 - address 1 15.359 909.89 351.73 - studytime 1 18.504 913.03 352.47 - absences 1 35.357 929.89 356.38 - goout 1 46.356 940.89 358.90 - failures 1 63.557 958.09 362.78 Step: AIC=348.38 G3 ~ school + sex + age + address + Mjob + studytime + failures + schoolsup + famsup + paid + activities + nursery + higher + goout + Dalc + health + absences Df Sum of Sq RSS AIC - age 1 1.129 896.86 346.65 - health 1 1.463 897.20 346.73 - paid 1 3.902 899.64 347.31 - famsup 1 3.993 899.73 347.33 - schoolsup 1 4.111 
899.84 347.36 - higher 1 4.645 900.38 347.48 - nursery 1 4.855 900.59 347.53 - school 1 7.129 902.86 348.07 - Mjob 4 33.484 929.22 348.23 <none> 895.73 348.38 - sex 1 8.708 904.44 348.45 - activities 1 9.307 905.04 348.59 - Dalc 1 10.420 906.15 348.85 - address 1 14.186 909.92 349.74 - studytime 1 18.379 914.11 350.72 - absences 1 35.641 931.37 354.73 - goout 1 46.348 942.08 357.17 - failures 1 62.368 958.10 360.78 Step: AIC=346.65 G3 ~ school + sex + address + Mjob + studytime + failures + schoolsup + famsup + paid + activities + nursery + higher + goout + Dalc + health + absences Df Sum of Sq RSS AIC - health 1 1.494 898.36 345.00 - paid 1 3.929 900.79 345.58 - famsup 1 4.117 900.98 345.63 - nursery 1 5.106 901.97 345.86 - higher 1 5.181 902.04 345.88 - schoolsup 1 5.555 902.42 345.97 - school 1 6.043 902.91 346.08 - Mjob 4 32.895 929.76 346.35 <none> 896.86 346.65 - sex 1 8.662 905.52 346.70 - activities 1 8.875 905.74 346.75 - Dalc 1 10.389 907.25 347.11 - address 1 13.763 910.63 347.91 - studytime 1 19.266 916.13 349.19 - absences 1 34.513 931.37 352.73 - goout 1 45.430 942.29 355.22 - failures 1 61.840 958.70 358.92 Step: AIC=345 G3 ~ school + sex + address + Mjob + studytime + failures + schoolsup + famsup + paid + activities + nursery + higher + goout + Dalc + absences Df Sum of Sq RSS AIC - paid 1 3.868 902.22 343.92 - famsup 1 4.211 902.57 344.00 - nursery 1 5.102 903.46 344.21 - higher 1 5.326 903.68 344.27 - schoolsup 1 5.371 903.73 344.28 - school 1 5.855 904.21 344.39 - Mjob 4 31.786 930.14 344.44 <none> 898.36 345.00 - sex 1 8.595 906.95 345.04 - activities 1 9.235 907.59 345.19 - Dalc 1 9.792 908.15 345.32 - address 1 14.177 912.53 346.35 - studytime 1 20.456 918.81 347.82 - absences 1 33.903 932.26 350.93 - goout 1 44.888 943.24 353.44 - failures 1 62.365 960.72 357.37 Step: AIC=343.92 G3 ~ school + sex + address + Mjob + studytime + failures + schoolsup + famsup + activities + nursery + higher + goout + Dalc + absences Df Sum of Sq RSS AIC - 
famsup 1 2.669 904.89 342.55 - nursery 1 4.396 906.62 342.96 - higher 1 4.483 906.71 342.98 - Mjob 4 31.089 933.31 343.17 - school 1 5.625 907.85 343.25 - schoolsup 1 5.748 907.97 343.28 - sex 1 7.586 909.81 343.71 - activities 1 8.104 910.33 343.84 <none> 902.22 343.92 - Dalc 1 11.099 913.32 344.54 - address 1 13.441 915.67 345.09 - studytime 1 24.800 927.02 347.72 - absences 1 33.936 936.16 349.82 - goout 1 44.784 947.01 352.29 - failures 1 64.064 966.29 356.60 Step: AIC=342.55 G3 ~ school + sex + address + Mjob + studytime + failures + schoolsup + activities + nursery + higher + goout + Dalc + absences Df Sum of Sq RSS AIC - Mjob 4 28.859 933.75 341.27 - nursery 1 4.102 909.00 341.52 - higher 1 4.575 909.47 341.63 - school 1 4.830 909.72 341.69 - schoolsup 1 6.597 911.49 342.11 - activities 1 8.247 913.14 342.50 <none> 904.89 342.55 - sex 1 9.493 914.39 342.79 - Dalc 1 10.634 915.53 343.05 - address 1 14.000 918.89 343.84 - studytime 1 23.656 928.55 346.08 - absences 1 33.952 938.85 348.44 - goout 1 45.351 950.24 351.02 - failures 1 62.076 966.97 354.75 Step: AIC=341.27 G3 ~ school + sex + address + studytime + failures + schoolsup + activities + nursery + higher + goout + Dalc + absences Df Sum of Sq RSS AIC - higher 1 2.544 936.30 339.85 - nursery 1 3.300 937.05 340.03 - school 1 5.738 939.49 340.58 - activities 1 6.604 940.36 340.78 - schoolsup 1 7.273 941.03 340.93 - Dalc 1 7.334 941.09 340.95 <none> 933.75 341.27 - sex 1 12.155 945.91 342.04 - address 1 15.614 949.37 342.82 - studytime 1 18.323 952.08 343.43 - absences 1 32.306 966.06 346.55 - goout 1 36.907 970.66 347.57 - failures 1 57.539 991.29 352.07 Step: AIC=339.85 G3 ~ school + sex + address + studytime + failures + schoolsup + activities + nursery + goout + Dalc + absences Df Sum of Sq RSS AIC - nursery 1 3.745 940.04 338.71 - school 1 5.372 941.67 339.08 - activities 1 6.023 942.32 339.23 - Dalc 1 6.560 942.86 339.35 - schoolsup 1 7.469 943.77 339.55 <none> 936.30 339.85 - sex 1 14.398 950.69 
341.12 - address 1 15.218 951.51 341.30 - studytime 1 17.742 954.04 341.87 - absences 1 30.266 966.56 344.66 - goout 1 38.949 975.25 346.58 - failures 1 55.098 991.39 350.09 Step: AIC=338.71 G3 ~ school + sex + address + studytime + failures + schoolsup + activities + goout + Dalc + absences Df Sum of Sq RSS AIC - school 1 5.025 945.07 337.85 - activities 1 6.277 946.32 338.13 - Dalc 1 6.681 946.72 338.22 - schoolsup 1 7.978 948.02 338.52 <none> 940.04 338.71 - address 1 13.750 953.79 339.82 - sex 1 15.573 955.61 340.22 - studytime 1 17.445 957.49 340.64 - absences 1 29.515 969.56 343.32 - goout 1 41.370 981.41 345.93 - failures 1 53.366 993.41 348.53 Step: AIC=337.85 G3 ~ sex + address + studytime + failures + schoolsup + activities + goout + Dalc + absences Df Sum of Sq RSS AIC - Dalc 1 5.606 950.67 337.12 - schoolsup 1 6.666 951.73 337.35 - activities 1 8.239 953.31 337.71 <none> 945.07 337.85 - sex 1 15.633 960.70 339.36 - studytime 1 20.818 965.88 340.51 - address 1 21.896 966.96 340.75 - absences 1 26.956 972.02 341.87 - goout 1 42.577 987.64 345.28 - failures 1 52.706 997.77 347.46 Step: AIC=337.12 G3 ~ sex + address + studytime + failures + schoolsup + activities + goout + absences Df Sum of Sq RSS AIC - activities 1 5.991 956.66 336.46 - schoolsup 1 7.160 957.83 336.72 <none> 950.67 337.12 - address 1 18.871 969.54 339.32 - studytime 1 20.347 971.02 339.65 - absences 1 23.952 974.62 340.44 - sex 1 24.752 975.42 340.62 - goout 1 37.787 988.46 343.46 - failures 1 49.046 999.72 345.88 Step: AIC=336.46 G3 ~ sex + address + studytime + failures + schoolsup + goout + absences Df Sum of Sq RSS AIC - schoolsup 1 5.570 962.23 335.70 <none> 956.66 336.46 - address 1 15.417 972.08 337.88 - absences 1 22.411 979.07 339.42 - studytime 1 23.997 980.66 339.76 - sex 1 29.866 986.53 341.04 - goout 1 36.975 993.64 342.57 - failures 1 51.533 1008.20 345.69 Step: AIC=335.7 G3 ~ sex + address + studytime + failures + goout + absences Df Sum of Sq RSS AIC <none> 962.23 335.70 - 
address 1 13.323 975.56 336.64 - absences 1 21.649 983.88 338.46 - studytime 1 24.962 987.20 339.18 - goout 1 34.506 996.74 341.24 - sex 1 35.079 997.31 341.36 - failures 1 52.412 1014.65 345.05 Call: lm(formula = G3 ~ sex + address + studytime + failures + goout + absences, data = training) Coefficients: (Intercept) sexM addressU studytime failures goout 10.25165 0.86537 0.60566 0.45018 -0.64907 -0.37654 absences -0.03561
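To make the trace above easier to read: for linear models, `step` ranks candidates by the AIC that `extractAIC` reports, which is n·log(RSS/n) + 2·edf, where edf is the number of estimated coefficients. The sketch below plugs in the rounded RSS values printed in the trace. It assumes n = 214 training rows (174 residual degrees of freedom plus 40 coefficients in the saturated model); the helper name `aic_lm` is made up for illustration.

```r
# AIC as reported by step() for lm fits: n * log(RSS / n) + 2 * edf
# (aic_lm is a hypothetical helper, not part of any package)
aic_lm <- function(rss, n, edf) n * log(rss / n) + 2 * edf

# Assumption: 174 residual df + 40 coefficients = 214 training rows
n <- 214

# Saturated model: RSS = 871.86, intercept + 39 dummy/numeric terms
aic_start <- aic_lm(rss = 871.86, n = n, edf = 40)

# Final model in the trace: RSS = 962.23, intercept + 6 terms
aic_final <- aic_lm(rss = 962.23, n = n, edf = 7)

print(round(c(start = aic_start, final = aic_final), 2))
```

Up to rounding of the printed RSS, these should reproduce the `AIC=380.6` at the start of the trace and the `AIC=335.7` of the final step. The RSS rises as terms are dropped, but the 2·edf penalty falls faster, which is why the smaller model wins.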
Model 2¶
Model 2 will be equivalent to the output of the step function.
<span class="o">%%</span>R
model2 <span class="o"><-</span> lm<span class="p">(</span>formula <span class="o">=</span> G3 <span class="o">~</span> sex <span class="o">+</span> address <span class="o">+</span> studytime <span class="o">+</span> failures <span class="o">+</span> goout <span class="o">+</span> absences<span class="p">,</span> data <span class="o">=</span> training<span class="p">)</span>
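As an aside, the selected formula does not have to be retyped: `step` returns the final fitted model, so its result can be assigned directly. The sketch below is only an illustration on R's built-in `mtcars` data (a stand-in, not the student dataset), and the object names are hypothetical. `trace = 0` suppresses the printed trace.

```r
# Stand-in data: mtcars ships with base R; it is NOT the student dataset
full <- lm(mpg ~ ., data = mtcars)

# With no scope argument, step() performs backward elimination
reduced <- step(full, trace = 0)

# Refit by hand from the selected formula, as the post does for Model 2
manual <- lm(formula(reduced), data = mtcars)

# The two fits should have the same coefficients
same <- isTRUE(all.equal(coef(reduced), coef(manual)))
print(formula(reduced))
print(same)
```

Keeping the returned object avoids transcription slips when the selected formula is long.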
<span class="o">%%</span>R
subs <span class="o"><-</span> regsubsets<span class="p">(</span>G3 <span class="o">~</span> sex <span class="o">+</span> address <span class="o">+</span> studytime <span class="o">+</span> failures <span class="o">+</span> goout <span class="o">+</span> absences<span class="p">,</span> data <span class="o">=</span> training<span class="p">)</span>
df <span class="o"><-</span> data.frame<span class="p">(</span>est <span class="o">=</span> c<span class="p">(</span>summary<span class="p">(</span>subs<span class="p">)</span><span class="o">$</span>adjr2<span class="p">,</span> summary<span class="p">(</span>subs<span class="p">)</span><span class="o">$</span>bic<span class="p">),</span> x <span class="o">=</span> rep<span class="p">(</span><span class="m">1</span><span class="o">:</span><span class="m">6</span><span class="p">,</span> <span class="m">2</span><span class="p">),</span> type <span class="o">=</span> rep<span class="p">(</span>c<span class="p">(</span><span class="s">"adjr2"</span><span class="p">,</span> <span class="s">"bic"</span><span class="p">),</span> each <span class="o">=</span> <span class="m">6</span><span class="p">))</span>
qplot<span class="p">(</span>x<span class="p">,</span> est<span class="p">,</span> data <span class="o">=</span> df<span class="p">,</span> geom <span class="o">=</span> <span class="s">"line"</span><span class="p">)</span> <span class="o">+</span> theme_bw<span class="p">()</span> <span class="o">+</span> facet_grid<span class="p">(</span>type <span class="o">~</span> .<span class="p">,</span> scales <span class="o">=</span> <span class="s">"free_y"</span><span class="p">)</span>
<span class="o">%%</span>R
summary<span class="p">(</span>model2<span class="p">)</span>
Call:
lm(formula = G3 ~ sex + address + studytime + failures + goout + absences, data = training)

Residuals:
    Min      1Q  Median      3Q     Max
-5.2007 -1.3576 -0.1115  1.6244  4.4124

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 10.25165    0.68067  15.061  < 2e-16 ***
sexM         0.86537    0.31502   2.747 0.006544 **
addressU     0.60566    0.35776   1.693 0.091972 .
studytime    0.45018    0.19427   2.317 0.021464 *
failures    -0.64907    0.19330  -3.358 0.000935 ***
goout       -0.37654    0.13820  -2.725 0.006990 **
absences    -0.03561    0.01650  -2.158 0.032075 *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 2.156 on 207 degrees of freedom
Multiple R-squared: 0.1732, Adjusted R-squared: 0.1492
F-statistic: 7.226 on 6 and 207 DF, p-value: 5.128e-07
Model 3¶
Model 3 will be our final model.
<span class="o">%%</span>R
model3 <span class="o"><-</span> lm<span class="p">(</span>formula <span class="o">=</span> G3 <span class="o">~</span> sex <span class="o">+</span> failures<span class="p">,</span> data <span class="o">=</span> training<span class="p">)</span>
summary<span class="p">(</span>model3<span class="p">)</span>
Call:
lm(formula = G3 ~ sex + failures, data = training)

Residuals:
    Min      1Q  Median      3Q     Max
-5.3655 -1.3655 -0.0253  1.6345  3.8171

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  10.3655     0.2136  48.520  < 2e-16 ***
sexM          0.6599     0.3088   2.137   0.0337 *
failures     -0.8424     0.1949  -4.323 2.37e-05 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 2.235 on 211 degrees of freedom
Multiple R-squared: 0.09429, Adjusted R-squared: 0.08571
F-statistic: 10.98 on 2 and 211 DF, p-value: 2.898e-05
ANOVA¶
We can now compare the three nested models using ANOVA.
<span class="o">%%</span>R
anova<span class="p">(</span>saturated14<span class="p">,</span>model2<span class="p">,</span>model3<span class="p">)</span>
Analysis of Variance Table

Model 1: G3 ~ (school + sex + age + address + famsize + Pstatus + Medu + Fedu + Mjob + Fjob + reason + guardian + traveltime + studytime + failures + schoolsup + famsup + paid + activities + nursery + higher + internet + romantic + famrel + freetime + goout + Dalc + Walc + health + absences + G1 + G2) - G1 - G2
Model 2: G3 ~ sex + address + studytime + failures + goout + absences
Model 3: G3 ~ sex + failures

  Res.Df     RSS  Df Sum of Sq      F   Pr(>F)
1    174  871.86
2    207  962.23 -33   -90.368 0.5465 0.978867
3    211 1054.04  -4   -91.806 4.5805 0.001532 **
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
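For readers who want to see where the F column comes from, here is a small sketch that recomputes it from the printed table. When several nested `lm` fits are passed to `anova`, each drop in fit is scaled by the residual mean square of the largest model (Model 1 here). The numbers are copied from the table as rounded, and the variable names are illustrative only.

```r
# Values copied from the ANOVA table above (rounded as printed)
res_df  <- c(174, 207, 211)           # residual df of Models 1, 2, 3
rss_big <- 871.86                     # RSS of the largest model (Model 1)
ss_drop <- c(90.368, 91.806)          # increase in RSS at each reduction
df_drop <- diff(res_df)               # 33 and 4 terms' worth of df removed

# Common denominator: residual mean square of Model 1
scale <- rss_big / res_df[1]

# F statistic for each reduction, with its upper-tail p-value
f_stat <- (ss_drop / df_drop) / scale
p_val  <- pf(f_stat, df_drop, res_df[1], lower.tail = FALSE)

print(round(f_stat, 4))
print(signif(p_val, 3))
```

The recomputed F values should agree with the table's 0.5465 and 4.5805 to about three decimals, given the rounding of the inputs.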
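The table shows that trimming Model 1 down to Model 2 does not significantly worsen the fit (p ≈ 0.98), whereas cutting Model 2 down to Model 3 does (p ≈ 0.0015). Beyond that, ANOVA isn’t very useful in this case, since the strongest predictors from the original model, G1 and G2, have been cut out. By comparing models graphically, it’s easier to get an idea of what’s going on.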
With the strong predictors of the original model removed, single predictors become less important and holistic models become more accurate. Below, we see that Model 1 performs best on the test set.
This gives insight into how we should approach these students early on. One indicator will not make or break a child, but the overall profile can still be a strong indicator.
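One caveat when reading a test-set comparison: a model that is refit on the test rows will always fit those rows at least as well (in RSS) as any of its nested sub-models, so the fairer check is to score the training-fit models on held-out rows with `predict(..., newdata = )`. The sketch below illustrates that pattern on simulated data. The data frame `sim`, its coefficients, and the model names are invented for illustration and are not the student dataset or its results.

```r
set.seed(1)

# Simulated stand-in data (hypothetical effect sizes, NOT the student data)
n <- 300
sim <- data.frame(
  sex      = factor(sample(c("F", "M"), n, replace = TRUE)),
  failures = sample(0:3, n, replace = TRUE, prob = c(0.7, 0.15, 0.1, 0.05)),
  goout    = sample(1:5, n, replace = TRUE),
  noise1   = rnorm(n),
  noise2   = rnorm(n)
)
sim$G3 <- 10 + 0.7 * (sim$sex == "M") - 0.8 * sim$failures -
  0.4 * sim$goout + rnorm(n, sd = 2)

# 70/30 split into training and testing rows
idx      <- sample(seq_len(n), size = 0.7 * n)
training <- sim[idx, ]
testing  <- sim[-idx, ]

# Fit every model on the training rows only
fits <- list(
  full   = lm(G3 ~ ., data = training),
  medium = lm(G3 ~ sex + failures + goout, data = training),
  small  = lm(G3 ~ sex + failures, data = training)
)

# Score each model on rows it has never seen
rmse <- sapply(fits, function(fit) {
  pred <- predict(fit, newdata = testing)
  sqrt(mean((testing$G3 - pred)^2))
})
print(round(rmse, 3))
```

Lower held-out RMSE indicates better generalization. With this setup, the ranking of the three models is not guaranteed to follow model size.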
<span class="o">%%</span>R
<span class="c1">#Models</span>
final1 <span class="o"><-</span> lm<span class="p">(</span>G3 <span class="o">~</span> . <span class="o">-</span>G1 <span class="o">-</span>G2<span class="p">,</span> data<span class="o">=</span>testing<span class="p">)</span>
final2 <span class="o"><-</span> lm<span class="p">(</span>G3 <span class="o">~</span> sex <span class="o">+</span> address <span class="o">+</span> studytime <span class="o">+</span> failures <span class="o">+</span> goout <span class="o">+</span> absences<span class="p">,</span> data<span class="o">=</span> testing<span class="p">)</span>
final3 <span class="o"><-</span> lm<span class="p">(</span>G3 <span class="o">~</span> sex <span class="o">+</span> failures<span class="p">,</span> data<span class="o">=</span>testing<span class="p">)</span>
<span class="c1">#Graphs</span>
plot1 <span class="o"><-</span> qplot<span class="p">(</span>G3<span class="p">,</span> predict<span class="p">(</span>final1<span class="p">),</span> data <span class="o">=</span> testing<span class="p">,</span> geom <span class="o">=</span> <span class="s">"point"</span><span class="p">,</span> position <span class="o">=</span> <span class="s">"jitter"</span><span class="p">,</span> alpha<span class="o">=</span><span class="m">.8</span><span class="p">,</span> main<span class="o">=</span><span class="s">"Model 1"</span><span class="p">)</span> <span class="o">+</span> geom_abline<span class="p">(</span>intercept<span class="o">=</span><span class="m">0</span><span class="p">,</span> slope<span class="o">=</span><span class="m">1</span><span class="p">)</span> <span class="o">+</span> theme<span class="p">(</span>legend.position<span class="o">=</span><span class="s">"none"</span><span class="p">)</span>
plot2 <span class="o"><-</span> qplot<span class="p">(</span>G3<span class="p">,</span> predict<span class="p">(</span>final2<span class="p">),</span> data <span class="o">=</span> testing<span class="p">,</span> geom <span class="o">=</span> <span class="s">"point"</span><span class="p">,</span> position <span class="o">=</span> <span class="s">"jitter"</span><span class="p">,</span> alpha<span class="o">=</span><span class="m">.8</span><span class="p">,</span> main<span class="o">=</span><span class="s">"Model 2"</span><span class="p">)</span> <span class="o">+</span> geom_abline<span class="p">(</span>intercept<span class="o">=</span><span class="m">0</span><span class="p">,</span> slope<span class="o">=</span><span class="m">1</span><span class="p">)</span> <span class="o">+</span> theme<span class="p">(</span>legend.position<span class="o">=</span><span class="s">"none"</span><span class="p">)</span>
plot3 <span class="o"><-</span> qplot<span class="p">(</span>G3<span class="p">,</span> predict<span class="p">(</span>final3<span class="p">),</span> data <span class="o">=</span> testing<span class="p">,</span> geom <span class="o">=</span> <span class="s">"point"</span><span class="p">,</span> position <span class="o">=</span> <span class="s">"jitter"</span><span class="p">,</span> alpha<span class="o">=</span><span class="m">.8</span><span class="p">,</span> main<span class="o">=</span><span class="s">"Model 3"</span><span class="p">)</span> <span class="o">+</span> geom_abline<span class="p">(</span>intercept<span class="o">=</span><span class="m">0</span><span class="p">,</span>slope<span class="o">=</span><span class="m">1</span><span class="p">)</span> <span class="o">+</span> theme<span class="p">(</span>legend.position<...