
Linear model and Transformations

[This article was first published on R-Blog on Data modelling to develop ..., and kindly contributed to R-bloggers].
  • Introduction

    The linear model remains a reference point for more advanced modelling and a foundation for Machine Learning, Data Science and Artificial Intelligence, in spite of some of its weaknesses. A major task in modelling is to compare several candidate models before one is selected, either as the final model or as the basis for further work. Often, trial and error is used to decide which model to select. This is where the function presented here is useful: it estimates 14 different linear models and presents their coefficients in a formatted table for quick comparison, which saves time and energy. The function is also simple to use, since it takes a single line of code. The different transformations are:

    • Linear model

    • Linear model with interactions

    • Semilog model

    • Growth model

    • Double Log model

    • Mixed-power model

    • Translog model

    • Quadratic model

    • Cubic model

    • Inverse of y model

    • Inverse of x model

    • Inverse of y & x model

    • Square root model

    • Cubic root model
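    To make the list concrete, here is a minimal sketch of how several of these transformations can be written as ordinary lm() formulas in base R. It uses the built-in mtcars data with a single regressor. The variables and the exact functional forms are illustrative assumptions, and the specifications that Linearsystems builds internally may differ.

```r
# Illustrative only: a few of the listed transformations as lm() formulas.
# mtcars, mpg and wt are stand-ins; they are not the data used in this post.
d <- mtcars[, c("mpg", "wt")]

forms <- list(
  linear      = mpg ~ wt,                       # y on x
  semilog     = mpg ~ log(wt),                  # y on log(x)
  growth      = log(mpg) ~ wt,                  # log(y) on x
  double_log  = log(mpg) ~ log(wt),             # log(y) on log(x)
  quadratic   = mpg ~ wt + I(wt^2),             # adds the squared term
  cubic       = mpg ~ wt + I(wt^2) + I(wt^3),   # adds squared and cubed terms
  inverse_y   = I(1 / mpg) ~ wt,                # 1/y on x
  inverse_x   = mpg ~ I(1 / wt),                # y on 1/x
  square_root = mpg ~ wt + I(sqrt(wt))          # adds the square-root term
)

# Fit every specification on the same data
fits <- lapply(forms, lm, data = d)

# One set of rounded coefficients per model
coefs <- lapply(fits, function(m) round(coef(m), 3))
print(coefs)
```

    Keeping the formulas in a named list makes it easy to add or drop a specification without touching the fitting code.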

    In this post, I share the function Linearsystems from the Dyn4cast package, which transforms your data.frame for estimation and visualization purposes. The usage is as follows:

    Linearsystems(y, x, mod, limit, Test = NA)

    y is the vector of the dependent variable.

    x is the set of independent variables, preferably supplied as a data.frame.

    mod selects the group of linear models to be estimated. It takes a value from 0 to 6: 0 = EDA (correlation, summary tables, visual means); 1 = linear systems; 2 = power models; 3 = polynomial models; 4 = root models; 5 = inverse models; 6 = all 14 models.

    limit is the number of variables to be included in the coefficients plots.

    Test is the test data used to predict y. If it is not supplied, the prediction is made on the data used for estimation, so the predicted values may be identical to the fitted values.

    With this one line of code, in addition to the individual estimated models, you get the following:

    • Visual means of the numeric variable

    • Correlation plot

    • Significant plots of all the models estimated

    • Model Table

    • Machine Learning Metrics, a table of 47 performance and diagnostic statistics

    • Table of Marginal effects

    • Fitted plots long format

    • Fitted plots wide format

    • Prediction plots long format

    • Prediction plots wide format

    • Naive effects plots long format

    • Naive effects plots wide format

    • Summary of numeric variables

    • Summary of character variables

    Let us dive into an awesome experience in machine learning!

    Load library

    library(Dyn4cast)

    Estimate without test data

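    y <- linearsystems$MKTcost                   # dependent variable
    x <- dplyr::select(linearsystems, -MKTcost)  # all remaining columns as predictors (requires dplyr)
    Model1 <- Linearsystems(y, x, 6, 15)         # mod = 6: all 14 models; limit = 15 variables in the coefficients plots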

    Correlation matrix

    Model1$`Correlation plot`$plot()
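    For readers who want to see the numbers behind such a plot, a correlation matrix of the numeric columns can be computed in base R. This sketch uses mtcars as a stand-in data set and is not necessarily how Dyn4cast builds its plot.

```r
# Stand-in data: keep only numeric columns, then compute Pearson correlations
d <- mtcars[, c("mpg", "wt", "hp", "disp")]
num <- d[, vapply(d, is.numeric, logical(1)), drop = FALSE]
cmat <- round(cor(num), 2)
print(cmat)
```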

    Model Table

    Model1$`Model Table`
    Linear | Cobb Douglas | Linlog | Loglin | Reciprocal in X | Reciprocal in Y | Double reciprocal | Quadratic | Square root | Cubic root | Cubic | Mixed-power | Translog | Linear with interaction
    (Intercept) 1182.609 3.794** 13.073* 9.572*** −0.935 −0.291 1.496 10.051** 12.979 14.920 7.998 12.093 −914.830 178869.456
    (3733.003) (1.278) (4.964) (2.514) (2.459) (0.289) (1.129) (3.638) (12.254) (18.472) (12.105) (16.289) (1845.357) (124754.685)
    Age −26.698 −0.446 −1.729 −0.042 7.921 0.005 −3.596 −0.043 −0.035 −0.036 0.088 −0.037 261.741 −5430.850
    (39.822) (0.284) (1.103) (0.027) (5.593) (0.003) (2.568) (0.150) (0.285) (0.214) (0.811) (0.146) (516.542) (3483.580)
    Experience 15.358 −0.021 −0.074 0.000 0.205 0.000 −0.094 −0.039 0.168 0.128 −0.247 0.099 672.779 −13992.660
    (119.313) (0.244) (0.950) (0.080) (1.228) (0.009) (0.564) (0.368) (0.589) (0.433) (0.985) (0.316) (798.885) (16252.545)
    Years spent in formal education 217.766 −0.157 −0.201 0.031 2.118 0.002 −1.040 −0.674 1.287 0.964 −1.384 0.699 349.387 −15364.578
    (196.683) (0.436) (1.693) (0.132) (2.493) (0.015) (1.144) (0.631) (1.013) (0.750) (2.634) (0.541) (740.038) (11710.394)
    Household size 317.218** 0.269 1.423+ 0.150+ −0.930 −0.010 0.375 0.015 0.274 0.235 0.728 0.207 338.839 −18999.563
    (115.742) (0.196) (0.762) (0.078) (0.877) (0.009) (0.403) (0.357) (0.672) (0.504) (1.544) (0.379) (762.332) (12326.813)
    Years as a cooperative member −52.901 −0.077 −0.272 −0.025 0.239 0.004 −0.112 −0.137 0.176 0.119 −0.435 0.083 50.240 −16929.904
    (130.234) (0.247) (0.960) (0.088) (1.183) (0.010) (0.543) (0.357) (0.602) (0.448) (1.183) (0.332) (849.055) (17391.449)
    Marital statusMarried 1842.879 −0.247 −0.589 −0.555 −0.130 0.135 0.064 −0.225 −0.137 −0.137 −0.176 −0.133 0.105 −2815.028
    (1715.980) (0.277) (1.075) (1.156) (0.134) (0.133) (0.062) (1.290) (1.312) (1.314) (1.420) (1.314) (0.327) (3232.693)
    Marital statusSingle 4142.944 0.153 1.932 0.396 0.570 0.122 −0.209 0.117 −0.396 −0.851 2.167 −0.081 0.666 4127.110
    (2619.350) (0.552) (2.144) (1.764) (0.748) (0.203) (0.343) (2.101) (5.621) (8.373) (4.340) (4.742) (2.819) (6366.084)
    Marital statusWidowed 2168.175 −0.025 0.238 0.217 −0.015 0.039 0.012 0.459 0.545 0.543 0.470 0.548 0.145 −1061.522
    (1560.281) (0.262) (1.016) (1.051) (0.133) (0.121) (0.061) (1.185) (1.178) (1.175) (1.277) (1.174) (0.287) (3056.615)
    Main OccupationMarketing of Agricultural produce 696.759 −0.089 −0.279 −0.194 −0.059 0.032 0.028 −0.163 −0.163 −0.162 −0.132 −0.162 −0.029 871.619
    (988.961) (0.172) (0.667) (0.666) (0.089) (0.077) (0.041) (0.686) (0.684) (0.684) (0.711) (0.684) (0.125) (1136.352)
    Main OccupationSale of provision 974.225 0.010 0.150 0.182 −0.001 0.002 0.002 0.158 0.210 0.217 0.295 0.221 0.039 705.656
    (1227.830) (0.214) (0.832) (0.827) (0.112) (0.095) (0.052) (0.871) (0.862) (0.861) (0.946) (0.861) (0.170) (1546.785)
    Level of educationNon-formal −4535.272 −0.045 −1.255 −2.159 1.732 0.178 −0.859 2.457 13.394 20.708 4.919 12.420 1.798 −7893.322
    (3088.096) (1.182) (4.591) (2.080) (2.312) (0.239) (1.061) (4.227) (12.301) (17.922) (10.034) (11.352) (4.440) (9121.469)
    Level of educationPrimary −1891.589 0.094 −0.460 −1.164 1.748 0.102 −0.862 2.917 13.843 21.153 5.386 12.866 1.871 −4970.197
    (1948.345) (0.964) (3.744) (1.312) (2.202) (0.151) (1.011) (3.489) (11.776) (17.436) (9.624) (10.828) (4.372) (8139.493)
    Level of educationSecondary −4243.879 0.336 0.018 −0.927 1.943 −0.004 −0.958 3.913 14.780 22.087 6.378 13.791 2.017 −7558.975
    (2878.842) (1.175) (4.566) (1.939) (2.317) (0.223) (1.064) (4.360) (12.447) (18.071) (9.725) (11.481) (4.474) (9144.744)
    Level of educationTertiary −5416.211 0.325 −0.166 −1.233 1.950 0.006 −0.963 3.383 14.186 21.490 5.702 13.189 1.937 −9340.187
    (3356.079) (1.248) (4.849) (2.261) (2.345) (0.260) (1.077) (4.393) (12.329) (17.932) (9.464) (11.364) (4.432) (9561.933)
    IAge 0.000 −0.049 −0.098 −0.003 −0.048 −0.174
    (0.002) (3.637) (7.544) (0.020) (5.969) (1.124)
    IExperience 0.003 −0.975 −1.578 0.022 −0.932 0.881
    (0.015) (3.916) (6.370) (0.085) (3.676) (1.112)
    IYears spent in formal education 0.027 −8.762 −14.761 0.088 −8.700 0.409
    (0.024) (7.050) (11.744) (0.227) (6.884) (0.677)
    IHousehold size 0.006 −0.793 −1.200 −0.069 −0.644 −0.134
    (0.018) (3.992) (6.394) (0.164) (3.638) (0.571)
    IYears as a cooperative member 0.006 −1.194 −1.801 0.035 −1.017 1.074
    (0.015) (3.870) (6.291) (0.109) (3.629) (1.178)
    ICAge 0.000
    (0.000)
    ICExperience −0.001
    (0.002)
    ICYears spent in formal education −0.002
    (0.006)
    ICHousehold size 0.002
    (0.005)
    ICYears as a cooperative member −0.001
    (0.003)
    Age × Experience −187.488 413.060
    (222.702) (421.287)
    Age × Years spent in formal education −99.604 487.222
    (208.652) (333.733)
    Experience × Years spent in formal education −260.899 1263.591
    (316.027) (1423.802)
    Age × Household size −97.341 626.189+
    (213.777) (326.989)
    Experience × Household size −256.864 1524.101
    (321.486) (1450.023)
    Years spent in formal education × Household size −129.986 1885.469
    (307.538) (1237.124)
    Age × Years as a cooperative member −20.388 567.013
    (237.791) (470.536)
    Experience × Years as a cooperative member −146.606 1314.591
    (322.411) (1078.651)
    Years spent in formal education × Years as a cooperative member −19.775 1402.282
    (339.431) (1512.580)
    Household size × Years as a cooperative member −9.780 1598.233
    (349.012) (1616.021)
    Age × Experience × Years spent in formal education 72.621 −38.214
    (88.449) (38.386)
    Age × Experience × Household size 71.786 −45.862
    (89.607) (36.700)
    Age × Years spent in formal education × Household size 37.224 −59.358+
    (86.574) (35.423)
    Experience × Years spent in formal education × Household size 99.695 −141.079
    (127.942) (135.348)
    Age × Experience × Years as a cooperative member 41.773 −41.293
    (90.451) (29.463)
    Age × Years spent in formal education × Years as a cooperative member 7.558 −45.465
    (95.569) (43.009)
    Experience × Years spent in formal education × Years as a cooperative member 56.856 −106.971
    (128.179) (96.548)
    Age × Household size × Years as a cooperative member 5.684 −57.279
    (97.274) (41.653)
    Experience × Household size × Years as a cooperative member 53.271 −124.988
    (130.612) (94.861)
    Years spent in formal education × Household size × Years as a cooperative member 3.965 −148.608
    (139.964) (139.454)
    Age × Experience × Years spent in formal education × Household size −27.844 4.331
    (35.809) (3.616)
    Age × Experience × Years spent in formal education × Years as a cooperative member −16.311 3.364
    (36.096) (2.695)
    Age × Experience × Household size × Years as a cooperative member −15.558 4.083
    (36.514) (2.543)
    Age × Years spent in formal education × Household size × Years as a cooperative member −2.115 5.009
    (39.255) (3.834)
    Experience × Years spent in formal education × Household size × Years as a cooperative member −21.016 10.938
    (51.999) (8.733)
    Age × Experience × Years spent in formal education × Household size × Years as a cooperative member 6.104 −0.356
    (14.611) (0.241)
    Num.Obs. 100 100 100 100 100 100 100 100 100 100 100 100 100 100
    R2 0.269 0.134 0.139 0.144 0.136 0.144 0.141 0.166 0.168 0.168 0.173 0.168 0.248 0.471
    R2 Adj. 0.149 −0.008 −0.003 0.003 −0.006 0.003 −0.001 −0.032 −0.030 −0.030 −0.092 −0.030 −0.379 0.112
    AIC 1867.6 136.2 407.6 407.0 6.6 −25.8 −149.1 414.4 414.2 414.2 423.6 414.2 54.8 1887.3
    BIC 1909.3 177.9 449.3 448.7 48.3 15.9 −107.4 469.1 468.9 468.9 491.3 468.9 177.2 1996.7
    Log.Lik. −917.806 −52.086 −187.788 −187.515 12.708 28.878 90.564 −186.183 −186.107 −186.112 −185.782 −186.105 19.615 −901.648
    F 2.235 0.941 0.982 1.021 0.959 1.018 0.996 0.841 0.848 0.848 0.654 0.849 0.395
    RMSE 2342.84 0.41 1.58 1.58 0.21 0.18 0.10 1.56 1.56 1.56 1.55 1.56 0.20 1993.28
    + p < 0.1, * p < 0.05, ** p < 0.01, *** p < 0.001
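    A table like the one above can be condensed into a few fit statistics per model. The sketch below gathers R2, adjusted R2, AIC, BIC and RMSE for three base R fits on the stand-in mtcars data. All three models keep the same untransformed dependent variable, because AIC, BIC, log-likelihood and RMSE are not directly comparable between models whose dependent variables are on different scales (for example y versus log y). The data and model choices are assumptions made for the example.

```r
d <- mtcars[, c("mpg", "wt", "hp")]

# Three specifications with the same dependent variable
fits <- list(
  Linear      = lm(mpg ~ wt + hp, data = d),
  Quadratic   = lm(mpg ~ wt + hp + I(wt^2) + I(hp^2), data = d),
  Interaction = lm(mpg ~ wt * hp, data = d)
)

# Collect the fit statistics of one model as a named vector
fit_stats <- function(m) {
  s <- summary(m)
  c(R2     = s$r.squared,
    R2.Adj = s$adj.r.squared,
    AIC    = AIC(m),
    BIC    = BIC(m),
    RMSE   = sqrt(mean(residuals(m)^2)))
}

# Statistics in rows, models in columns
comparison <- round(sapply(fits, fit_stats), 3)
print(comparison)
```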

    Significant plot

    Each estimated model has its own significance plot. For example, for the double-log model:

    Model1$`Significant plot of Double Log`

    Fitted estimates

    Model1$`Fitted plots wide format`

    Marginal effects

    Model1$`Tables of marginal effects`[[1]]
    Linear Linear with interaction
    Age dY/dX −26.698 −81.589
    (39.822) (3167.398)
    Experience dY/dX 15.358 −2.022
    (119.313) (1769.754)
    Years spent in formal education dY/dX 217.766*** 263.233***
    (0.000) (45.746)
    Household size dY/dX 317.218*** 429.647***
    (0.000) (37.861)
    Years as a cooperative member dY/dX −52.901*** −32.928
    (0.000) (26.396)
    Marital status Married – Divorced 1842.879*** −2815.028***
    (0.000) (0.052)
    Marital status Single – Divorced 4142.944 4127.110***
    (0.018)
    Marital status Widowed – Divorced 2168.175 −1061.522***
    (0.123)
    Main Occupation Marketing of Agricultural produce – Civil Servant 696.759 871.619***
    (0.073)
    Main Occupation Sale of provision – Civil Servant 974.225 705.656***
    (0.043)
    Level of education Non-formal – Illiterate −4535.272 −7893.322***
    (0.082)
    Level of education Primary – Illiterate −1891.589 −4970.197***
    (0.038)
    Level of education Secondary – Illiterate −4243.879 −7558.975***
    (0.086)
    Level of education Tertiary – Illiterate −5416.211 −9340.187***
    (0.096)
    Num.Obs. 100 100
    R2 0.269 0.471
    R2 Adj. 0.149 0.112
    AIC 1867.6 1887.3
    BIC 1909.3 1996.7
    Log.Lik. −917.806 −901.648
    F 2.235
    RMSE 2342.84 1993.28
    + p < 0.1, * p < 0.05, ** p < 0.01, *** p < 0.001
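    In a purely linear model the marginal effect dY/dX equals the coefficient, which is why the Linear column repeats the coefficients of the Model Table. Once a regressor enters non-linearly, the effect depends on the value of x. The sketch below computes an average marginal effect for a quadratic fit in base R, both analytically and with a finite difference. It uses mtcars as stand-in data and illustrates the idea only; it is not the routine Dyn4cast uses.

```r
d <- mtcars[, c("mpg", "wt")]
m <- lm(mpg ~ wt + I(wt^2), data = d)
b <- coef(m)

# The derivative of b0 + b1*wt + b2*wt^2 with respect to wt is b1 + 2*b2*wt;
# average it over the observed values of wt
ame_analytic <- mean(b["wt"] + 2 * b["I(wt^2)"] * d$wt)

# Numerical check: central finite difference of the model predictions
h <- 1e-5
up   <- predict(m, newdata = transform(d, wt = wt + h))
down <- predict(m, newdata = transform(d, wt = wt - h))
ame_numeric <- mean((up - down) / (2 * h))

print(c(analytic = ame_analytic, numeric = ame_numeric))
```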

    Naive effects

    Model1$`Naive effects plots long format`

    Estimation with test data

    y <- sampling$qOutput # dependent variable
    x <- sampling[, -1]   # predictors: every column except the first
    Data <- cbind(y, x)
    idx <- sample(1:nrow(Data), 0.8 * nrow(Data)) # 80% of the rows are sampled for training the model
    train <- Data[idx, ]
    Test <- Data[-idx, ] # the remaining 20% is reserved for testing (predicting) the model
    y <- train$y
    x <- train[, -1]
    mod <- 4 # root models
    Model2 <- Linearsystems(y, x, mod, 15, Test)
    Model2$`Model Table`
                  Linear         Square root    Cubic root
    (Intercept)   −214.531*      7.488***       6.250***
                  (95.392)       (0.480)        (0.681)
    qLabor        −124.406+      1.313          0.333
                  (73.063)       (0.999)        (0.748)
    land          27.597***      −0.044***      −0.036***
                  (1.460)        (0.007)        (0.006)
    qVarInput     1.176***       0.003          0.002+
                  (0.078)        (0.002)        (0.001)
    time          20.537***      −0.007**       0.001
                  (1.436)        (0.002)        (0.002)
    IqLabor                      −3.393         −1.619
                                 (2.382)        (2.830)
    Iland                        0.710***       1.643***
                                 (0.076)        (0.151)
    IqVarInput                   −0.103         −0.421
                                 (0.089)        (0.299)
    Itime                        0.132***       0.190***
                                 (0.008)        (0.013)
    Num.Obs.      160            160            160
    R2            1.000          1.000          1.000
    R2 Adj.       1.000          1.000          1.000
    AIC           1139.7         −1129.8        −1126.3
    BIC           1158.1         −1099.0        −1095.6
    Log.Lik.      −563.830       574.900        573.170
    F             4651189.925    163649.095     160147.934
    RMSE          8.21           0.01           0.01
    + p < 0.1, * p < 0.05, ** p < 0.01, *** p < 0.001

    Marginal effects

    Model2$`Table of Marginal effects`
                  Linear         Square root    Cubic root
    qLabor        −124.406+      1.313          0.333
                  (73.063)       (0.999)        (0.748)
    land          27.597***      −0.044***      −0.036***
                  (1.460)        (0.007)        (0.006)
    qVarInput     1.176***       0.003          0.002+
                  (0.078)        (0.002)        (0.001)
    time          20.537***      −0.007**       0.001
                  (1.436)        (0.002)        (0.002)
    IqLabor                      −3.393         −1.619
                                 (2.382)        (2.831)
    Iland                        0.710***       1.643***
                                 (0.076)        (0.151)
    IqVarInput                   −0.103         −0.421
                                 (0.089)        (0.299)
    Itime                        0.132***       0.190***
                                 (0.008)        (0.013)
    Num.Obs.      160            160            160
    R2            1.000          1.000          1.000
    R2 Adj.       1.000          1.000          1.000
    AIC           1139.7         −1129.8        −1126.3
    BIC           1158.1         −1099.0        −1095.6
    Log.Lik.      −563.830       574.900        573.170
    F             4651189.925    163649.095     160147.934
    RMSE          8.21           0.01           0.01
    + p < 0.1, * p < 0.05, ** p < 0.01, *** p < 0.001

    Visualise means of the numeric variables

    Model2$`Visual means of the numeric variable`

    Fitted estimates

    Model2$`Fitted plots long format`

    Predicted

    Model2$`Prediction plots long format`
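    The role of the Test argument can be illustrated with base R alone: fit on the training rows, then predict the held-out rows and measure the error there. This sketch uses mtcars as stand-in data with a fixed seed. The seed, the variables and the 80/20 split are assumptions made for the example.

```r
set.seed(123)                       # fixed seed so the split is reproducible
d <- mtcars[, c("mpg", "wt", "hp")]

# 80% of the rows for training, the rest for testing
idx   <- sample(seq_len(nrow(d)), size = floor(0.8 * nrow(d)))
train <- d[idx, ]
test  <- d[-idx, ]

m <- lm(mpg ~ wt + hp, data = train)

fitted_train <- fitted(m)                   # in-sample fitted values
pred_test    <- predict(m, newdata = test)  # out-of-sample predictions

rmse <- function(actual, predicted) sqrt(mean((actual - predicted)^2))
print(c(train = rmse(train$mpg, fitted_train),
        test  = rmse(test$mpg, pred_test)))
```

    Without held-out rows, predict() on the training data simply returns the fitted values, which is the situation described for a missing Test argument.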

    Significant plot

    Model2$`Significant plot of Square root`

    Performance and Diagnostic values

    Model2$`Machine Learning Metrics`
    Name Linear Square.root Cubic.root
    Absolute Error 430 0.36 0.38
    Absolute Percent Error 0.28 0.048 0.05
    Accuracy 0 0 0
    Adjusted R Square 1 1 1
    Akaike’s Information Criterion AIC 1100 -1100 -1100
    Allen’s Prediction Sum-Of-Squares (PRESS, P-Square) 0 0 0
    Area under the ROC curve (AUC) 0 0 0
    Average Precision at k 0 0 0
    Bias -9.9e-15 -7.2e-17 -2.8e-17
    Brier score 70 4e-05 5e-05
    Classification Error 1 1 1
    F1 Score 0 0 0
    fScore 0 0 0
    GINI Coefficient 1 1 1
    kappa statistic 0 0 0
    Log Loss Inf Inf Inf
    Mallow’s cp 5 9 9
    Matthews Correlation Coefficient 0 0 0
    Mean Log Loss -2e+05 -260 -260
    Mean Absolute Error 2.7 0.0023 0.0024
    Mean Absolute Percent Error 0.0018 3e-04 0.00031
    Mean Average Precision at k 0 0 0
    Mean Absolute Scaled Error 0.00085 0.0034 0.0036
    Median Absolute Error 0.74 0.00075 0.00084
    Mean Squared Error 67 4.4e-05 4.5e-05
    Mean Squared Log Error 5.1e-05 6.8e-07 7e-07
    Model turning point error 0 0 0
    Negative Predictive Value 0 0 0
    Percent Bias 6.2e-05 -8.6e-07 -8.8e-07
    Positive Predictive Value 0 0 0
    Precision 0 0 0
    R Square 1 1 1
    Relative Absolute Error 0.0011 0.0044 0.0046
    Recall NaN NaN NaN
    Root Mean Squared Error 8.2 0.0067 0.0067
    Root Mean Squared Log Error 0.0071 0.00083 0.00084
    Root Relative Squared Error 0.0029 0.011 0.011
    Relative Squared Error 8.3e-06 0.00012 0.00012
    Schwarz’s Bayesian criterion BIC 1200 -1100 -1100
    Sensitivity 0 0 0
    specificity 0 0 0
    Squared Error 11000 0.0071 0.0072
    Squared Log Error 0.0081 0.00011 0.00011
    Symmetric Mean Absolute Percentage Error 0.0018 3e-04 0.00031
    Sum of Squared Errors 11000 0.0071 0.0072
    True negative rate 0 0 0
    True positive rate 0 0 0
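    Several of the regression metrics in this table follow standard definitions that are easy to reproduce. The sketch below computes a handful of them in base R from a vector of actual values and a vector of predictions. The helper name reg_metrics is hypothetical, and the formulas are the textbook ones rather than a copy of the package's code.

```r
# Hypothetical helper: textbook definitions of a few regression metrics
reg_metrics <- function(actual, predicted) {
  err <- actual - predicted
  c(
    "Mean Absolute Error"         = mean(abs(err)),
    "Mean Absolute Percent Error" = mean(abs(err / actual)),
    "Mean Squared Error"          = mean(err^2),
    "Root Mean Squared Error"     = sqrt(mean(err^2)),
    "Sum of Squared Errors"       = sum(err^2),
    "R Square"                    = 1 - sum(err^2) / sum((actual - mean(actual))^2)
  )
}

# Stand-in model on mtcars, evaluated on its own fitted values
m <- lm(mpg ~ wt + hp, data = mtcars)
print(signif(reg_metrics(mtcars$mpg, fitted(m)), 3))
```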

    Welcome to easy machine learning and models estimation!
