Basic Linear Regressions for Finance


Linear Regression

In statistics, linear regression is a linear approach to modeling the relationship between a scalar response (or dependent variable) and one or more explanatory variables (or independent variables). The relationships are modeled as functions that are linear in the unknown coefficients: each input may be replaced by an arbitrary (basis) function of the inputs. This is linear regression:

$$Y = \alpha + \beta_1 f_1(X) + \beta_2 f_2(X) + \dots + \beta_n f_n(X) + \epsilon$$

This is only a subclass of linear regression:

$$Y = \alpha + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_n X_n + \epsilon$$

This is linear regression as well:

$$Y = \alpha + \beta_1 X_1^2 + \beta_2 \log(X_1) + \dots + \beta_n \sin(X_n) + \epsilon$$
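What matters is that the model is linear in the coefficients, while the regressors can be arbitrary transformations of the inputs. A minimal sketch in R (the variables x and y below are illustrative only):

# toy data: the response depends on a quadratic and a logarithmic term
x <- runif(100, min = 1, max = 10)
y <- 1 + 2*x^2 - 3*log(x) + rnorm(100)

# still linear regression: lm() is linear in the coefficients;
# I() protects the arithmetic inside the formula
mod <- lm(y ~ I(x^2) + log(x))
coef(mod)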

Estimation

In R, the lm function is used to fit linear models. For panel data, the plm function from the plm package can be used (see Introduction to Econometrics with R).
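For illustration, here is a minimal sketch of a plm call on a toy panel; the data frame and variable names are made up for the example:

# a toy panel: 3 firms observed over 4 years
library(plm)
panel <- data.frame(
  firm = rep(1:3, each = 4),
  year = rep(2001:2004, times = 3),
  y    = rnorm(12),
  x    = rnorm(12)
)

# fixed-effects ("within") estimation
mod <- plm(y ~ x, data = panel, index = c("firm", "year"), model = "within")
summary(mod)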

Exercise Simulate an exponential growth model $y(t) = y_0 e^{kt}$ and estimate the growth rate $k$ and the initial population $y_0$.

# time grid
t <- seq(0, 10, by = 0.01)

# simulate y values for k = 0.33 and initial population y0 = 1000
y <- 1000*exp(0.33*t)

# add random noise
y <- y * rnorm(n = length(y), mean = 1, sd = 0.1)

# plot
plot(y ~ t, main = "Population Growth")

Assume the y values generated above are given. We know neither the initial population $y_0$ nor the growth rate $k$. To estimate these parameters, we linearize the model by taking logs:

$$z = \ln(y(t)) = \ln(y_0 e^{kt}) = \ln(y_0) + kt = \alpha + \beta t$$

where $\alpha = \ln(y_0)$ and $\beta = k$.

# transform the output variable
z <- log(y)

# fit the model
mod <- lm(z ~ t)

# extract the coefficients
mod.c <- coefficients(mod)

# extract alpha
alpha <- mod.c[1]

# extract beta
beta <- mod.c[2]

# compute y0
y0 <- exp(alpha)

# compute k
k <- beta

# print estimates
sprintf("y0 = %s; k = %s", y0, k)
## [1] "y0 = 997.365557000044; k = 0.329840311556089"

The estimates seem close to the true values $y_0 = 1000$ and $k = 0.33$, but how can we test whether they are equal? We need confidence intervals.

# computes confidence intervals for the parameters in the model
mod.i <- confint(mod, level = 0.95)
##                 2.5 %    97.5 %
## (Intercept) 6.8927301 6.9175046
## t           0.3276953 0.3319853

The true value of $k = \beta = 0.33$ lies inside the confidence interval obtained above, so it has been consistently estimated. To check $y_0$ we need to transform the confidence interval obtained for $\alpha$.

# compute the confidence interval for y0
low <- exp(mod.i[1,1])
upp <- exp(mod.i[1,2])

# print
sprintf("Confidence interval for y0: %s - %s", round(low,1), round(upp,1))
## [1] "Confidence interval for y0: 985.1 - 1009.8"

Model Selection

In the previous example we knew the functional form linking the inputs to the output variable. This is often not the case in economics and finance, where the model is not known a priori and has to be deduced from the data.

Exercise Repeat the exercise of the previous section, but assume no model is given a priori. Deduce a reasonable model and estimate its parameters.

# visualize the data
plot(y ~ t, main = "First Look at the Data")

The data are not linear in t: they could be an exponential, quadratic, or cubic function of t. We can take the log of y and see what it looks like.

plot(log(y) ~ t, main = "Log Output")

Much better! This looks linear, but we also want to test for quadratic and cubic effects. Build the full model:

$$\ln(y) = \alpha + \beta_1 t + \beta_2 t^2 + \beta_3 t^3 + \epsilon$$

and fit it to the data.

# build a data frame of regressors
data <- data.frame(log.y = log(y), t1 = t, t2 = t^2, t3 = t^3)

# fit the model
mod <- lm(log.y ~ t1 + t2 + t3, data = data)

# summary statistics
summary(mod)
## 
## Call:
## lm(formula = log.y ~ t1 + t2 + t3, data = data)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.32659 -0.06253  0.00485  0.06780  0.28404 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  6.908e+00  1.260e-02 548.241   <2e-16 ***
## t1           3.272e-01  1.092e-02  29.972   <2e-16 ***
## t2           6.196e-04  2.538e-03   0.244    0.807    
## t3          -3.954e-05  1.668e-04  -0.237    0.813    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.1 on 997 degrees of freedom
## Multiple R-squared:  0.9891, Adjusted R-squared:  0.9891 
## F-statistic: 3.029e+04 on 3 and 997 DF,  p-value: < 2.2e-16

From the output we discover that:

  • only the intercept ($\alpha$) and t1 ($\beta_1$) are statistically different from zero: the probability of observing such estimates if their true values were zero is less than $2 \times 10^{-16}$.
  • t2 and t3 are not statistically different from zero: the probability of observing such estimates if their true values were zero is in fact quite high, around 80%. We cannot reject the hypothesis that $\beta_2$ and $\beta_3$ are zero, so we proceed as if they were.
  • the R-squared is close to 1: the model captures almost all of the variability in the data.

Since $\beta_2$ and $\beta_3$ are not statistically different from zero, we reduce the full model and estimate it again:

$$\ln(y) = \alpha + \beta_1 t + \epsilon$$

# fit the model
mod <- lm(log.y ~ t1, data = data)

# summary statistics
summary(mod)
## 
## Call:
## lm(formula = log.y ~ t1, data = data)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.32628 -0.06206  0.00441  0.06815  0.28363 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 6.905117   0.006312  1093.9   <2e-16 ***
## t1          0.329840   0.001093   301.8   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.09993 on 999 degrees of freedom
## Multiple R-squared:  0.9891, Adjusted R-squared:  0.9891 
## F-statistic: 9.106e+04 on 1 and 999 DF,  p-value: < 2.2e-16

To understand the meaning of the estimated coefficients, we proceed as follows:

$$\ln(y) = \alpha + \beta_1 t \implies y = \exp(\alpha + \beta_1 t) = e^{\alpha} e^{\beta_1 t} = y_0 e^{kt}$$

where $y_0 = e^{\alpha}$ and $k = \beta_1$:

# extract estimates
mod.c <- coef(mod)

# y0
y0 <- exp(mod.c[1])

# k
k <- mod.c[2]

# print
sprintf("y0 = %s; k = %s", y0, k)
## [1] "y0 = 997.365557000044; k = 0.329840311556089"

R-squared

R-squared is a goodness-of-fit measure for linear regression models. This statistic indicates the percentage of the variance in the dependent variable that the independent variables explain collectively. R-squared measures the strength of the relationship between your model and the dependent variable on a convenient 0 – 100% scale.

A good predictive model should achieve high values of R-squared, while this measure plays no role when assessing the significance of the parameters.
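To make the definition concrete, here is a minimal sketch computing R-squared by hand for the reduced model of the previous section, as one minus the ratio of the residual sum of squares to the total sum of squares:

# R-squared = 1 - RSS/TSS
mod <- lm(log.y ~ t1, data = data)
rss <- sum(residuals(mod)^2)
tss <- sum((data$log.y - mean(data$log.y))^2)
1 - rss/tss  # matches summary(mod)$r.squared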

Exercise Simulate a dataset from the model $y = 2\sin(x) + 1$ and see how the R-squared changes when increasing the noise in the data. Is the significance of the estimates affected?

# x grid
x <- seq(0, 2*pi, by = 0.01)

# y 
y <- 2*sin(x) + 1

# y: low noise
y.low <- y + rnorm(n = length(y), mean = 0, sd = 0.1)

# y: medium noise
y.mid <- y + rnorm(n = length(y), mean = 0, sd = 1)

# y: high noise
y.high <- y + rnorm(n = length(y), mean = 0, sd = 10)

# plot
layout(t(1:3))
plot(y.low  ~ x, main = "Low Noise")
plot(y.mid  ~ x, main = "Medium Noise")
plot(y.high ~ x, main = "High Noise")

# low noise
summary(lm(y.low ~ sin(x)))
## 
## Call:
## lm(formula = y.low ~ sin(x))
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.27887 -0.06682  0.00323  0.06331  0.33386 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 1.004789   0.003924   256.0   <2e-16 ***
## sin(x)      1.995093   0.005553   359.3   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.09842 on 627 degrees of freedom
## Multiple R-squared:  0.9952, Adjusted R-squared:  0.9952 
## F-statistic: 1.291e+05 on 1 and 627 DF,  p-value: < 2.2e-16
# medium noise
summary(lm(y.mid ~ sin(x)))
## 
## Call:
## lm(formula = y.mid ~ sin(x))
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.0789 -0.6506 -0.0113  0.7228  2.8477 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  0.97263    0.03964   24.53   <2e-16 ***
## sin(x)       2.08865    0.05610   37.23   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.9943 on 627 degrees of freedom
## Multiple R-squared:  0.6886, Adjusted R-squared:  0.6881 
## F-statistic:  1386 on 1 and 627 DF,  p-value: < 2.2e-16
# high noise
summary(lm(y.high ~ sin(x)))
## 
## Call:
## lm(formula = y.high ~ sin(x))
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -28.8317  -6.3656  -0.1938   6.7277  29.8982 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   1.0089     0.3936   2.563   0.0106 *  
## sin(x)        2.3579     0.5570   4.233 2.65e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 9.873 on 627 degrees of freedom
## Multiple R-squared:  0.02779,    Adjusted R-squared:  0.02624 
## F-statistic: 17.92 on 1 and 627 DF,  p-value: 2.648e-05

The R-squared is almost 100% for y.low, 68% for y.mid and only 3% for y.high. In the first case, we can predict y from x with very high accuracy. In the second case the accuracy drops. In the third case we have essentially no predictive power, yet we were still able to detect a statistically significant impact of sin(x) on y. On the other hand, the uncertainty of the coefficient estimates increased and the significance levels dropped. For even higher noise levels we would no longer detect a statistically significant impact of the regressor on the response, but this problem can be addressed by increasing the number of observations when possible (try it as an exercise; a sketch follows).
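A minimal sketch of that exercise (the sample size of 1e5 is an arbitrary choice): with the same sd = 10 noise but far more observations, the coefficient on sin(x) becomes highly significant again, while the R-squared stays low.

# same model and noise level (sd = 10), but many more observations
x.big <- seq(0, 2*pi, length.out = 1e5)
y.big <- 2*sin(x.big) + 1 + rnorm(n = length(x.big), mean = 0, sd = 10)

# the t-statistic of sin(x.big) grows with the sample size,
# while the R-squared stays low
summary(lm(y.big ~ sin(x.big)))$coefficients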

After running a regression analysis, we should check whether the model fits the data well. We paid attention to regression results such as slope coefficients, p-values and R-squared, but that's not the whole picture. Residuals can show how poorly a model represents the data: they are the variation in the outcome variable left over after fitting the model, and they can reveal patterns in the data not explained by the fitted model. Using this information, we can not only check whether the linear regression assumptions are met, but also improve the model in an exploratory way. Refer to: Understanding Diagnostic Plots for Linear Regression Analysis.
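For instance, calling plot on a fitted lm object produces the four standard diagnostic plots (residuals vs fitted, normal Q-Q, scale-location, residuals vs leverage). A quick sketch on the low-noise fit above:

# the four standard diagnostic plots for a fitted linear model
mod <- lm(y.low ~ sin(x))
par(mfrow = c(2, 2))
plot(mod)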

Testing CAPM

The CAPM states that

$$E[R_i - r_f] = \beta_i E[R_{mkt} - r_f]$$

where:

  • $R_{i,t}$: return on asset i at time t
  • $r_f$: risk-free return at time t
  • $R_{m,t}$: return on the market portfolio at time t

To test the model we use the following data file containing stock data from the website of Kenneth R. French. It includes monthly simple stock returns, in percentage points, for decile portfolios formed on beta over the period 1963-2017. These are total returns (i.e. they include dividends).

# read data
data <- read.csv('https://storage.guidotti.dev/course/asset-pricing-unine-2019-2020/basic-linear-regressions-for-finance.csv')

# drop date
data <- data[,-1]

# print
head(data)
##   Lo.10 Dec.2 Dec.3 Dec.4 Dec.5 Dec.6 Dec.7 Dec.8 Dec.9 Hi.10 Mkt.RF   RF
## 1  1.35  0.77  0.08 -0.24 -0.69 -1.20 -0.49 -1.39 -1.94 -0.77  -0.39 0.27
## 2  3.52  3.89  4.29  5.25  5.23  7.55  7.57  4.91  9.04 10.47   5.07 0.25
## 3 -3.09 -2.24 -0.54 -0.97 -1.37 -0.27 -0.63 -1.00 -1.92 -3.68  -1.57 0.27
## 4  1.25 -0.12  2.00  5.12  2.32  1.78  6.63  4.78  3.10  3.01   2.53 0.29
## 5 -0.91 -0.15  1.60 -2.05 -0.94 -0.69 -1.32 -0.51 -0.20  0.52  -0.85 0.27
## 6  3.86  0.63  2.31  1.83  3.00  2.36  1.25  3.45  0.30  1.28   1.83 0.29
# get the portfolios
portfolios <- data[,-c(11,12)]

# compute excess returns
portfolios <- portfolios - data$RF 

# print
head(portfolios)
##   Lo.10 Dec.2 Dec.3 Dec.4 Dec.5 Dec.6 Dec.7 Dec.8 Dec.9 Hi.10
## 1  1.08  0.50 -0.19 -0.51 -0.96 -1.47 -0.76 -1.66 -2.21 -1.04
## 2  3.27  3.64  4.04  5.00  4.98  7.30  7.32  4.66  8.79 10.22
## 3 -3.36 -2.51 -0.81 -1.24 -1.64 -0.54 -0.90 -1.27 -2.19 -3.95
## 4  0.96 -0.41  1.71  4.83  2.03  1.49  6.34  4.49  2.81  2.72
## 5 -1.18 -0.42  1.33 -2.32 -1.21 -0.96 -1.59 -0.78 -0.47  0.25
## 6  3.57  0.34  2.02  1.54  2.71  2.07  0.96  3.16  0.01  0.99

Time-Series Approach

The time-series approach consists of the following regression:

$$R_{i,t} - r_f = \alpha_i + \beta_i (R_{m,t} - r_f) + \epsilon_{i,t}$$

i.e.

$$Y_{i,t} = \alpha_i + \beta_i X_t + \epsilon_{i,t}$$

where:

  • $R_{i,t}$: return on asset i at time t
  • $r_f$: risk-free return at time t
  • $R_{m,t}$: return on the market portfolio at time t
  • $Y_{i,t} = R_{i,t} - r_f$: excess return on asset i at time t
  • $X_t = R_{m,t} - r_f$: excess return on the market portfolio at time t

The CAPM implies $\alpha_i = 0$. In fact, if $\alpha_i \neq 0$, then taking the expectation of both sides of the equation violates the CAPM:

$$E[R_{i,t} - r_f] = E[\alpha_i + \beta_i (R_{m,t} - r_f)] = \alpha_i + \beta_i E[R_{m,t} - r_f] \neq \beta_i E[R_{m,t} - r_f]$$

Therefore, the CAPM is rejected if we observe $\alpha_i$ statistically different from zero.

# define an empty data frame
capm <- data.frame()

# define a matrix to store residuals
eps <- matrix(NA, nrow = nrow(portfolios), ncol = ncol(portfolios))

# for each portfolio...
for(i in 1:ncol(portfolios)){
  
  # linear regression 
  mod <- lm(portfolios[,i] ~ data$Mkt.RF)
  
  # summary
  mod.s <- summary(mod)
  
  # store residuals
  eps[,i] <- residuals(mod)
  
  # extract coefficients
  alpha <- mod.s$coefficients[1,'Estimate']
  beta  <- mod.s$coefficients[2,'Estimate']
  
  # extract standard errors of the estimates
  sd.alpha <- mod.s$coefficients[1,'Std. Error']
  sd.beta  <- mod.s$coefficients[2,'Std. Error']
    
  # compute the average excess return
  excess  <- mean(portfolios[,i])
  
  # store everything into the capm dataframe
  row  <- c(excess, alpha, sd.alpha, beta, sd.beta)
  capm <- rbind(capm, row)
  
}

# assign colnames
colnames(capm) <- c('<excess>', 'alpha', 'sd.alpha', 'beta', 'sd.beta')

# print
capm
##     <excess>        alpha   sd.alpha      beta    sd.beta
## 1  0.5465291  0.219840960 0.08358282 0.6152566 0.01892420
## 2  0.5221713  0.131098193 0.07708074 0.7365138 0.01745205
## 3  0.5882875  0.145659989 0.06624880 0.8336070 0.01499956
## 4  0.6657951  0.149145083 0.06207395 0.9730148 0.01405432
## 5  0.5541590  0.013151107 0.05977863 1.0188884 0.01353463
## 6  0.6346483  0.059530018 0.06389041 1.0831290 0.01446559
## 7  0.5194801 -0.095702502 0.07022164 1.1585827 0.01589906
## 8  0.6728287 -0.005224589 0.08177222 1.2769881 0.01851426
## 9  0.6400306 -0.098993437 0.10113883 1.3918151 0.02289910
## 10 0.6306269 -0.224398053 0.13376284 1.6102814 0.03028559

We estimated $\alpha_i$ for all ten portfolios, together with their standard errors. Each $\alpha_i$ is (approximately) normally distributed with standard deviation $\sigma_{\alpha_i}$. Therefore, to test whether all $\alpha_i$ are jointly equal to zero, we can define the random variable

$$\chi^2_N = \sum_{i=1}^{N} \left( \frac{\alpha_i - 0}{\sigma_{\alpha_i}} \right)^2$$

which is the sum of $N$ (approximately) independent squared standard normal variables, i.e. it has an (approximate) chi-squared distribution with $N$ degrees of freedom.

# chi squared random variable
chi.sq <- sum((capm$alpha/capm$sd.alpha)^2)
chi.sq
## [1] 26.96824

What is the probability of observing a value greater than or equal to 26.97 if it follows a chi-squared distribution with ten degrees of freedom?

pchisq(q = chi.sq, df = nrow(capm), lower.tail = FALSE)
## [1] 0.002634639

The CAPM would be rejected at a confidence level of 99%. The problem is that $\mathrm{cov}(\alpha_i, \alpha_j)$ will not be zero, so it is common to use the quadratic form $\hat{\alpha}' \mathrm{cov}(\hat{\alpha})^{-1} \hat{\alpha}$. We now follow this approach to take the correlations into account and compute the following statistic (the GRS test), which follows an F distribution under the assumption of normally distributed error terms:

$$f_{GRS} = \frac{\tau - n - k}{n} \cdot \frac{\hat{\alpha}' \hat{\Omega}^{-1} \hat{\alpha}}{1 + \hat{\mu}_f' \hat{\Sigma}_f^{-1} \hat{\mu}_f} \sim F(n, \tau - n - k)$$

where:

  • $\tau$: number of time periods
  • $n$: number of assets
  • $k$: number of factors (in our case 1)
  • $\hat{\alpha}$: vector of estimated $\alpha_i$
  • $\hat{\Omega}$: covariance matrix of the residuals
  • $\hat{\mu}_f$: vector of the sample means of the factor(s)
  • $\hat{\Sigma}_f$: covariance matrix of the factors (in our case it reduces to the variance of the market excess return)
# number of time periods
t <- nrow(portfolios)

# number of assets 
n <- ncol(portfolios)

# number of factors (in our case 1)
k <- 1

# vector of estimated alpha_i
alpha <- capm$alpha

# covariance matrix of residuals 
omega <- cov(eps)

# vector giving the sample means of the factor
mu <- mean(data$Mkt.RF)

# covariance matrix of factors
sigma <- var(data$Mkt.RF)

# F-statistic (GRS test)
f <- (t-n-k)/n * (alpha %*% solve(omega) %*% alpha)/(1 + mu %*% solve(sigma) %*% mu)

# p-value
pf(q = f, df1 = n, df2 = t-n-1, lower.tail = FALSE)
##            [,1]
## [1,] 0.03102946

The CAPM is still rejected at a confidence level of 95%, even when taking into account the correlations between the $\alpha_i$.

Finally, dropping the assumption of normally distributed error terms while still taking the correlations into account, there is a test statistic that asymptotically follows a $\chi^2$ distribution:

$$J = \tau \cdot \frac{\hat{\alpha}' \hat{\Omega}^{-1} \hat{\alpha}}{1 + \hat{\mu}_f' \hat{\Sigma}_f^{-1} \hat{\mu}_f} \sim \chi^2(n)$$

# chi squared statistic
x <- t * (alpha %*% solve(omega) %*% alpha)/(1 + mu %*% solve(sigma) %*% mu)

# p-value
pchisq(q = x, df = n, lower.tail = FALSE)
##            [,1]
## [1,] 0.02617805

The CAPM is still rejected at a confidence level of 95%, even when taking into account the non-normality of the error terms together with the correlations of the $\alpha_i$.

We now consider a different approach to testing the CAPM. Note: what is done below is essentially the same as using dummy variables. Consider the model:

$$R_{i,t} - r_f = \alpha + \sum_{j=1}^{N} \beta_j \delta_{i,j} (R_{m,t} - r_f) + \epsilon_{i,t}$$

where $\delta_{i,j}$ is the Kronecker delta, i.e.

$$\delta_{i,j} = \begin{cases} 1, & \text{if } i = j, \\ 0, & \text{if } i \neq j. \end{cases}$$

The model correctly reduces to the standard CAPM for each asset $i$. For example, consider the first asset $i = 1$:

$$R_{1,t} - r_f = \alpha + \sum_{j=1}^{N} \beta_j \delta_{1,j} (R_{m,t} - r_f) + \epsilon_{1,t}$$

Now, $\delta_{1,j}$ equals 1 only for $j = 1$ and vanishes for all other terms. The only term contributing to the summation is therefore $j = 1$, and we recover the standard CAPM for the first asset, which predicts $\alpha = 0$:

$$R_{1,t} - r_f = \alpha + \beta_1 (R_{m,t} - r_f) + \epsilon_{1,t}$$

Repeating the procedure for all assets, we obtain the standard CAPM for all of them, where $\alpha$ is now a common parameter, equal to 0 according to the CAPM. Testing $\alpha = 0$ therefore tests whether the CAPM holds.

# number of assets
n.p <- ncol(portfolios)

# number of observations for each asset
n.t <- nrow(portfolios)

# matrix of excess returns and the n.p regressors (delta_{i,j} * (R_{m,t} - r_f))
M <- matrix(0, nrow = n.t*n.p, ncol = n.p+1)
colnames(M) <- c('excess', colnames(portfolios))

# fill the first column with the excess returns
M[,1] <- unlist(portfolios)

# fill each column with (R_{m,t} - r_f) only if i==j
for(i in 1:n.p){
  M[1:n.t + (i-1)*n.t, i+1] <- data$Mkt.RF
}

# linear regression
mod <- lm(excess ~ ., data = as.data.frame(M))

# summary
summary(mod)
## 
## Call:
## lm(formula = excess ~ ., data = as.data.frame(M))
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -14.8914  -1.0903  -0.0251   1.0637  13.0585 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  0.02941    0.02622   1.122    0.262    
## Lo.10        0.62044    0.01865  33.270   <2e-16 ***
## Dec.2        0.73928    0.01865  39.643   <2e-16 ***
## Dec.3        0.83677    0.01865  44.870   <2e-16 ***
## Dec.4        0.97627    0.01865  52.351   <2e-16 ***
## Dec.5        1.01845    0.01865  54.612   <2e-16 ***
## Dec.6        1.08395    0.01865  58.125   <2e-16 ***
## Dec.7        1.15518    0.01865  61.944   <2e-16 ***
## Dec.8        1.27605    0.01865  68.426   <2e-16 ***
## Dec.9        1.38832    0.01865  74.446   <2e-16 ***
## Hi.10        1.60337    0.01865  85.978   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.105 on 6529 degrees of freedom
## Multiple R-squared:  0.8421, Adjusted R-squared:  0.8419 
## F-statistic:  3482 on 10 and 6529 DF,  p-value: < 2.2e-16
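As a sanity check, the same regression can be written more idiomatically by stacking the portfolios in long format and interacting the market excess return with an asset factor. A sketch (the names long, asset and mkt.rf are introduced here for the example):

# stack the portfolios in long format
long <- data.frame(
  excess = unlist(portfolios),
  asset  = factor(rep(colnames(portfolios), each = nrow(portfolios))),
  mkt.rf = rep(data$Mkt.RF, times = ncol(portfolios))
)

# common intercept, one market slope per asset: same fit as above
mod2 <- lm(excess ~ asset:mkt.rf, data = long)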

Note that all the $\beta_i$ are the same as those estimated independently, while the intercept is not statistically significant, i.e. $\alpha$ is not statistically different from zero. The CAPM cannot be rejected. Note, however, that with this kind of test the reverse does not hold: we cannot say that, based on this test, the CAPM holds. In fact, we could have observed an $\alpha$ not statistically different from zero because:

  • the true value of $\alpha$ is zero
  • we don't have enough data and the uncertainty of the parameters is too high to detect a significant difference between the true $\alpha$ and zero. In other words, we didn't have enough statistical power to tell the difference between zero and something close to zero. Increasing the size of the dataset would allow us to estimate a significant $\alpha \neq 0$

What do we learn from this? First, not rejecting a hypothesis does not mean accepting it; otherwise, the last approach would contradict the previous ones. Second, for the same purpose there can be many different approaches, more or less suited to it, and several tests with different statistical power, i.e. more or less able to distinguish the true value from something close to it.

Cross-Sectional Approach

The cross-sectional approach consists of the following regression:

$$E[R_i - r_f] = \beta_i E[R_{mkt} - r_f]$$

i.e.

$$Y_i = \lambda X_i + \theta + \epsilon_i$$

where:

  • $Y_i = E[R_i - r_f]$: average excess return on asset i
  • $X_i = \beta_i$: coefficient estimated in the time-series approach for asset i

The CAPM implies $\lambda = E[R_{mkt} - r_f]$ and $\theta = 0$. In fact, if $\lambda \neq E[R_{mkt} - r_f]$ and/or $\theta \neq 0$, then:

$$E[R_i - r_f] = Y_i = \lambda X_i + \theta = \lambda \beta_i + \theta \neq \beta_i E[R_{mkt} - r_f]$$

Therefore, the CAPM is rejected if we observe $\lambda$ statistically different from $E[R_{mkt} - r_f]$ and/or $\theta$ statistically different from zero.

# linear regression
sml <- lm(capm$`<excess>` ~ capm$beta)

# print
summary(sml)
## 
## Call:
## lm(formula = capm$`<excess>` ~ capm$beta)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.087237 -0.034292  0.002738  0.030721  0.078437 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  0.48585    0.06353   7.647 6.03e-05 ***
## capm$beta    0.10432    0.05734   1.819    0.106    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.05231 on 8 degrees of freedom
## Multiple R-squared:  0.2927, Adjusted R-squared:  0.2043 
## F-statistic:  3.31 on 1 and 8 DF,  p-value: 0.1063

We estimated $\theta$ (the intercept) statistically different from zero, so the CAPM is rejected. Regarding $\lambda$, we estimated a value of 0.1043244. Is it statistically different from $E[R_{mkt} - r_f]$?

# mean excess return on the market portfolio
mean(data$Mkt.RF)
## [1] 0.5309786
# confidence intervals at 95%
confint(sml, level = 0.95)
##                   2.5 %    97.5 %
## (Intercept)  0.33934005 0.6323572
## capm$beta   -0.02790093 0.2365497

The mean excess return does not fall inside the confidence interval: $\lambda$ is statistically different from $E[R_{mkt} - r_f]$. The CAPM is rejected.
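Equivalently, we can form an explicit t-statistic for $\lambda$ against the mean market excess return; a minimal sketch:

# t-statistic for lambda against the mean market excess return
s <- summary(sml)$coefficients
t.stat <- (s[2, "Estimate"] - mean(data$Mkt.RF)) / s[2, "Std. Error"]

# two-sided p-value with the regression's residual degrees of freedom
2 * pt(abs(t.stat), df = sml$df.residual, lower.tail = FALSE)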

To conclude, we represent the results graphically.

# grid of beta 
betas <- seq(0, 2, by = 0.01)

# excess returns by CAPM
E.R   <- betas * mean(data$Mkt.RF)

# plot
plot(E.R ~ betas, type = 'l', lwd = 2, col = 'orange', 
     main = "SML vs Beta Regression", xlab = 'Beta', 
     ylab = 'Mean Excess Return')

# add points estimated in the time-series approach
points(x = capm$beta, y = capm$`<excess>`,  pch = 16, cex = 1)
text(labels = 1:10, x = capm$beta, y = capm$`<excess>`, cex = 1, pos = 3)

# add regression line
abline(sml, lty = 'dashed')
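Optionally, a legend helps distinguish the CAPM-implied security market line from the fitted cross-sectional regression:

# add a legend distinguishing the two lines
legend("topleft", legend = c("SML (CAPM)", "Cross-sectional fit"),
       col = c("orange", "black"), lty = c("solid", "dashed"), lwd = c(2, 1))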
