
Assessing the Forecasting Ability of Our Model

[This article was first published on The Dancing Economist, and kindly contributed to R-bloggers].
Today we wish to see how our model would have fared forecasting the past 20 values of GDP. Why? Well, ask yourself this: how can you know where you're going if you don't know where you've been? Once you understand, please proceed with the following post.

First recall the trend portion that we have already accounted for:


> t=(1:258)
> t2=t^2
> trendy= 892.656210 + -30.365580*t  + 0.335586*t2
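
The trend coefficients above were estimated earlier in this series. As a rough sketch of how a quadratic trend like this can be estimated and removed, the snippet below fits one by least squares to a simulated series. The names gdp_sim, trend_fit, trendy_sim and dt_sim are hypothetical stand-ins, and the simulated numbers are not the GDP data used in this post.

```r
# Simulated stand-in for the GDP series (hypothetical data, not the post's).
set.seed(1)
t  <- 1:258
t2 <- t^2
gdp_sim <- 900 - 30 * t + 0.33 * t2 + rnorm(258, sd = 50)

# Fit the quadratic trend by ordinary least squares.
trend_fit <- lm(gdp_sim ~ t + t2)
coef(trend_fit)

# Fitted values play the role of `trendy`;
# residuals play the role of the de-trended series.
trendy_sim <- fitted(trend_fit)
dt_sim     <- residuals(trend_fit)
```

By construction the fitted trend plus the residuals adds back up to the original series, which is the same identity used later when the trend is added back to the forecast.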

And the de-trended series is just that: the series minus the trend.

> dt=GDP-trendy


As the following example demonstrates, if we assess the model with a forecast of the de-trended series alone, we may come across some discouraging results:


> test.data<-dt[-c(239:258)]
> true.data<-dt[-c(1:238)]
> forecast.data<-predict(arima(test.data,order=c(10,0,0),include.mean=FALSE),n.ahead=20)$pred
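
The hold-out idea above can also be written with head() and tail(), and the forecast errors summarised numerically. This is only a sketch on simulated data: dt_sim is a hypothetical stand-in for the de-trended series, and a low-order AR(2) is used instead of the AR(10) above so the example fits quickly and reliably.

```r
set.seed(42)
# Hypothetical stand-in for the de-trended series `dt` (258 observations).
dt_sim <- as.numeric(arima.sim(model = list(ar = c(0.6, 0.2)), n = 258))

h <- 20                          # forecast horizon
n <- length(dt_sim)
train   <- head(dt_sim, n - h)   # first 238 observations, used for fitting
holdout <- tail(dt_sim, h)       # last 20 observations, used for checking

# Fit an AR model on the training window and forecast h steps ahead.
fit <- arima(train, order = c(2, 0, 0), include.mean = FALSE, method = "ML")
fc  <- predict(fit, n.ahead = h)$pred

# Forecast errors and two common summary measures.
err  <- holdout - as.numeric(fc)
rmse <- sqrt(mean(err^2))
mae  <- mean(abs(err))
c(RMSE = rmse, MAE = mae)
```

Note that the object called test.data in the post is the part the model is fitted on, while true.data holds the 20 observations kept aside for comparison.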

Now we want to plot the forecast data vs. the actual values of the forecasted de-trended series to get a sense of whether this is accurate or not.

> plot(true.data,forecast.data,main="True Data vs. Forecast data")

Clearly there is little to no accuracy in the forecast of our de-trended model alone. In fact, a linear regression of the true data on the forecast data makes this perfectly clear: the R-squared is only about 0.07 and the slope is not statistically significant (p = 0.265).

> reg.model<-lm(true.data~forecast.data)
> summary(reg.model)

Call:
lm(formula = true.data ~ forecast.data)

Residuals:
   Min     1Q Median     3Q    Max
-684.0 -449.0 -220.8  549.4  716.8

Coefficients:
               Estimate Std. Error t value Pr(>|t|)
(Intercept)   -2244.344   2058.828  -1.090    0.290
forecast.data     2.955      2.568   1.151    0.265

Residual standard error: 540.6 on 18 degrees of freedom
Multiple R-squared: 0.06851, Adjusted R-squared: 0.01676
F-statistic: 1.324 on 1 and 18 DF,  p-value: 0.265


> anova(reg.model)
Analysis of Variance Table

Response: true.data
              Df  Sum Sq Mean Sq F value Pr(>F)
forecast.data  1  386920  386920  1.3238  0.265
Residuals     18 5260913  292273
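
The summary statistics reported by summary() can be recovered by hand from the sums of squares in this ANOVA table. The short check below uses only the numbers printed above; the variable names are just for illustration.

```r
# Sums of squares and degrees of freedom copied from the ANOVA table above.
ss_model <- 386920
ss_resid <- 5260913
df_model <- 1
df_resid <- 18

# Share of total variation explained by the regression.
r_squared <- ss_model / (ss_model + ss_resid)

# Adjusted R-squared penalises for the number of regressors.
adj_r_squared <- 1 - (1 - r_squared) * (df_model + df_resid) / df_resid

# F statistic: ratio of the two mean squares.
f_stat <- (ss_model / df_model) / (ss_resid / df_resid)

# Residual standard error: square root of the residual mean square.
resid_se <- sqrt(ss_resid / df_resid)

round(c(r_squared = r_squared, adj_r_squared = adj_r_squared), 5)
round(c(f_stat = f_stat, resid_se = resid_se), 4)
```

The results should agree with the R-squared, adjusted R-squared, F-statistic and residual standard error shown in the regression summary.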


Now is a good time not to be discouraged, but rather encouraged to add the trend to our forecast. When we run a linear regression of GDP on the trend, we quickly see that 99.7% of the variance in GDP is accounted for by the trend.


> reg.model2<-lm(GDP~trendy)
> summary(reg.model2)

Call:
lm(formula = GDP ~ trendy)

Residuals:
    Min      1Q  Median      3Q     Max
-625.43 -165.76  -36.73  163.04  796.33

Coefficients:
             Estimate Std. Error t value Pr(>|t|)  
(Intercept)  0.001371  21.870246     0.0        1  
trendy       1.000002   0.003445   290.3   <2e-16 ***

Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 250.6 on 256 degrees of freedom
Multiple R-squared: 0.997, Adjusted R-squared: 0.997
F-statistic: 8.428e+04 on 1 and 256 DF,  p-value: < 2.2e-16


In the end we would have had to account for the trend anyway, so it makes sense to include it when testing our model's accuracy.

> test.data1<-dt[-c(239:258)]  

# Important note: the negative index "-c(239:258)" keeps everything except those 20 observations #

> true.data1<-dt[-c(1:238)]
> true.data2<-trendy[-c(1:238)]
> forecast.data1<-predict(arima(test.data1,order=c(10,0,0),include.mean=FALSE),n.ahead=20)$pred
> forecast.data2<-(true.data2)

> forecast.data3<-(forecast.data1+forecast.data2)
> true.data3<-(true.data1+true.data2)
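
The trend values used here appear to come from coefficients estimated over all 258 observations, so the last 20 observations already influenced the trend. A stricter check, sketched below on simulated data, would estimate the trend on the first 238 observations only and extrapolate it. Everything in this snippet (gdp_sim, the AR(1) noise, the variable names) is a hypothetical illustration, not the post's data or model order.

```r
set.seed(7)
n <- 258
h <- 20
t <- 1:n

# Hypothetical GDP-like series: quadratic trend plus AR(1) noise.
noise   <- as.numeric(arima.sim(model = list(ar = 0.8), n = n, sd = 50))
gdp_sim <- 900 - 30 * t + 0.33 * t^2 + noise

train_idx <- 1:(n - h)
test_idx  <- (n - h + 1):n

# Estimate the trend using the training window only.
train_df  <- data.frame(y = gdp_sim[train_idx], t = t[train_idx])
trend_fit <- lm(y ~ t + I(t^2), data = train_df)

# Extrapolate the trend over the hold-out window.
trend_fc <- predict(trend_fit, newdata = data.frame(t = t[test_idx]))

# Fit an AR model to the training residuals and forecast them.
ar_fit   <- arima(residuals(trend_fit), order = c(1, 0, 0),
                  include.mean = FALSE, method = "ML")
resid_fc <- as.numeric(predict(ar_fit, n.ahead = h)$pred)

# Recombine the two pieces to get a forecast in levels.
gdp_fc <- as.numeric(trend_fc) + resid_fc
```

Comparing gdp_fc with gdp_sim[test_idx] then gives a test in which no hold-out information was used at any stage.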

Don’t forget to plot your data:

> plot(true.data3,forecast.data3,main="True Values vs. Predicted Values")



…and regress the actual data on the forecasted data:

> reg.model3<-lm(true.data3~forecast.data3)
> summary(reg.model3)

Call:
lm(formula = true.data3 ~ forecast.data3)

Residuals:
   Min     1Q Median     3Q    Max 
-443.5 -184.2   16.0  228.3  334.8 

Coefficients:
                Estimate Std. Error t value Pr(>|t|)
(Intercept)    8.104e+03  1.141e+03   7.102 1.28e-06 ***
forecast.data3 4.098e-01  7.657e-02   5.352 4.37e-05 ***
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 

Residual standard error: 264.8 on 18 degrees of freedom
Multiple R-squared: 0.6141, Adjusted R-squared: 0.5926 
F-statistic: 28.64 on 1 and 18 DF,  p-value: 4.366e-05 
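
One common way to read a regression of actual values on forecasts (often called a Mincer-Zarnowitz regression) is to ask whether the intercept is close to 0 and the slope close to 1, which is what forecasts with no systematic bias would give. The sketch below runs that joint F-test by hand on simulated numbers. The vectors forecast_sim and actual_sim are hypothetical and are not the values from this post.

```r
set.seed(123)
# Hypothetical forecasts and actuals for 20 hold-out periods.
forecast_sim <- seq(12000, 13900, length.out = 20)
actual_sim   <- forecast_sim + rnorm(20, sd = 250)

mz_fit <- lm(actual_sim ~ forecast_sim)

# Unrestricted residual sum of squares (intercept and slope estimated freely).
rss_u <- sum(residuals(mz_fit)^2)

# Restricted residual sum of squares (intercept fixed at 0, slope fixed at 1).
rss_r <- sum((actual_sim - forecast_sim)^2)

n <- length(actual_sim)
q <- 2                                   # number of restrictions being tested
f_stat  <- ((rss_r - rss_u) / q) / (rss_u / (n - 2))
p_value <- pf(f_stat, q, n - 2, lower.tail = FALSE)
c(F = f_stat, p = p_value)
```

A small p-value would suggest that the forecasts are systematically off in level or scale, even when the R-squared looks respectable.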

Looking at the plot and the regression results, I feel this model is pretty accurate, considering that this is a point forecast and not an interval forecast: the forecasts account for about 61% of the variance in the actual values (R-squared of 0.6141). Next time on the Dancing Economist we will plot the forecasts into the future with 95% confidence intervals. Until then,

Keep Dancin’

Steven J




