In loss forecasting, it is often necessary to disaggregate annual losses into quarters. The simplest way to convert a low-frequency time series to a high-frequency one is interpolation, such as the method implemented in the EXPAND procedure of SAS/ETS. In the example below, a series of annual loss projections from 2013 through 2016 is converted into quarterly losses by natural-spline interpolation.
SAS Code:
data annual;
  /* The colon modifier reads the date with the mmddyy8. informat
     under list input */
  input loss year : mmddyy8.;
  format year mmddyy8.;
datalines;
19270175 12/31/13
18043897 12/31/14
17111193 12/31/15
17011107 12/31/16
;
run;

/* Interpolate annual totals to quarterly values with a natural spline;
   observed = total forces the four quarters to sum to each annual figure */
proc expand data = annual out = quarterly from = year to = quarter;
  id year;
  convert loss / observed = total method = spline(natural);
run;

/* Pivot the quarterly series into one row per year to verify that the
   quarters add back up to the original annual losses */
proc sql;
  select year(year) as year,
         sum(case when qtr(year) = 1 then loss else 0 end) as qtr1,
         sum(case when qtr(year) = 2 then loss else 0 end) as qtr2,
         sum(case when qtr(year) = 3 then loss else 0 end) as qtr3,
         sum(case when qtr(year) = 4 then loss else 0 end) as qtr4,
         sum(loss) as total
  from quarterly
  group by calculated year;
quit;
Output:
year      qtr1      qtr2      qtr3      qtr4      total
2013   4868536   4844486   4818223   4738931   19270175
2014   4560049   4535549   4510106   4438194   18043897
2015   4279674   4276480   4287373   4267666   17111193
2016   4215505   4220260   4279095   4296247   17011107
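For readers who prefer to stay in R, an indicator-free disaggregation can also be produced with the tempdisagg package introduced below. The following is a minimal sketch, not part of the original workflow: it uses the Denton-Cholette method with only an intercept, which yields a smooth quarterly path that sums to the annual totals, although it is not numerically identical to the natural-spline interpolation in SAS.

library(tempdisagg)

# Annual losses from the example above
loss.a <- ts(c(19270175, 18043897, 17111193, 17011107),
             frequency = 1, start = 2013)

# Indicator-free disaggregation: Denton-Cholette with a constant;
# conversion = "sum" (the default) makes the quarters add to each year
mdl.dc <- td(loss.a ~ 1, to = "quarterly", method = "denton-cholette")
print(predict(mdl.dc))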
While mathematical interpolation is easy to implement, it can be difficult to justify and interpret from a business standpoint. In reality, there may be an assumption that the loss trend follows the movement of the macro-economy. It can therefore be advantageous to disaggregate annual losses into quarterly ones with the inclusion of one or more economic indicators. This approach is implemented in the tempdisagg package for R. Below is a demo with the same loss data used above; this time, however, the annual losses are disaggregated based upon a quarterly macro-economic indicator.
R Code:
library(tempdisagg)

# Annual losses, 2013 through 2016
loss <- c(19270175, 18043897, 17111193, 17011107)
loss.a <- ts(loss, frequency = 1, start = 2013)

# Quarterly macro-economic indicator over the same period
econ <- c(7.74, 7.67, 7.62, 7.48, 7.32, 7.11, 6.88, 6.63,
          6.41, 6.26, 6.12, 6.01, 5.93, 5.83, 5.72, 5.59)
econ.q <- ts(econ, frequency = 4, start = 2013)

# Disaggregate annual losses to quarters with the Chow-Lin method
# (the default), using the economic indicator as the regressor
summary(mdl <- td(loss.a ~ econ.q))
print(predict(mdl))
Output:
Call:
td(formula = loss.a ~ econ.q)

Residuals:
Time Series:
Start = 2013
End = 2016
Frequency = 1
[1]  199753 -234384 -199257  233888

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  2416610     359064   6.730   0.0214 *
econ.q        308226      53724   5.737   0.0291 *
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

'chow-lin-maxlog' disaggregation with 'sum' conversion
4 low-freq. obs. converted to 16 high-freq. obs.
Adjusted R-squared: 0.9141   AR1-Parameter: 0 (truncated)

        Qtr1    Qtr2    Qtr3    Qtr4
2013 4852219 4830643 4815232 4772080
2014 4614230 4549503 4478611 4401554
2015 4342526 4296292 4253140 4219235
2016 4302864 4272041 4238136 4198067
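As a quick sanity check, not shown in the original output, the quarterly estimates can be aggregated back to an annual frequency to confirm that they sum to the original annual losses under the 'sum' conversion:

loss.q <- predict(mdl)

# Sum the four quarters within each year; this should reproduce loss.a
round(aggregate(loss.q, nfrequency = 1))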
In practice, if a simple and flexible solution is desired without the need for interpretation, then mathematical interpolation might be a good choice. On the other hand, if there is a strong belief that the macro-economy drives the loss trend, then the regression-based method implemented in the tempdisagg package might be preferred. In this example, however, both methods generate very similar results.
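To make that comparison concrete, here is a small sketch that contrasts the two sets of quarterly estimates, with the spline figures copied from the SAS output above:

# Quarterly values from the SAS natural-spline interpolation
spline.q <- c(4868536, 4844486, 4818223, 4738931,
              4560049, 4535549, 4510106, 4438194,
              4279674, 4276480, 4287373, 4267666,
              4215505, 4220260, 4279095, 4296247)

# Quarterly values from the Chow-Lin regression
chowlin.q <- as.numeric(predict(mdl))

# Relative differences are small, staying within roughly 2.3 percent
summary(abs(spline.q - chowlin.q) / spline.q)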