Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
Annualization means scaling up results that refer to a time period shorter than one year so that they refer to a full year. As a general principle, this can also be called projection, grossing up, up-scaling, expansion or extrapolation.
In this article I want to implement and compare different methods for annualization (to stick with one term), in order to consolidate my own understanding of the topic. The method of choice is particularly easy to implement in R in a few lines of code.
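To make the idea concrete, here is a minimal sketch of the simplest possible scaling. The numbers are made up for illustration, and the calculation assumes the daily rate observed in the short period holds for the whole year, which is exactly the assumption that seasonality breaks.

```r
# Illustrative numbers only (not taken from any real survey)
period_total <- 56000   # passengers counted over the observed period
period_days  <- 28      # length of the observed period in days

# Average passengers per day in the observed period
daily_rate <- period_total / period_days

# Scale up to a full (non-leap) year, assuming a constant daily rate
annual_total <- daily_rate * 365

print(annual_total)
```

With these numbers the daily rate is 2,000 passengers, which scales to 730,000 passengers per year.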
Example: Annual Ridership
One use case for annualization arises in survey analysis. Imagine that 3 ridership surveys are conducted in a city in 2023 to find out the total number of passengers travelling by bus on a day. The ultimate goal is to estimate the annual ridership of this city. It is known that, due to seasonality, ridership varies throughout the year – e.g., it can be lower during summer vacation time. Hence the need to survey different periods of the year. Conducting surveys is an expensive and time-consuming endeavor, so only a small number of surveys can be conducted. In addition, another level of sampling comes into play, since it is usually impractical to survey every bus. Therefore, only a sample of buses is chosen from among all the bus lines.
In total, the procedure for analysing those surveys together can be summarized in 2 steps:
For each survey, calculate an estimate of daily ridership that is representative for the period that the respective survey covers
Combine results and annualize to yield an annual total ridership
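The two steps above can be sketched roughly as follows. This is only a naive illustration, not one of the methods compared in this article: the passenger counts, the number of trips per day (`trips_per_day`) and the simple expansion estimator are all assumptions made up for the example.

```r
# Hypothetical samples: passengers counted on the sampled bus trips of each survey
counts <- list(
  spring = c(40, 55, 38, 47),
  summer = c(20, 25, 18, 21),
  autumn = c(52, 45, 49, 54)
)
trips_per_day <- 50  # assumed total number of bus trips per day (made up)

# Step 1: expand each survey's sample mean to a daily ridership estimate
daily_estimate <- sapply(counts, function(x) mean(x) * trips_per_day)

# Step 2: combine the surveys (here: an unweighted mean) and annualize
annual_estimate <- mean(daily_estimate) * 365

print(daily_estimate)
print(annual_estimate)
```

Note that the unweighted mean in step 2 treats each survey as representing one third of the year, which is a strong assumption when one of the surveys covers a short low-ridership season.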
Data
We will create a data set on our own. For this, we will create three random periods of a year, each of length 4 weeks for which we will assume we have estimates of daily ridership.
The tricky part is that we need those periods to be non-overlapping. We’ll simply choose three different starting dates, test whether they are more than 40 days apart (comfortably more than the 4-week, i.e. 28-day, survey length), and if not, try again.
set.seed(20240528)
repeat {
  # subtract 31 days in December to ensure surveys
  # are conducted within one year
  # we also want to make sure to get one survey in summer
  s <- sort(ceiling(c(runif(n = 1, min = 0, max = 179),
                      runif(n = 1, min = 180, max = 180 + 2 * 31),
                      runif(n = 1, min = (180 + 2 * 31) + 1, max = 365 - 31))))
  if(all(diff(s) > 40)) break
}
s <- as.Date("2023-01-01") + s
print(s)
## [1] "2023-03-19" "2023-07-27" "2023-10-02"
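As a quick sanity check, the non-overlap condition can also be verified directly on the resulting dates. The sketch below hard-codes the start dates printed above so that it runs on its own; the variable names are chosen just for this illustration.

```r
# Start dates as printed above; each survey is assumed to last 4 weeks
starts <- as.Date(c("2023-03-19", "2023-07-27", "2023-10-02"))
ends   <- starts + 4 * 7

# Periods sorted by start date do not overlap if each ends before the next begins
no_overlap <- all(ends[-length(ends)] < starts[-1])

# All periods should also lie entirely within 2023
within_year <- all(format(c(starts, ends), "%Y") == "2023")

print(c(no_overlap = no_overlap, within_year = within_year))
```

Both checks should come out as `TRUE` for the dates above.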
Now, let’s construct our data table (using my favourite package, data.table).
library(data.table)
dt <- data.table(survey_begin = s)
dt[, survey_end := s + (4*7)]
dt[, mean_rider := fifelse(month(survey_begin) %in% c(7, 8), 1200, 2500)]
dt[, rider := rnorm(n = 3, mean = mean_rider, sd = 500)]
print(dt)
##    survey_begin survey_end mean_rider    rider
##          <Date>     <Date>      <num>    <num>
## 1:   2023-03-19 2023-04-16       2500 1788.739
## 2:   2023-07-27 2023-08-24       1200 1170.413
## 3:   2023-10-02 2023-10-30       2500 2510.341
Method 1: Effective Number of Days
Method 2: Model-based Prediction and sum-up