Annualization

[This article was first published on R on R & Data Analysis - Eric Stemmler, and kindly contributed to R-bloggers.]

Annualization means scaling results that refer to a period shorter than one year so that they refer to a full year. As a general principle, this can also be called projection, grossing up, up-scaling, expansion or extrapolation.
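As a minimal illustration of the idea, a total observed over a shorter period can be scaled by the ratio of days in the year to days observed. This sketch assumes the observed period is representative of the whole year, which is exactly the simplification that seasonality breaks. The function name `annualize` and the numbers are hypothetical.

```r
# Scale a total observed over `days_observed` days up to a full year.
# Assumes the observed period is representative of the whole year.
annualize <- function(total, days_observed, days_in_year = 365) {
  total * days_in_year / days_observed
}

# Hypothetical example: 56,000 passengers counted over 28 days
annual_estimate <- annualize(total = 56000, days_observed = 28)
print(annual_estimate)
```

The scaling factor here is 365 / 28, so any seasonal peak or dip inside those 28 days is multiplied by about 13 in the annual figure.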

In this article I want to implement and compare different methods for annualization (to stick with one term), in order to consolidate my own understanding of the topic. The method of choice is particularly easy to implement in R in a few lines of code.

Example: Annual Ridership

One use case for annualization appears in survey analysis. Imagine three ridership surveys are conducted in a city in 2023 to find out the total number of passengers travelling by bus per day. The ultimate goal is to estimate the annual ridership of this city. It is known that, due to seasonality, ridership varies throughout the year; for example, it can be lower during the summer vacation. Hence the need to survey different periods of the year. Conducting surveys is expensive and time-consuming, so only a small number of them can be carried out. In addition, another level of sampling comes into play, since it is usually impractical to survey every bus. Therefore, only a selection of buses across the bus lines is surveyed.

Overall, the procedure for analysing these surveys together can be summarized in two steps:

  1. For each survey, calculate an estimate of daily ridership that is representative for the period that the respective survey covers

  2. Combine results and annualize to yield an annual total ridership
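Step 1 could be sketched as a simple expansion estimate. All figures below are hypothetical, and the sketch assumes the surveyed buses are a simple random sample of all buses running on that day, which a real survey design may not satisfy.

```r
# Hypothetical passenger counts on the surveyed buses of one day
sampled_counts <- c(32, 41, 27, 38, 45, 30)

# Hypothetical total number of buses running on that day
n_buses_total <- 60

# Expansion estimate: mean passengers per surveyed bus times all buses
daily_ridership <- mean(sampled_counts) * n_buses_total
print(daily_ridership)
```

The remainder of the article takes such daily estimates as given and deals only with step 2.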

Data

We will create our own data set: three random periods within one year, each 4 weeks long, for which we assume we have estimates of daily ridership.

The tricky part is that the periods must not overlap. We’ll simply draw three different start dates, check that consecutive dates are more than 40 days apart (comfortably longer than the 4-week, i.e. 28-day, survey length), and otherwise try again.

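set.seed(20240528)
repeat {
  # Draw one start day (as day of the year) from each of three windows.
  # The last window stops 31 days before the end of the year so that
  # the final 4-week survey still finishes within the same year.
  # The middle window (roughly July and August) ensures one summer survey.
  s <- sort(ceiling(c(runif(n = 1, min = 0, max = 179),
                      runif(n = 1, min = 180, max = 180 + 2 * 31),
                      runif(n = 1, min = (180 + 2 * 31) + 1, max = 365 - 31))))
  # Accept only if consecutive start days are more than 40 days apart
  if (all(diff(s) > 40))
    break
}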

s <- as.Date("2023-01-01") + s
print(s)
## [1] "2023-03-19" "2023-07-27" "2023-10-02"
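
As a quick sanity check, the start dates printed above can be tested directly for overlap. This is a small sketch that re-enters the dates by hand so that it runs on its own; the names `survey_length`, `ends`, `no_overlap` and `within_year` are introduced here for illustration only.

```r
# Start dates as printed above
s <- as.Date(c("2023-03-19", "2023-07-27", "2023-10-02"))
survey_length <- 4 * 7  # days

# Each survey must end before the next one begins
ends <- s + survey_length
no_overlap <- all(ends[-length(ends)] < s[-1])

# All surveys must end within 2023
within_year <- all(ends <= as.Date("2023-12-31"))

print(c(no_overlap = no_overlap, within_year = within_year))
```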

Now, let’s construct our data table (using my favourite package, data.table).

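library(data.table)
dt <- data.table(survey_begin = s)
# Each survey lasts 4 weeks
dt[, survey_end := survey_begin + 4 * 7]
# Assumed true mean daily ridership: lower for surveys starting in July or August
dt[, mean_rider := fifelse(month(survey_begin) %in% c(7, 8), 1200, 2500)]
# Simulated survey estimate of daily ridership, one draw per survey
dt[, rider := rnorm(n = .N, mean = mean_rider, sd = 500)]
print(dt)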
##    survey_begin survey_end mean_rider    rider
##          <Date>     <Date>      <num>    <num>
## 1:   2023-03-19 2023-04-16       2500 1788.739
## 2:   2023-07-27 2023-08-24       1200 1170.413
## 3:   2023-10-02 2023-10-30       2500 2510.341
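
Before looking at the methods, a naive baseline can serve as a reference point: treat every day of the year like the average survey day. This is not one of the two methods listed in this article, only a sketch for comparison. It re-enters the `rider` values printed above in a plain vector so that it runs without data.table, and the name `naive_annual` is hypothetical.

```r
# Daily ridership estimates as printed above (spring, summer, autumn survey)
rider <- c(1788.739, 1170.413, 2510.341)

# Naive baseline: average survey day times the number of days in 2023
naive_annual <- mean(rider) * 365
print(naive_annual)
```

Because each survey gets the same weight of one third, the summer dip counts as if it lasted four months, so this baseline is likely to be biased whenever the seasons differ in length.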

Method 1: Effective Number of Days

Method 2: Model-based Prediction and sum-up
