BDA3 Chapter 1 Exercise 9

[This article was first published on Brian Callander, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

BDA3 Chapter 1 Exercise 9

Here’s my solution to exercise 9, chapter 1, of Gelman’s Bayesian Data Analysis (BDA), 3rd edition. There are solutions to some of the exercises on the book’s webpage.

\(\DeclareMathOperator{\dbinomial}{Binomial} \DeclareMathOperator{\dbern}{Bernoulli} \DeclareMathOperator{\dpois}{Poisson} \DeclareMathOperator{\dnorm}{Normal} \DeclareMathOperator{\dt}{t} \DeclareMathOperator{\dcauchy}{Cauchy} \DeclareMathOperator{\dexponential}{Exp} \DeclareMathOperator{\duniform}{Uniform} \DeclareMathOperator{\dgamma}{Gamma} \DeclareMathOperator{\dinvgamma}{InvGamma} \DeclareMathOperator{\invlogit}{InvLogit} \DeclareMathOperator{\logit}{Logit} \DeclareMathOperator{\ddirichlet}{Dirichlet} \DeclareMathOperator{\dbeta}{Beta}\)

Suppose there 3 doctors, who open their practice at 09:00 and stop accepting patients at 16:00. If customers arrive in exponentially distributed intervals with mean 10 minutes, and appointment duration is uniformly distributed between 5 and 10 minutes, we want to know:

  • how many patients arrive per day?
  • how many patients have to wait for their appointment?
  • how long do patients have to wait?
  • when does the last patient leave the practice?

We do this by simulation. The arrivals function will simulate the arrival times of the patients, in minutes after 09:00. In principle, we should simulate draws from the exponential distribution until the sum of all draws is above 420, the number of minutes the practice accepts patients. However, I couldn’t find any efficient way to run this in R. Instead we’ll draw so many variables such that is is highly unlikely that we have too few, then just filter out what we don’t need.

To calculate a suitably large number, note that the number of patients in one day is \(\dpois(\frac{1}{10} \cdot (16 – 9) \cdot 60)\)-distributed. The 99.99999% percentile of this distribution is qpois(0.9999999, (16 - 9) * 6) = 80. We’ll err on the safe side and use \(n=100\).

arrivals <- function(λ, t, n=100) {
  rexp(n, λ) %>% 
    tibble(
      delay = .,
      time = cumsum(delay)
    ) %>% 
    filter(time <= t) %>% 
    pull(time) %>% 
    return()
}

λ <- 1 / 10
t <- (16 - 9) * 60

arrivals(λ, t)
 [1]   4.854265   4.963889   6.651014   9.518990  17.415709  20.110178
 [7]  28.852188  34.862538  35.970215  44.205468  48.342152  52.934693
[13]  83.072579  86.746318 117.586517 122.811176 133.662687 142.016603
[19] 170.935913 190.999325 202.511439 204.915770 205.191951 208.422873
[25] 219.437526 225.162971 233.122550 235.351649 253.558658 254.097711
[31] 255.639118 256.270049 277.905899 291.055504 291.737173 294.688419
[37] 300.949679 302.681417 329.751112 335.998940 355.506712 361.162687
[43] 381.436543 388.767558 393.072689 393.088807 395.570811 401.669343
[49] 401.911643 408.650710 418.291196

Given the patients that arrive in a day, we now need a function to simulate the appointments. Let’s assume the patients get seen in the order they arrive. As we cycle through the patients, we’ll keep track of

  • n_waited, the number of patients who have had to wait for their appointment so far;
  • time_waiting, the sum of all waiting times of the patients so far; and
  • doctors, the next time at which each doctor is free to see another patient.

The doctors variable starts at c(0, 0, 0) because they are immediately availble to see patients. The doctor with the smallest availability time is the next doctor to see a patient. The start of the appointment is either the doctor’s availability time or the arrival time of the patient, whichever is greater. The end of the appointment is \(\duniform(5, 20)\)-minutes after the start of the appointment. The doctor’s availability time is then set to the end of the appointment. Once all patients have been given an appointment, the closing time is the maximum of the doctors’ next availability times or the closing time (16 - 9) * 60, whichever is greater.

process <- function(arrivals, t=0) {
  
  n_waited <- 0         # number of patients who have had to wait so far
  time_waiting <- 0     # total waiting time so far
  doctors <- c(0, 0, 0) # next time at which each doctor is free to see another patient
  
  for(i in (1:length(arrivals))) {
    wait <- pmax(min(doctors) - arrivals[i], 0) # waiting time of patient i
    time_waiting <- time_waiting + wait
    n_waited <- n_waited + (wait > 0)
    appointment_start <- max(c(min(doctors), arrivals[i]))
    appointment_end <- appointment_start + runif(1, 5, 20)
    doctors[which.min(doctors)] <- appointment_end
  }
  
  list(
    n_patients = length(arrivals),
    n_waited = n_waited,
    time_waiting = time_waiting,
    time_waiting_per_patient = time_waiting / length(arrivals),
    time_waiting_per_waiting_patient = time_waiting / n_waited,
    closing_time = pmax(max(doctors), t)
  ) %>% return()
    
}

arrivals(λ, t) %>% 
  process(t)
$n_patients
[1] 39

$n_waited
[1] 0

$time_waiting
[1] 0

$time_waiting_per_patient
[1] 0

$time_waiting_per_waiting_patient
[1] NaN

$closing_time
[1] 426.9273

To simulate the above many times, we’ll use the replicate function. For convenience, we’ll turn this into a tibble.

simulate <- function(iters, λ, t, n=100) {
  iters %>% 
    replicate(process(arrivals(λ, t, n), t)) %>% 
    t() %>% 
    as_tibble() %>% 
    mutate_all(unlist)
}

sims <- simulate(1000, λ, t)
The first few simulations of a day at the practice.
n_patients n_waited time_waiting time_waiting_per_patient time_waiting_per_waiting_patient closing_time
45 8 10.493347 0.2331855 1.311668 421.2221
44 6 13.226377 0.3005995 2.204396 435.4524
29 1 1.174172 0.0404887 1.174172 435.9763
31 1 9.413756 0.3036696 9.413756 440.1210
40 5 16.692607 0.4173152 3.338521 431.8950
37 2 11.329963 0.3062152 5.664981 420.0000

Finally, we can calculate the 50% intervals by applying the quantile function to each summary.

sims_summary <- sims %>% 
  gather(variable, value) %>% 
  group_by(variable) %>% 
  summarise(
    q25 = quantile(value, 0.25, na.rm=TRUE),
    q50 = quantile(value, 0.5, na.rm=TRUE),
    q75 = quantile(value, 0.75, na.rm=TRUE),
    simulations = n()
  )
The median and 50% interval for each summary statistic.
variable q25 q50 q75 simulations
closing_time 420.000000 424.9488049 430.7023807 1000
n_patients 38.000000 42.0000000 46.2500000 1000
n_waited 3.000000 5.0000000 9.0000000 1000
time_waiting 8.266744 19.3983242 38.9367142 1000
time_waiting_per_patient 0.212705 0.4685262 0.8622254 1000
time_waiting_per_waiting_patient 2.660497 3.8266619 5.1309744 1000

To leave a comment for the author, please follow the link and comment on their blog: Brian Callander.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.