Site icon R-bloggers

How to Generate Exponential Delays

[This article was first published on The Pith of Performance, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
This question arose while addressing Comments on a previous blog post about exponentially distributed delays. One of my ongoing complaints is that many, if not most, popular load-test generation tools do not provide exponential variates as part of a library of time delays or think-time distributions. Not only is this situation bizarre, given that all load tests are actually performance models (and who doesn’t love an exponential distribution in their performance models?), but without the exponential distribution you are less likely to observe such things as buffer overflow conditons due to larger than normal (or uniform) queueing fluctuations. Exponential delays are both simple and useful for that purpose, but we are often left to roll our own code and then debug it.

Listing 2.2 on p. 35 of my Perl::PDQ book shows you how to generate exponential variates in Perl. Here, however, I want to use R to compare exponential delays with both the uniform distribution (the default distribution available in all load-test simulators) and the normal distribution (the familiar “bell curve“).

The R function that generates exponential variates directly is rexp(n, rate = 1) where, for example, the parameter called rate might correspond to the arrival rate of requests going into your test rig or system under test (SUT). The R programming language uses the same notation as p. 57 of my Perl::PDQ book. In my books and classes, I usually write that rate as $\lambda$ to match conventional queueing theory symbology. The probability density function (PDF), or dexp() in R, is usually written as:

\begin{equation} f(t) = \lambda e^{-\lambda t} \end{equation}

and shown in Figure 1.

Figure 1. Negative exponential probability density function (PDF) in eqn.(1) with mean = 30
The rate is $\lambda$, but the average or statistical mean of (1) is given by the inverse rate or $1/\lambda$. Since $\lambda$ is the average arrival rate, $1/\lambda$ is the average interarrival time as would be seen by the SUT. The view from the load-test client corresponds to a think-time delay of $Z = 1/\lambda$ in your script. In either case, the delay is the time interval between requests, whether departing the client or arriving at the SUT. If you could apply the R function rexp() directly to produce 10 exponentially distributed delays with a mean time of $Z=30$ seconds, you would write rexp(10,1/30) with the result:

> rexp(10,1/30)
 [1] 126.841780  78.137417  12.989666  31.544992   9.598437  89.299508   2.063628   4.597275
 [9]  24.792890  11.950121

Note that some delays are much smaller than the mean while other delays are much greater. Unfortunately, this R function is not available to you in load-test scripts so, you have to code your own. Here’s how that works.

Inverse Transformation
In eqn. (1), we have the output $f(t)$ on the left and the corresponding delay $(t)$ on the right side (in the exponent). However, we would really prefer to have things the other way around: flip a coin to get an input on the right and find out what delay that corresponds to as an output on the left.

We can use the inverse transform to do precisely that. The inverse function does not necessarily exist for an arbitrary probability distribution but, thankfully, the exponential distribution has a very simple form which allows it. The inverse of the exponential function is the natural logarithm function. The PDF in (1) lies in the range $0 \le f < \lambda$ on the $y$-axis, but we need to work with probabilities. The function which does this is the cumulative distribution function $F(t)$ in Figure 2:

\begin{equation} F(t) = 1 – e^{-\lambda t} \end{equation}

which is strictly bounded by the range $0 \le F < 1$. $F(t)$ is the corresponding area under $f(t)$ and corresponds to pexp(q, rate = 1) in R.

Figure 2. Negative exponential cumulative distribution function (CDF) in eqn.(2) with mean = 30
Typically, we would look along the $t$-axis (horizontal) for a particular time $(t)$ and then look up (to the curve) and across to the y-axis $(F)$ to find out the probability of that time occurring. Here, instead, we pick a random point on y-axis interval corresponding to $F$ (e.g., by flipping a coin). The corresponding delay is read off from the t-axis by following the dashed arrow in Figure 2, which shows this inversion process for probability values $0.90$, $0.80$ and $0.30$. BTW, those probability values also correspond respectively to $90$th, $80$th, and $30$th percentiles, if you prefer to think of them that way. We can simulate the coin flip by using a variate $u \sim U(0,1)$ chosen from a uniform distribution $0 \le u < 1$.

Based on Figure 2, how can we calculate the corresponding interarrival delay $(t)$ that the load generator should use? Letting $u$ represent $F$ in (2) and transposing produces:

\begin{equation} e^{-\lambda t} = 1 – u \end{equation}

Next, we solve (3) for $t$ by taking natural logs of both sides—the inverse function:

\begin{equation} \lambda t = – \ln(1 – u) \end{equation}

But the value of $u$ lies in the same interval as $(1-u)$, since they have the same uniform distribution. Hence, we can use the slightly simpler form:

\begin{equation} t = – \frac{\ln(u)}{\lambda} \end{equation}

Finally, we have arrived at the place where we wanted to be: flip a coin to get a random input on the right hand side of (5) and find out what delay the client script should use as an output on the left. For load testing, the random delay $(t)$ is associated with a mean think time $Z = 1/\lambda$ and is therefore computed using:

\begin{equation} t = -Z \ln(u) \end{equation}

Equation (6) is what rexp() uses under the covers, and it’s what you need to code in your client test scripts. A rather simple formula which, again, underscores the lunacy of not having it integrated into the load-test simulator.

Examples in R
Using R, we first generate $10$ random variates (coin tosses) from a uniform distribution:

> uVariates <- runif(10) # create $10$ uniform variates
> uVariates
 [1] 0.13737362 0.23143301 0.59315715 0.56576222 0.48420383 0.66065404 0.77418521 0.07673259
 [9] 0.39985820 0.99247847
Next, we use these values as inputs into eqn. (6), with a mean time of 30 seconds, to calculate 10 exponentially distributed delays:

> thinkZ  <- 30 # specify the average desired delay
> tDelays <- -thinkZ*log(uVariates) # see eqn. (6)
> tDelays
 [1] 59.5515274 43.9039444 15.6688773 17.0874419 21.7574800 12.4357489  7.6783242 77.0228628
 [9] 27.4993587  0.2264988

Note the spread of delay times, which would also create significant fluctuations in queue depth as seen by buffers on the SUT side.

For comparison, here are $10$ delay samples produced by a uniform distribution with the same mean as used for the exponential samples, i.e., the arithmetic mean $\frac{0+60}{2}=30$ seconds:

> runif(10,0,60)
 [1] 25.90247 51.94069 30.43872 10.11337 29.17908 25.90095 31.91967 42.46816 49.94629 34.62311

Similarly, here are $10$ delay samples produced by a normal distribution with a mean of $30$ seconds:

> rnorm(10,30)
 [1] 30.57623 30.63393 30.45814 29.40551 29.59376 30.38025 29.91399 30.12652 29.67296 29.00941
Clearly, the exponential distribution produces a greater spread of delay times.

Related Posts

To leave a comment for the author, please follow the link and comment on their blog: The Pith of Performance.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.