[This article was first published on Taking the Pith Out of Performance, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
One of my gripes about some commercial load testing tools is that they only provide a think time distribution (Z) that is equivalent to uniform variates in the client script. If you want some other distribution, you have to code it and debug it yourself. Load test generators are essentially very expensive workload simulators, especially when you take into account the cost of the SUT (system under test) platform. At those prices, a selection of distributions should be provided as a standard library, as they are in event-based simulators.
To make this point a bit clearer, I used the very convenient variate-generation functions in R to compare some of the distributions that I consider should be included in such a library for the convenience of workload-test designers and performance engineers. The statistical mean (i.e., the average think delay) is the same in all these plots and is shown as the red vertical line, but pay particular attention to the spread around the mean on the x-axis.
Uniform: The first plot (upper left) shows the default uniform distribution with a mean Z = 10 seconds and a range between 5 and 15 seconds. This is what a standard random number generator produces once its output is scaled to that range. Each call in the script will produce an explicit think delay somewhere between 5 and 15 seconds, with every value in that range equally likely. The frequency of occurrence of each variate is shown on the y-axis. I’m using seconds here as the nominal time base for think delay.
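As a rough illustration of what such a uniform think delay looks like in a client script, here is a minimal sketch. It uses Python's standard `random` module rather than the R functions used for the plots, and the function name `uniform_think_time` and its parameters are hypothetical, chosen only to match the Z = 10 seconds, 5 to 15 seconds example above.

```python
import random


def uniform_think_time(mean=10.0, half_width=5.0, rng=random):
    """Return one think delay in seconds, uniform on [mean - half_width, mean + half_width]."""
    return rng.uniform(mean - half_width, mean + half_width)


# Seeded generator so the run is repeatable (the seed value is arbitrary).
rng = random.Random(42)
samples = [uniform_think_time(rng=rng) for _ in range(100_000)]

sample_mean = sum(samples) / len(samples)
print(f"mean={sample_mean:.2f}s  min={min(samples):.2f}s  max={max(samples):.2f}s")
```

Every delay is confined to the 5 to 15 second band, which is why the spread around the mean is so narrow for this distribution.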
Exponential: One of the most common alternative delay distributions is the exponential distribution. There are two reasons you might want to use this distribution:
- It increases the likelihood of queueing and therefore detecting buffer overflows in the SUT
- It makes test results easily comparable to a PDQ model, which always assumes an exponential Z distribution
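The queueing argument can be made concrete with a small sketch, again in Python's standard `random` module as an assumption of convenience. With the same mean Z = 10 seconds, an exponential think time produces many delays shorter than 5 seconds, whereas the uniform distribution on 5 to 15 seconds produces none. Short delays are what bunch requests together at the SUT. The 5 second threshold and the variable names are illustrative choices only.

```python
import math
import random

Z = 10.0          # mean think time in seconds
N = 100_000       # number of simulated think delays
rng = random.Random(1)

# expovariate() takes the rate (1 / mean), not the mean itself.
exp_delays = [rng.expovariate(1.0 / Z) for _ in range(N)]
uni_delays = [rng.uniform(5.0, 15.0) for _ in range(N)]

exp_mean = sum(exp_delays) / N
exp_short = sum(d < 5.0 for d in exp_delays) / N
uni_short = sum(d < 5.0 for d in uni_delays) / N

# For an exponential with mean Z, P(delay < t) = 1 - exp(-t / Z).
expected_short = 1.0 - math.exp(-5.0 / Z)

print(f"exponential: mean={exp_mean:.2f}s, fraction under 5s={exp_short:.3f}")
print(f"theory for exponential: {expected_short:.3f}")
print(f"uniform: fraction under 5s={uni_short:.3f}")
```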
Gamma: Often there is considerable gnashing of teeth over the exponential distribution not being realistic. Quite apart from the usual academic technicalities, the exponential is still a better choice than uniform. That said, a suitable generalization, which introduces more correlations into the arrivals, is the gamma distribution. Whereas the exponential distribution is defined in terms of a single arrival-rate parameter (λ), the gamma distribution is defined by two parameters: the shape (α) and scale (β). Setting α = 1 and β = 1/λ produces the exponential distribution.
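A short sketch of the shape and scale parameterization follows, using `random.gammavariate(alpha, beta)` from Python's standard library, where `beta` is the scale and the mean is α·β. To hold the mean at Z = 10 seconds, the scale is set to β = Z/α. The shape value α = 4 is an arbitrary choice for illustration, not necessarily the one used in the plots, and the helper name `gamma_think_times` is hypothetical.

```python
import random
import statistics

Z = 10.0        # mean think time in seconds
N = 50_000      # number of simulated think delays per shape value
rng = random.Random(7)


def gamma_think_times(alpha, mean=Z, n=N):
    """Draw n gamma think delays with the given shape and a fixed mean."""
    beta = mean / alpha  # scale chosen so that alpha * beta == mean
    return [rng.gammavariate(alpha, beta) for _ in range(n)]


results = {}
for alpha in (1.0, 4.0):
    delays = gamma_think_times(alpha)
    results[alpha] = (statistics.fmean(delays), statistics.pstdev(delays))
    m, s = results[alpha]
    print(f"alpha={alpha}: mean={m:.2f}s  std dev={s:.2f}s")
```

With α = 1 the standard deviation is close to the mean, as expected for an exponential. With α = 4 the mean is unchanged but the standard deviation shrinks toward Z/√α = 5 seconds.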
Pareto: Finally, the Pareto distribution is suitable for simulating highly correlated arrivals, such as has been discussed ad nauseam in the context of Internet packets. The Pareto distribution emulates heavy-tailed or self-similar traffic. Since Pareto is a hyperbolic-class function, it corresponds to infinite-variance effects or, in practice, almost constant variance over many decades.
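The heavy tail can be seen in a minimal sketch based on `random.paretovariate(alpha)` from Python's standard library, which returns values of at least 1 and therefore needs a scale factor. For shape α > 1 the mean is α·x_min/(α − 1), so x_min is chosen here to give Z = 10 seconds. The shape α = 1.5 is an assumption made for illustration (any α ≤ 2 has infinite variance); the source does not state which parameters were used for the plot.

```python
import random
import statistics

Z = 10.0                          # target mean think time in seconds
ALPHA = 1.5                       # assumed shape; variance is infinite for ALPHA <= 2
X_MIN = Z * (ALPHA - 1.0) / ALPHA # scale (minimum delay) that yields mean Z
rng = random.Random(3)


def pareto_think_time():
    """Return one Pareto think delay in seconds, never shorter than X_MIN."""
    return X_MIN * rng.paretovariate(ALPHA)


delays = [pareto_think_time() for _ in range(100_000)]

median = statistics.median(delays)
theoretical_median = X_MIN * 2.0 ** (1.0 / ALPHA)

print(f"minimum delay   = {min(delays):.2f}s")
print(f"median delay    = {median:.2f}s (theory {theoretical_median:.2f}s)")
print(f"longest delay   = {max(delays):.1f}s")
```

Most delays fall well below the 10 second mean, while a handful are orders of magnitude longer. The sample mean converges very slowly for this reason, so the median is the more stable summary to check against theory.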
I’ll have more to say about all this in our upcoming Guerrilla Data Analysis Techniques class in August.