Plotting PDQ Output with R
[This article was first published on Taking the Pith Out of Performance, and kindly contributed to R-bloggers.]
One of the nice things about PDQ-R (coming in release 5.0) is the ability to plot PDQ output directly in R. Here’s a PDQ-R script, together with the corresponding graphical output, that I knocked up to show the effect on the throughput curve of adding more queueing delay stages (K), with everything else held constant.
With just a single queue (K = 1) the system saturates very quickly. The throughput curve shoots up the y-axis until it hits the ceiling at X = 2.0 requests per unit time. Consequently, the linearly rising slope on the early part of the throughput curve is almost indistinguishable from the optimal load-line at N* = 1.016 clients. This rapid saturation effect is less pronounced in a system with more queues because there are more service stages and each request therefore takes longer to complete. But it requires a considerable number of additional queueing centers to get a noticeable difference, e.g., K = 20, 50. Observe also that the optimal load-line moves to the right and is positioned on the x-axis at a value very close to K. I’ll let you ponder why that must be true.
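For anyone who wants to experiment with these curves without PDQ installed, here is a minimal sketch in base R that uses the textbook exact Mean Value Analysis (MVA) recursion for a closed network of K identical queues. It is not the PDQ-R script referred to above, and it makes no PDQ calls. The parameter values are assumptions: a service demand of D = 0.5 per queue is implied by the ceiling at X = 2.0, and a think time of Z = 0.008 is a guess chosen only because it reproduces N* = 1.016 for K = 1. The function name mva_throughput is likewise made up for illustration.

```r
# Exact MVA for a closed network of K identical queueing centers.
# Base R only; this is an illustrative sketch, not PDQ-R code.
D <- 0.5     # assumed service demand per queue (gives ceiling 1/D = 2.0)
Z <- 0.008   # assumed think time (gives N* = 1.016 when K = 1)

mva_throughput <- function(N, K, D, Z) {
  Q <- rep(0, K)         # mean queue length at each center with 0 clients
  X <- numeric(N)        # throughput at populations 1..N
  for (n in seq_len(N)) {
    R <- D * (1 + Q)     # residence time per center (arrival theorem)
    X[n] <- n / (Z + sum(R))   # system throughput via Little's law
    Q <- X[n] * R        # updated queue length at each center
  }
  X
}

N  <- 100
Ks <- c(1, 20, 50)

# One column of throughput values per value of K
X <- sapply(Ks, function(K) mva_throughput(N, K, D, Z))

# Optimal load point: (total demand + think time) / bottleneck demand
Nopt <- (Ks * D + Z) / D

matplot(seq_len(N), X, type = "l", lty = 1, col = seq_along(Ks),
        xlab = "Clients (N)", ylab = "Throughput X(N)")
abline(h = 1 / D, lty = 2)                        # saturation ceiling
abline(v = Nopt, lty = 3, col = seq_along(Ks))    # optimal load-lines
legend("bottomright", legend = paste("K =", Ks),
       lty = 1, col = seq_along(Ks))
```

Changing Ks, D, or Z and re-running shows how the knee of each curve and its load-line shift together.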
The plot also explains the rationale for the approach I took in Chap. 10 of the Perl PDQ book where I modeled the scalability measurements of a multi-tier web application. In addition to the measured tiers, I ended up introducing 12 “dummy” queues in order to produce the correct round-trip latency, whilst retaining Z = 0 think time in accord with the original web application test scripts. The stunningly powerful conclusion was that there must’ve been additional latencies that were not included in the original measurements on the test rig. Otherwise, the data that were measured could not be reconciled with each other. Although I couldn’t determine what the sources of those hidden latencies were, I could state quite categorically that they were real. You cannot possibly reach this kind of penetrating conclusion without a performance model. Data comes from the Devil, models come from God.
I didn’t include the corresponding plots showing the effect of the dummy queues (similar to the above) in my Perl PDQ book because it was so tedious to write the data out to a file and then import it into Excel (which is what I was using back then). With PDQ-R, it’s a snap to do it in about 50 lines.