A U.S. Army officer turned performance tester, Scott Barber is probably best known as the co-author of Performance Testing Guidance for Web Applications (Microsoft Press). In the 11 years since Scott became a self-proclaimed “career tester of software systems,” following a circuitous professional journey that began when he started college as a would-be civil engineer, Scott has written over a hundred articles and papers on testing, given nearly as many conference presentations, co-founded the Workshop on Performance and Reliability, been a keynote speaker at testing conferences on four continents, instructed thousands of students on dozens of testing topics, incorporated his company PerfTestPlus, and spent more than half of his working hours for four-plus years as a Vice President and Executive Director of the Association for Software Testing. Well known for his passion and conviction, Scott can often be found sharing thought-provoking ideas about the testing controversies of the day.
Scott Barber will now answer your questions.
Question: How is performance testing different in different contexts, e.g. real-time systems, real-time embedded systems, networks internal to a company, the general web, etc.? – Jon Hagar Hot Sulphur Springs, Colorado
Barber: There are really two pieces to this question. First, what is performance testing? Second, how does it differ by environmental context?
I think of performance testing as system testing at the intersection of speed, scalability, and stability, with the goal of determining whether a system is fast enough and consistent enough, with a large enough capacity for the desired volume of use and the ability to grow as volume grows (i.e., “performant enough”), and, when the system is determined to be not performant enough, collaboratively assisting in both determining why and achieving adequate performance through information gathering and analysis.
For real-time systems, fast enough and consistent enough equates to “effectively no part of the system waiting on any other part of the system regardless of volume”.
For embedded systems, volume is typically quite limited, as is scalability. For instance, how many people can use a single PDA at a time?
Regardless of platform, when volume is well known and defined, such as systems built for internal company usage, the scalability aspect takes a back seat.
For the “typical” web application, the real trick is balancing consistency with the other aspects. Human users prefer slightly slower but consistent performance over sometimes fast, sometimes slow, and sometimes broken performance, although they are unlikely to articulate it that way.
The key is to take the time to determine which aspects of performance matter to what degree, then to design your tests to collect relevant information about the aspects of performance that matter most first. While there are certainly similarities and differences based on environmental context, that is also true about systems with the “same” environmental context, but different functions or priorities. It’s really all about value-driven test design.
Question: As testers, do you have any advice for testing and improving performance beyond easily measurable benchmarks? Where is the scope of performance testing expanding? Is it expanding? Is it shrinking? – Lanette Creamer Seattle, WA
Barber: The scope of performance testing, in theory, is no different than it ever was. In practice, however, it seems to be re-expanding. By that, I mean that before internet/web access became the “norm,” performance and performance testing were simply part of everyone’s job. In the late 1990s, good performance was really simple (for a short period of time)… all one needed was more web servers and more bandwidth. Unfortunately, that belief lasted far longer than it was true.
Today, most organizations have realized that a more comprehensive approach to performance testing is needed and are trying to devise ways to make that happen. Also unfortunately, most organizations fail to realize that previous generations of system developers fundamentally figured out how to do this, and are therefore “reinventing the wheel,” so to speak.
I recently gave a joint webcast with Gomez – I am working on a paper for them – and will be presenting at the Computer Measurement Group (CMG) Conference in December on this topic. Keep your eyes on http://www.perftestplus.com in the publications and presentations section for more resources on the scope and responsibilities related to whole system performance testing. The titles of the resources will be some variant of “Performance Testing: A Life Story”.
Question: Have you ever recommended that someone skip pre-live testing and
simply go live with their system? Under these circumstances what monitoring
would you recommend they do in the live environment to assess
load/performance/stress risk? – Alan Richardson Hertfordshire, UK
Barber: Assuming you are referring to skipping load simulations, I have absolutely recommended “skipping” pre-live performance testing, usually because the cost/effort of doing so was far greater than the risks associated with not doing so (which isn’t to say there aren’t risks, rather that the consequences of those risks are acceptable to the business). In these cases, what I recommend is implementing risk mitigation measures.
As your question suggests, many of these risk mitigation measures include an element of production monitoring. Given no additional information, I default to recommending the following be monitored:
- CPU of all servers involved, physical or virtual, with alerts set for consecutive measurements above a certain threshold… typically between 60% and 80% utilization.
- Memory (RAM) utilization of all servers involved with alerts set for sustained utilization over a certain threshold… 75% if I have nothing else to go on.
- Internal and External Network Bandwidth utilization. Same idea. Best to start monitoring before promoting the new application to production to establish a basis for comparison.
- Usage Statistics: There are a ton of possibilities here. Google Analytics or similar is simple and good enough if there are no internal collection methods available.
Depending on what I know about the application, other common areas of monitoring I recommend include:
- Disk I/O: particularly if there’s a lot of file uploading and saving.
- Session Management: if users log in and are uniquely identified during their time on the system.
There is always more that can be monitored, but those are my top seven to track until I collect enough information to make better choices (a minimal sketch of this kind of threshold alerting follows).
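To make the “consecutive measurements above a threshold” pattern concrete, here is a minimal sketch in Python using the psutil library. The threshold values mirror the defaults mentioned in the list above (CPU between 60% and 80%, RAM around 75%); the sampling interval, strike counts, variable names, and print-based alerting are illustrative assumptions rather than a recommendation of specific tooling.

    # Minimal threshold-alert sketch (assumes the psutil package is installed).
    # Thresholds follow the defaults mentioned above; everything else is illustrative.
    import time
    import psutil

    CPU_THRESHOLD = 80.0      # percent; pick something between 60 and 80
    MEM_THRESHOLD = 75.0      # percent, sustained
    CONSECUTIVE_SAMPLES = 3   # samples in a row that must exceed the threshold
    SAMPLE_INTERVAL = 10      # seconds between samples

    def monitor():
        cpu_strikes = mem_strikes = 0
        while True:
            cpu = psutil.cpu_percent(interval=1)   # CPU utilization over a 1-second window
            mem = psutil.virtual_memory().percent  # RAM utilization
            cpu_strikes = cpu_strikes + 1 if cpu > CPU_THRESHOLD else 0
            mem_strikes = mem_strikes + 1 if mem > MEM_THRESHOLD else 0
            if cpu_strikes >= CONSECUTIVE_SAMPLES:
                print(f"ALERT: CPU above {CPU_THRESHOLD}% for {cpu_strikes} consecutive samples ({cpu:.1f}%)")
            if mem_strikes >= CONSECUTIVE_SAMPLES:
                print(f"ALERT: memory above {MEM_THRESHOLD}% sustained ({mem:.1f}%)")
            time.sleep(SAMPLE_INTERVAL)

    if __name__ == "__main__":
        monitor()

In practice these alerts would feed whatever monitoring or paging system is already in place rather than printing to the console; the point is simply the consecutive-sample threshold pattern.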
Question: What would be your top 5 performance testing heuristics when testing a site like softwaretestpro.com? – Basim Baassiri Ottawa, Canada
Barber: Lol, that sounds like an interview question! Hey STP folks, are you reading this?
- Consistency: This is my default top performance testing heuristic for systems that aren’t safety-critical and don’t carry severe legal risks (from the perspective of the company responsible for the system). No matter what anyone may try to tell you, both human beings and computers prefer consistent performance to usually fast, sometimes sluggish, and occasionally down. If you want return visitors, get your performance stable (a minimal sketch of quantifying consistency follows this list).
- Scalability: Because I happen to know that the folks at STP are actively and aggressively trying to grow an online community and increase traffic on the site.
- Interfaces/Integrations: I don’t know the architecture of the system, but I’m guessing that not all of the features, functionality, and content on the site is designed, developed, configured, and maintained by STP. Whenever there is a component that is out of your control, I want to test it (or at least monitor it to ensure that SLAs are being met). Personally, few things make me giggle as quickly as assessing financial penalties and/or suing 3rd parties for failing to meet their performance-related SLAs.
- Reliability/Robustness: All those “what happens if” cases. For instance, what happens if everyone who is attending that webcast goes directly to the site when it’s over, registers, and downloads parts 1-8 in the series? In my experience, Murphy’s Law rules, along with several of its corollaries (i.e., …and it will go wrong at the most inopportune time, and if more than one thing can go wrong, the worst one will), so I figure that if I don’t “break” the system while I’m doing performance testing, Murphy has already won.
- Speed: Yes, that’s right, speed is fifth. Even if it’s not the fifth most important, what’s the point of fast if it’s inconsistent, doesn’t scale, has broken interfaces, and falls over every time someone sneezes?
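As a minimal illustration of the consistency heuristic, the following sketch (Python standard library only) compares two sets of response-time samples with the same average but very different spread. The specific metrics (95th percentile, coefficient of variation) and the sample numbers are illustrative choices, not figures from Scott.

    # Minimal sketch: same average speed, very different consistency.
    # Metrics and sample numbers are illustrative.
    import statistics

    def consistency_report(response_times_ms):
        samples = sorted(response_times_ms)
        median = statistics.median(samples)
        p95 = samples[int(0.95 * (len(samples) - 1))]                 # 95th percentile
        cov = statistics.pstdev(samples) / statistics.mean(samples)   # spread relative to mean
        return {"median_ms": median, "p95_ms": p95, "cov": round(cov, 2)}

    # Two runs with the same mean (1000 ms), but one is far less consistent.
    steady = [900, 950, 1000, 1050, 1100] * 20
    spiky = [200, 300, 400, 1600, 2500] * 20
    print("steady:", consistency_report(steady))
    print("spiky: ", consistency_report(spiky))

Run as written, the steady samples report a 95th percentile close to their median and a small coefficient of variation, while the spiky samples, with the identical 1000 ms average, report a 95th percentile of two and a half times their median. That gap is the difference users actually feel.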
Question: Back at the TISQA 2006 conference, I asked you why I don’t see you at agile conferences. You said it was because nobody talks about performance testing (and I think you also included other types of ‘extra-functional’ testing or whatever the correct term is, such as security testing) at Agile conferences. I was a bit gobsmacked as this was true at the time. I also realized my own Agile team wasn’t doing those types of tests. That was a serious hole that was sure to bite us. I went home and got with my team to start fixing that. In the years since, my team has gotten up to scratch with performance and security testing (and I am grateful to you for the wakeup call), and I see lots of sessions on these types of testing now at agile conferences. My question then is: will we see you at any Agile conferences in the future? – Lisa Crispin Denver, Colorado
Barber: Quite potentially. I think I’ve got some good, relevant and educational stories to tell. What I am finally coming to accept is that talking about this stuff at development and testing conferences is typically some odd combination of preaching to the choir and talking to a wall. To really get some energy and movement behind things like life-cycle performance testing, we need to get in the ears of project managers and above. I’ve yet to figure out how to do that (outside of one-on-one, of course), but as soon as I figure out where those people hang out, that’s where I’m going to try to be.
Question: As a wireless testing engineer, we know the testing environments in the lab and on site are usually far different. How do you know or ensure your performance testing environment in the lab is close to the real-world environment? If you cannot ensure your testing data is close to data on site, what do you do? – Lei Fu Ottawa, Canada
Barber: Matching production in your performance testing environment is a luxury that few teams enjoy, wireless or otherwise. The fact is that few organizations are willing to spend the money to build and maintain a production replica for performance (or any other kind of) testing, independent of whether or not it’s actually feasible (which, in your case, it probably isn’t). So what do we do? There are several options.
If you test over multiple production releases, you can build a pretty accurate “translation table” from your test environment to your production environment: monitor production carefully, then build load simulations that match a variety of time slices in production, execute those tests against the production build in the test environment, and compare results. After a couple of iterations, you should start seeing patterns that you can reverse-engineer into a simple translation table that is good enough. If that’s not quite good enough, you can enhance the concept with SPE-style or capacity-planning models. I shy away from using these models exclusively; they tend to be wickedly accurate when they are correct, but wildly misleading with even the smallest of errors. Typically, validating those models with a combination of load simulations and monitoring will alleviate those errors.
If you don’t have multiple iterations, you can try to get a small window of time between deployment and go-live to execute some performance tests in production, to at least generally validate predictions and assumptions.
Absent those options, you’re simply not going to make consistently accurate predictions. The best you can do is test the components you control, stub out 3rd-party components (in your case, probably the “over air” portion), substitute in the maximum response time allowable according to your SLA, tune the daylights out of what’s left, and then find all the ways that you can make performance go bad. Once you’ve found those ways it can go bad, figure out what the early indicator of each “bad thing” is, and put an alert in production to notify folks when the boundary of any of those early indicators is crossed. It’s not predictive, but at least it enables you to do some risk mitigation planning and get some warning before it all goes bad.
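As a minimal sketch of the “translation table” idea, the factor between test-environment and production measurements for the same build under comparable load can be captured per transaction and reused to project future test results. The transaction names, timings, and ratios below are invented purely for illustration.

    # Translation-table sketch: per-transaction response times (ms) measured for
    # the same build under comparable load in the test environment and in
    # production. All names and numbers here are invented for illustration.
    test_env = {"login": 420, "search": 310, "checkout": 890}
    production = {"login": 610, "search": 500, "checkout": 1400}

    # The "translation table": how much slower (or faster) production runs.
    translation = {tx: round(production[tx] / test_env[tx], 2) for tx in test_env}
    print(translation)  # {'login': 1.45, 'search': 1.61, 'checkout': 1.57}

    # Later, a new build measured only in the test environment can be projected
    # into rough production expectations.
    new_build_test = {"login": 450, "search": 290, "checkout": 950}
    projected_prod = {tx: round(new_build_test[tx] * translation[tx]) for tx in new_build_test}
    print(projected_prod)

This is, of course, a crude empirical stand-in for the SPE-style or capacity-planning models mentioned above; as noted, combining it with monitoring and occasional production validation is what keeps it honest.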
Question: How do you see virtualization influencing performance testing in the next few years to come? – Shmuel Gershon, Israel
Barber: I see it generating a lot of questions. I hope the questions lead to more/better performance testing than the “average” organization is currently doing (as opposed to what they probably ought to be doing, but have chosen to accept the risk of doing less… even if they don’t realize that’s the decision they’ve made). I don’t think the propagation or popularization of virtualization fundamentally changes good performance testing (see cloud question above). However, as folks move to real-time, dynamic, intelligent resource allocation, that’s going to change some things. Production won’t be a static environment anymore, so QoS predictions go out the window and instead QoS measures become triggers for resource reallocation.
If this sounds like a nightmare, don’t worry, it will be, but probably not the nightmare you’d expect it to be. It’s the same nightmare we’ve been facing for the last decade with Java garbage collection; it’s exactly the same principle applied to a variety of physical resources. If we, as an industry, are smart enough to remember our lessons learned, it will be a very short nightmare. If history repeats itself, though, the industry as a whole either won’t make the connection, won’t remember the lessons, or won’t know how to abstract and apply those lessons. That is the nightmare I’m already having nightmares about. Ok, that’s an exaggeration. It’s more like I’m shaking my head disappointedly while watching the preview of the sequel to a movie I didn’t like very much the first time, but felt the need to watch for the pop culture references.
About the Author
Scott Barber is viewed by many as the world’s most prominent thought-leader in the area of software system performance testing and as a respected leader in the advancement of the understanding and practice of testing software systems in general. Scott earned his reputation by, among other things, contributing to three books (co-author, Performance Testing Guidance for Web Applications, Microsoft Press, 2007; contributing author, Beautiful Testing, O’Reilly Media, 2009; contributing author, How to Reduce the Cost of Testing, Taylor & Francis, to be published Summer 2011), composing over 100 articles and papers, delivering keynote addresses on five continents, serving the testing community for four years as the Executive Director of the Association for Software Testing, and co-founding the Workshop on Performance and Reliability.
Today, Scott is applying and enhancing his thoughts on delivering world-class system performance in complex business and technical environments with a variety of clients and is actively building the foundation for his next project: driving the integration of testing commercial software systems with the core objectives of the businesses funding that testing.
When he’s not “being a geek”, as he says, Scott enjoys spending time with his partner Dawn, and his sons Nicholas and Taylor at home in central Florida and in other interesting places that his accumulated frequent flier miles enable them to explore.