In my last article, I wrote about the paradigm shift in web application architecture and why performance testers have to rethink their strategy for testing the performance of rich internet applications (RIA). Web application development processes and user expectations continue to grow by leaps and bounds. Sadly, the techniques and approaches employed to test those applications have not kept pace. The good news is that newer tools are emerging and methodologies are being defined to close that gap. Hence, it is essential that performance testers make use of them at every phase of the performance testing lifecycle.
Early on in the performance testing lifecycle, some of the primary tester tasks include gathering requirements and collecting application usage statistics. In this article, I will explain how web analytics tools can be a great source of information to gather historical data about the application usage and user behavior.
Traditional Web Server Log Approach
Traditionally, performance testers have relied on web server log files to collect historical application usage data. Web server logs were, and still are, a great source of information. They contain an enormous amount of data on web usage activity and server errors. Downloading log files from the web server and running report generation tools helps testers get meaningful information out of them. However, web server logs have their limitations. For example:
- Usage data contained in the web server logs does not include most “page re-visits” due to browser caching. For example, if a user re-visits a page, no request is received by the web server because the page is retrieved from the browser cache.
- While data contained in the web server logs can provide insights into system behavior, it does not help much in understanding user/human behavior.
- Web server logs do not readily provide users’ geographical location, the browser they used, or the device/platform from which they accessed the application, all of which are vital metrics for understanding user behavior on the application.
While web server log files are still a great way to measure usage statistics, new ways to measure web traffic have emerged that provide information from a user perspective rather than a system perspective. A large number of organizations are implementing web analytics tools as part of their web application infrastructure. For example, industry reports suggest that Google Analytics, a leading web analytics tool, is used on 57% of the top 10,000 websites.
Web Analytics and insight into User Behavior
“Website visitors are now people, not clicks”, says a leading web analytics provider. In short, that is what web analytics tools bring to the table. They track and analyze what real users do when they are on a web application. These tools keep track of the pages that users land on, where they navigate within the site, and where they exit from. Web analytics tools use a technique called “page tagging”, which combines JavaScript running in the browser with cookies to capture these metrics. Because the metrics are captured from real users’ browsers, they are more accurate and insightful. Companies use web analytics tools primarily for search engine optimization (SEO) and for measuring advertising/marketing initiatives. However, these tools can also provide valuable metrics for performance testers in designing load test scenarios that realistically mimic user behavior.
Realistic Estimate of User Idle Time
Anyone who has been doing performance testing for a while knows that “user idle time” is an important metric to factor into a realistic load test scenario. It determines the rate at which load hits the web server. However, it is often represented inaccurately in a test. Some testers use the idle time captured during script recording, some use a value provided by business owners, and others just make guesstimates. Web analytics tools provide two key metrics, known as “Average Time Spent on the Site” and “Average Time Spent on the Page”. Performance testers can use these metrics to determine the “user idle time” between business transactions more accurately and take the guesswork out of the equation.
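As a rough illustration, the sketch below turns an “Average Time Spent on the Page” figure into a randomized pause for a load test script. The page names, the durations, the 2-second response time, and the function name are all made-up values for illustration. Subtracting the page's response time and varying the pause around its mean are assumptions on my part, not something a web analytics tool prescribes.

```python
import random

def think_time_seconds(avg_time_on_page, avg_response_time, spread=0.25, rng=random):
    """Return a randomized think time derived from an analytics average.

    avg_time_on_page: "Average Time Spent on the Page" from the report (seconds)
    avg_response_time: page load time measured separately (seconds); assumption:
        the reported time on page includes load time, so it is subtracted
    spread: fraction by which the pause may vary around its mean
    """
    base = max(avg_time_on_page - avg_response_time, 0.0)
    low, high = base * (1 - spread), base * (1 + spread)
    return rng.uniform(low, high)

# Hypothetical per-page averages, as they might be exported from a report
avg_time_on_page = {"/home": 20.0, "/search": 45.0, "/checkout": 90.0}

for page, avg in avg_time_on_page.items():
    pause = think_time_seconds(avg, avg_response_time=2.0)
    print(f"{page}: pause {pause:.1f}s before the next request")
```

Randomizing the pause avoids every virtual user hitting the server in lockstep, which a single fixed idle time tends to cause.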
Realistic Emulation of Browser Caching Behavior
Industry reports suggest that “cached pages” can account for up to one-third of all page views. Because of their obvious performance benefits, browser caching mechanisms are used extensively by application developers. Web server logs do not (and cannot) capture user activity metrics for cached pages, as no request is made to the web server. In contrast, web analytics tools track visits to cached pages (because they track usage from the user's browser) and thus provide a more accurate picture of browser caching on the web application. Performance testers can use this information to determine what percentage of total page views is served from cache and emulate this browser behavior in the load test scenario.
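One possible way to estimate that percentage is to compare the page views reported by the analytics tool with the page requests found in the web server log for the same page and time window. The sketch below assumes both counts are already available; the numbers and the function name are hypothetical, and the result is only an approximation, since the two sources also differ for other reasons (for example, bots or users with JavaScript disabled).

```python
def cached_share(analytics_page_views, server_log_page_requests):
    """Estimate the fraction of page views that never reached the web server."""
    if analytics_page_views <= 0:
        raise ValueError("analytics_page_views must be positive")
    served_from_cache = max(analytics_page_views - server_log_page_requests, 0)
    return served_from_cache / analytics_page_views

# Hypothetical counts for the same page and time window
share = cached_share(analytics_page_views=30000, server_log_page_requests=20000)
print(f"Approx. {share:.0%} of page views were served from browser cache")

# In a load test, roughly this share of virtual users could be configured
# to simulate a primed browser cache; the rest start with an empty cache.
virtual_users = 500
cached_users = round(virtual_users * share)
print(cached_users, "users with primed cache,",
      virtual_users - cached_users, "with empty cache")
```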
Emulation of User Behavior on 3rd party Web elements like AJAX & Flash
How many of you are involved in load testing 3rd party web components like Flash, Silverlight & AJAX and feel that you don’t have any historical usage data to work with? You’re not alone, and there is a reason for that. Web server logs are not very good at tracking usage data for these 3rd party web elements. Web analytics tools, however, do a great job of tracking and reporting user activity on Flash-driven elements, embedded AJAX page elements, page gadgets, file downloads, and so on. So the next time you are involved in testing one of these 3rd party elements, you have a savior.
WAN Emulation by factoring in User Geo-Location
Lately, product and business owners of rich internet applications have been asking performance testers to identify geographical locations where page load times exceed a specific threshold. As you already know, user traffic for web applications can come from virtually anywhere on the globe. To build a load test scenario that satisfies this objective, performance testers need historical usage data that breaks down customers by the geographical region from which they access the application. Web analytics tools help pinpoint the geographical location of users using a technique called geo-location, which maps each user's IP address to a location. Performance testers can use this valuable data in conjunction with WAN emulation or cloud-based tools to design a load test scenario in which user load is generated from those geographic locations. By doing that, performance testers can factor in the network conditions of those regions and emulate them accordingly.
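To make this concrete, here is a minimal sketch of how a geographical breakdown could be translated into a per-region virtual user count. The region names, the percentages, and the function name are hypothetical. The rounding scheme (largest remainder) is simply one reasonable choice for making the counts add up to the intended total.

```python
def allocate_users(total_users, region_share):
    """Split total_users across regions in proportion to region_share.

    Uses the largest-remainder method so the counts always sum to total_users.
    """
    total_share = sum(region_share.values())
    exact = {r: total_users * s / total_share for r, s in region_share.items()}
    counts = {r: int(v) for r, v in exact.items()}
    leftover = total_users - sum(counts.values())
    # Give the remaining users to the regions with the largest fractional parts
    by_remainder = sorted(exact, key=lambda r: exact[r] - counts[r], reverse=True)
    for region in by_remainder[:leftover]:
        counts[region] += 1
    return counts

# Hypothetical share of visits by region, as a percentage
visits_by_region = {"North America": 55.0, "Europe": 30.0, "Asia": 12.5, "Other": 2.5}
plan = allocate_users(1000, visits_by_region)
print(plan)
```

Each region's count can then be assigned to a load generator in that region, or to a WAN emulation profile that approximates its network conditions.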
Conclusion
In short, web analytics tools do not replace web server logs as a source for usage activity measurement and reporting. Rather, the two complement each other. Web server logs, along with key “user behavior” metrics from web analytics tools, can provide a complete picture of usage activity for your web application under test. It is time that performance testers made use of these valuable tools in modeling application usage for their load tests.
Exercise: “Measuring the performance impact of a website marketing campaign” would be a great case study/exercise for performance testers to use web analytics tools to build a realistic load test scenario.
About the Author
Suraj Sundarrajan is passionate about making web applications run faster. As a Senior Performance Test Engineer at TransUnion, he helps optimize the performance of their Web & Mobile applications. He has been in the field of performance engineering for more than 8 years, with extensive experience in designing, building and executing performance, scalability and capacity tests for high-volume and business-critical enterprise applications. He currently focuses on developments in the Web performance and Cloud computing areas. Follow him on Twitter @perfengineering and LinkedIn http://www.linkedin.com/in/ssuraj