A Case for Improved Dialog Traversal (Call Flow) Testing and Analysis
By Stuart Harding
Synopsis: There needs to be an improved methodology for evaluating the functionality and performance of an application's dialog traversal, beyond the current process of manual calling and reporting. This article presents a case for the comprehensive, consistent, repeatable, automated testing developed by CoAssure, Inc. When used with VoiceXML (and other coding technologies) for speech and IVR telephony applications, it delivers improved and accelerated test results, presented in concise online reports. Cost savings can be an added benefit for companies routinely using the service.
Testing is an integral part of the design, development, deployment and support of self-service (voice recognition and IVR) applications. Such a high percentage of voice telephony applications are customer facing that virtually all can be considered mission critical and should, therefore, be closely scrutinized at several points in the development cycle. Prior to deployment, and any time hardware or software upgrades are implemented, a comprehensive analysis of functionality and performance is a cost-effective procedure. After all, the purpose of self-service applications is to generate cost savings by reducing the load on call center personnel, receptionists, etc., and to provide the caller with easy access to information on a 24/7 basis. Without convenient access to information, callers opt for live agents, undermining the advantages and cost savings expected from the system. Even minor issues with the application can result in an unsuccessful self-service call and/or customer dissatisfaction. Comprehensive and consistent testing is the only way to uncover issues with call flow.
Many companies are comfortable using outside vendors for usability testing of the VUI design, and load testing to validate system performance prior to deployment is commonly outsourced. Today, however, nearly all companies developing or deploying self-service applications use in-house resources and a cumbersome manual process for validating the functionality and performance of the dialog traversal. There are numerous reasons for using in-house resources; some are good reasons, but others should be examined more closely.
How is traversal testing conducted today, and exactly what is tested and measured when conducting dialog traversal tests? Every company surveyed that is developing self-service telephony applications claims to conduct a test of the call flow functionality. Most of these same companies check prompts, grammars, error conditions, etc., for compliance with the specification. A few companies have developed rudimentary tools to ensure that every call path can be exercised. Some companies pass responsibility for testing the application to the company deploying it. Every company employs a methodology for testing that relies on a person or a group of people to phone the application and step through call flow paths. The test caller must dial the number, follow the call flow procedure, listen carefully to the responses, and note variances from the specification. This process may go on for hours, usually days, sometimes weeks. Even when executed well, it is tedious work, prone to problems - an inexact exercise.
Why is manual testing (of the dialog traversal) an imprecise exercise? The reasons are numerous; anyone who has directed traversal testing will likely have a far more extensive list than the one provided below.
- The selection of personnel to test applications often results in a varied group of testers, either from day to day or week to week. Although the individuals may be qualified to conduct tests, a primary goal of testing should be to eliminate variables and evaluate only the effects of changes made to the application code.
- The call flow paths are not always covered rigorously. To be certain the application is performing in accordance with the specification, the tests must exercise every state and prompt. Where feasible, all in-grammar responses should be tested. Additionally, testing all of the exception conditions (silence, out-of-grammar, and inappropriate utterances) to the limits of the specification should confirm proper error handling throughout the application. Global commands need to be checked in every state. (A sketch of what this coverage requirement implies follows the list below.)
- Beginning-of-speech delays and barge-in tests cannot be executed precisely by a human. Applications perform differently when delays and barge-ins vary; to precisely evaluate the differences, a means of accurate measurement is required.
- Interpretations of the specification may vary from person to person. Maintaining consistent judgments about problem areas is difficult, especially if the testing phase extends across several days or even weeks.
- Errors detected and reported may not be repeatable. This is a major problem for any test system. The best solution is to have a methodology for recording calls and playing back only those portions of a call where discrepancies occur.
- The reporting system may be slow or imprecise. Collecting, consolidating and distributing test results can be a cumbersome task, especially when multiple locations are involved.
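To make the coverage requirement in the list above concrete, the following sketch enumerates the test cases it implies: every in-grammar response, every exception condition, and every global command, in every state. It is written in Python purely for illustration; the dialog states, grammars and global commands are hypothetical and are not drawn from any real application or from CoAssure's tools.

# Illustrative sketch only: enumerate the test cases implied by the
# coverage rules above. All states, responses and commands are
# hypothetical placeholders.

EXCEPTIONS = ["silence", "out-of-grammar", "inappropriate-utterance"]
GLOBAL_COMMANDS = ["help", "repeat", "operator", "main menu"]  # hypothetical

# Hypothetical dialog states and their in-grammar responses.
STATES = {
    "main-menu": ["balance", "transfer", "hours"],
    "get-account": ["checking", "savings"],
    "confirm-transfer": ["yes", "no"],
}

def enumerate_cases(states):
    """Yield (state, stimulus) pairs covering every in-grammar response,
    every exception condition, and every global command in every state."""
    for state, in_grammar in states.items():
        for utterance in in_grammar:
            yield (state, utterance)
        for exception in EXCEPTIONS:
            yield (state, exception)
        for command in GLOBAL_COMMANDS:
            yield (state, command)

if __name__ == "__main__":
    cases = list(enumerate_cases(STATES))
    print(f"{len(cases)} test cases for {len(STATES)} states")
    for state, stimulus in cases:
        print(f"  {state}: {stimulus}")

Even this toy example shows why manual calling scales poorly: three small states already require thirty cases, and a production application with hundreds of states quickly grows beyond what a rotating group of test callers can cover consistently.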
The best time to use in-house resources to check functionality is early in the application development process, when there are many issues; using developers to debug new code is probably most efficient and a good learning exercise. As an application grows or more modules are added, the test process becomes a more time-consuming endeavor. Using developers to place calls is an expensive way to test; a more objective and cost-effective approach is needed.
There needs to be an improved way to test self-service applications. There should be a way to test the traversal comprehensively and consistently. There needs to be an objective evaluation and timely reporting of the progress made by code changes to the application.
CoAssure has developed an automated testing methodology that represents a vast improvement over manual calling. First, CoAssure reviews the application specification provided by the customer. Next, an XML representation of the application is created for the purpose of ensuring comprehensive test coverage. The XML code also allows different testing criteria to be used based on the desired goal of the test. For example, early in the process a test set of calls can be created that does not exercise all of the error handling conditions. Such a test set can be executed in a minimum amount of time and quickly reported. Known deficiencies can be eliminated from scrutiny (e.g., if not all global commands have been instituted across the application, they can be excluded from testing). Prior to code release, a comprehensive test set can thoroughly evaluate the application - both the basic functionality and the performance of the overall system, with delays accurately measured. At a later date, traversal testing can be conducted with the same test set during load testing, to determine system performance under those conditions and compare it with the application's behavior in the unloaded situation.
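As a rough illustration of how an XML representation of the call flow can drive selective test-set generation, consider the sketch below. The element and attribute names, and the Python generator that consumes them, are assumptions made for this article; CoAssure's actual schema and tooling are not described here.

# A minimal sketch of how an XML representation of the call flow might
# drive selective test-set generation. The schema is hypothetical.
import xml.etree.ElementTree as ET

APP_XML = """
<application name="store-locator">
  <state id="main-menu" global-commands="false">
    <response>find a store</response>
    <response>store hours</response>
    <error type="silence"/>
    <error type="out-of-grammar"/>
  </state>
  <state id="get-zip" global-commands="true">
    <response>one two three four five</response>
    <error type="silence"/>
  </state>
</application>
"""

def build_test_set(xml_text, include_errors=True, include_globals=True):
    """Return (state, stimulus) pairs; early test sets can skip error
    handling or states where global commands are not yet implemented."""
    root = ET.fromstring(xml_text)
    cases = []
    for state in root.findall("state"):
        sid = state.get("id")
        for resp in state.findall("response"):
            cases.append((sid, resp.text))
        if include_errors:
            for err in state.findall("error"):
                cases.append((sid, err.get("type")))
        if include_globals and state.get("global-commands") == "true":
            cases.append((sid, "global:help"))  # hypothetical global command
    return cases

# Quick pass early in development: skip the error handling conditions.
print(build_test_set(APP_XML, include_errors=False))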
The XML representation also determines all of the in-grammar utterances that must be pre-recorded for the automated calling. With the traversal test set and pre-recorded utterances, the automated execution of the test calls is ready to begin. Automated calling progresses much faster than manual calls. The system dials the application phone number and steps through the prescribed test call in the shortest time possible. Where feasible, the system will enable barge-in to minimize test calling times. Immediately upon completion of one call, the next call is initiated; hundreds of calls can be completed in a day using a single port.
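The automated call loop can be pictured along the following lines. This is only a sketch of the control flow described above (dial, wait for the prompt, play a pre-recorded utterance, hang up, start the next call immediately); the Telephony class is a hypothetical stand-in for whatever telephony stack actually places the calls.

# Simplified sketch of the automated call loop; not CoAssure's code.
import time

class Telephony:
    """Hypothetical single-port telephony driver."""
    def dial(self, number): ...
    def wait_for_prompt(self, timeout=10.0): ...
    def play_utterance(self, wav_path, barge_in=False): ...
    def hangup(self): ...

def run_test_set(port, number, test_calls, utterances):
    """Place one call per scripted path, back to back, on a single port."""
    results = []
    for call_id, path in enumerate(test_calls):
        start = time.monotonic()
        port.dial(number)
        for state, stimulus in path:
            port.wait_for_prompt()          # let the application speak
            wav = utterances[stimulus]      # pre-recorded in-grammar utterance
            # Barge-in is enabled here to shorten the call; a real test set
            # would only do so where the test case allows it.
            port.play_utterance(wav, barge_in=True)
        port.hangup()
        results.append((call_id, time.monotonic() - start))
        # The next call starts immediately; hundreds per day on one port.
    return results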
As the test progresses, discrepancies are noted, assigned a code number and catalogued (stored) in a database. Every call is recorded and can later be played in its entirety, or the user has the option of only listening to the discrepant portions of a call.
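One plausible way to catalogue discrepancies so that only the discrepant portion of a call needs replaying is sketched below. The table layout, code numbers and sample entry are illustrative assumptions, not CoAssure's actual database.

# Sketch of a discrepancy catalogue keyed to recording offsets.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE discrepancy (
        code        INTEGER,   -- discrepancy code number
        call_id     INTEGER,   -- which recorded call
        state       TEXT,      -- dialog state where it occurred
        expected    TEXT,      -- prompt text from the specification
        start_sec   REAL,      -- offset into the call recording
        end_sec     REAL
    )
""")

# Example entry (hypothetical): wrong prompt detected 42.5 s into call 17.
conn.execute(
    "INSERT INTO discrepancy VALUES (?, ?, ?, ?, ?, ?)",
    (101, 17, "get-zip", "Please say your five digit ZIP code.", 42.5, 47.0),
)

# Replay only the discrepant portion of each affected call.
for code, call_id, state, expected, start, end in conn.execute(
        "SELECT * FROM discrepancy ORDER BY call_id, start_sec"):
    print(f"call {call_id}, state {state}: code {code}, "
          f"replay {start:.1f}-{end:.1f} s (expected: {expected!r})")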
In addition to the issues of comprehensive, consistent and repeatable testing, efficient reporting of the test results has also been addressed. Soon after a test set run has been completed, the results are available in a password-protected location on the Internet. Many reports are available and have been designed to meet the needs of developers, QA personnel and program managers. Developers can go directly to the traversal discrepancies that have been noted, where they can see the expected text and listen to the recordings. Higher-level summary reports give a quick overview of application performance. Still other reports highlight delays in the application response, length of calls and other indicators that may point to issues of user satisfaction.
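Summary-style reports of the kind described above could be produced by straightforward aggregation over the same sort of call database. The table, columns and figures in this last sketch are again assumptions made only for illustration.

# Sketch of a per-run summary: call count, average length, worst delay.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE test_call (
    run_id INTEGER, call_id INTEGER, length_sec REAL, max_delay_sec REAL)""")
conn.executemany(
    "INSERT INTO test_call VALUES (?, ?, ?, ?)",
    [(1, 1, 93.2, 1.4), (1, 2, 121.8, 3.9), (1, 3, 88.0, 0.9)],  # sample data
)
for run_id, calls, avg_len, worst_delay in conn.execute("""
        SELECT run_id, COUNT(*), AVG(length_sec), MAX(max_delay_sec)
        FROM test_call GROUP BY run_id"""):
    print(f"run {run_id}: {calls} calls, avg length {avg_len:.1f} s, "
          f"worst response delay {worst_delay:.1f} s")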