VoiceXML Review - Feature Articles

Volume 3, Issue 3 - May/June 2003

Choosing the Right Test Method

By Peter Leppik

"Wizard of Oz" Testing
"Wizard of Oz" testing is common with speech recognition applications in the early stages of design. A human plays the part of the speech recognition computer, so that design prototypes can be tested before any actual programming is done.

Wizard of Oz testing can provide good qualitative data to guide refinement of the application design, but it cannot provide statistically meaningful data, given the very small number of callers typically used (generally 25 or fewer). A Wizard of Oz test will provide ideas for improving an application, but it can't tell you whether the new application is better or worse than an earlier version, or how it stacks up against industry norms.
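To see why 25 callers is too few for statistical comparisons, consider a rough sketch of the margin of error on a task completion rate. The numbers below are hypothetical and not drawn from any real test, and the function name is ours. The sketch uses the simple normal approximation for a proportion, which is itself crude at such small sample sizes, so treat the result as an order-of-magnitude illustration only.

```python
import math


def margin_of_error(successes: int, n: int, z: float = 1.96) -> float:
    """Half-width of the normal-approximation 95% interval for a proportion."""
    p = successes / n
    return z * math.sqrt(p * (1 - p) / n)


# Hypothetical: 15 of 25 Wizard of Oz callers complete the task (60%).
small = margin_of_error(15, 25)

# Hypothetical: the same 60% completion rate observed over 1,000 callers.
large = margin_of_error(600, 1000)

print(f"n=25:   60% +/- {small:.1%}")
print(f"n=1000: 60% +/- {large:.1%}")
```

With 25 callers the uncertainty is roughly 19 percentage points either way, so a design that scores 60% cannot be distinguished from one that scores 50% or 70%. With 1,000 callers the same estimate is good to about 3 points.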

Technical Benchmarking
Technical benchmarking, or comparing statistics generated by a call center system against published industry norms, is another common technique for evaluating a system. It has the advantage of using data that existing systems already generate, such as average hold times and call abandon rates, so the only additional expense is buying benchmark statistics from a third party.

Unfortunately, these statistics are often promoted as a measure of caller satisfaction, when they are really proxies at best. For example, it is clearly good if the average hold time is shorter than industry norms, but that doesn't mean customers are being well served.

In the worst case, relying too heavily on technical benchmarking can lead to a customer service operation managing to the numbers, rather than managing to customer service. For example, call center agents under pressure to reduce their average call time have been known to abruptly hang up on callers with difficult problems. That certainly will reduce average call length, but at the expense of customer satisfaction.
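The arithmetic behind this effect is easy to sketch. The call records below are invented for illustration (durations in seconds, plus a flag for whether the caller's problem was resolved), and the helper names are ours. The point is only that the benchmarked number can improve while the outcome customers care about gets worse.

```python
def average_call_time(calls):
    """Mean duration in seconds over (duration, resolved) records."""
    return sum(duration for duration, _ in calls) / len(calls)


def resolution_rate(calls):
    """Fraction of calls in which the caller's problem was resolved."""
    return sum(1 for _, resolved in calls if resolved) / len(calls)


# Hypothetical shift: four routine calls and two difficult ones, all resolved.
handled = [(180, True), (240, True), (200, True), (220, True),
           (900, True), (1100, True)]

# Same shift, but the two difficult callers are cut off after 30 seconds.
hung_up = [(180, True), (240, True), (200, True), (220, True),
           (30, False), (30, False)]

for label, calls in (("handled", handled), ("hung up", hung_up)):
    print(f"{label}: avg {average_call_time(calls):.0f}s, "
          f"resolved {resolution_rate(calls):.0%}")
```

In this made-up example the average call time falls from about 473 seconds to 150 seconds, which looks like a large improvement against a benchmark, while a third of the callers now leave with their problems unresolved.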

In addition, while technical benchmarking can help decide if a system is performing poorly, it can't tell you if a replacement system will be any better, since it is only meaningful once a system is rolled out.

Focus Groups
Focus groups, which are intensive interviews with small numbers of customers, are similar in many ways to Wizard of Oz testing. This method can generate many ideas for improvement, along with qualitative feedback about a new or existing customer service operation, but it can't generate statistically valid or comparative data.

Employee Test Calls
Employee test calls are a very common method for testing new automated systems. Employees call into an application and provide feedback for improvement (often through a survey). The method's sole advantage is that it is fast and cheap.

Unless the system is intended to be used by employees (for example, an HR hotline), this method can actually be worse than doing no testing at all.

The problem is that employees are a very different group of people from customers. Employees are familiar with the jargon and processes of the company and industry, whereas customers generally are not. We have experience with several companies that successfully tested new applications using employee calls, yet found that the expensive new system completely failed to serve the needs of real customers.

As a result, we strongly recommend that companies not rely on employee calls to test a new system.

Copyright © 2001-2003 VoiceXML Forum. All rights reserved.
The VoiceXML Forum is a program of the
IEEE Industry Standards and Technology Organization (IEEE-ISTO).