Volume 2, Issue 1 - January 2002
   
 

Let's Talk Office Supplies

By Dr. George White and Pilar Manchón

(Continued from Part 1)

NetByTel

An early success story is the "voice automation" of Office Depot, which, with assistance from NetByTel, transformed several of its call center functions. If you call 1 800 GO DEPOT, you can talk to a speech recognition system that will tell you where the nearest Office Depot stores are located. It will also let you check your order status and even order items from an Office Depot catalog. This is just the tip of the iceberg for "self-service" offered over the telephone.

Despite years of disappointment in commercial applications of automatic speech recognition, voice-automated "self-service" telephony applications are a resounding success. Office Depot achieved significant cost reductions in handling some types of calls with automatic speech recognition systems. The application also revealed a surprise benefit: a significant portion of callers preferred to use the automated system rather than talk to a live agent. We think this was a reaction to "hold" times spent waiting for a live operator when "virtual operators" were immediately available.

What follows is an example of VoiceXML written to implement a typical Office Depot application. It contains instructions specifying which audio files to play and in what order, where to listen for a response and which responses to look for, and what to do once a response has been received.

<?xml version="1.0"?>
<!-- Office Depot Main Menu -->
<vxml version="1.0" application="ODMN.vxml">

<meta name="maintainer" content=""/>
<meta name="application" content="Office Depot Main Menu"/>

<form id="Welcome" scope="dialog">
    <block>
        <audio src="nbtsound.au"/>
        <prompt>Welcome to Office Depot Main Menu</prompt>
        <goto next="#InitialMenu"/>
    </block>
</form> 

<form id="InitialMenu" scope="dialog">
    <field name="IniMenuAnswer">
        <grammar src="InitialMenu.grxml"/>

        <!-- PROMPTS -->
        <prompt>
        Would you like to hear the status of your order, find a location,
        or connect to our order line?
        </prompt>
        <prompt count="2">
        What would you like: Order status, Location Finder or Order Line?
        </prompt>
        <help>
        You have reached the Office Depot Main Menu. I can connect you to any of
        our automated systems: to hear about an order you have already placed, say
        'Order Status'. To find one of our stores, say 'Location Finder'. To order products
        from our catalog, say 'Order Line'.
        </help>
        <!-- END PROMPTS -->

        <filled>
            <if cond="IniMenuAnswer == 'Order_status' ">
                <prompt>Order status! </prompt>
                <goto next="http://pmanchon.netbytel.com/OD/OS.vxml"/>
            <elseif cond="IniMenuAnswer == 'Location' "/>
                <prompt>Location Finder! </prompt>
                <goto next="http://pmanchon.netbytel.com/OD/LF.vxml"/>
            <elseif cond="IniMenuAnswer == 'Order_Line' "/>
                <prompt>Order Line! </prompt>
                <goto next="http://pmanchon.netbytel.com/OD/OL.vxml"/>
            </if>
        </filled>
    </field>
</form>
</vxml>      

 

The example above illustrates what a few lines of VoiceXML can accomplish. It instructs the voice browser to:

1. Answer the call
2. Play the welcome prompt (in this case, all prompts are synthesized)
3. Play the first prompt
4. Wait for an answer
5. Match the response with the corresponding option
6. Go to the next document
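
Step 5 depends on the grammar file InitialMenu.grxml, which is not reproduced in this article. The following is only a sketch of what such a file might contain. It assumes a voice browser that accepts the W3C XML grammar format with string-literal semantic tags, and grammar formats vary from platform to platform. The spoken phrases are guesses based on the prompts. The returned values are the ones tested in the filled block of the example.

```xml
<?xml version="1.0"?>
<!-- Hypothetical InitialMenu.grxml: an assumption, not the actual Office Depot grammar -->
<grammar xmlns="http://www.w3.org/2001/06/grammar"
         version="1.0" xml:lang="en-US" mode="voice"
         root="InitialMenu" tag-format="semantics/1.0-literals">

    <rule id="InitialMenu" scope="public">
        <one-of>
            <!-- Each tag holds the string the field receives when the phrase is matched -->
            <item>order status <tag>Order_status</tag></item>
            <item>location finder <tag>Location</tag></item>
            <item>order line <tag>Order_Line</tag></item>
        </one-of>
    </rule>
</grammar>
```

Keeping the returned strings separate from the spoken phrases lets a designer add alternative wordings later without touching the dialog logic.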

However, these are just the major blocks. Many other instructions are embedded in those few lines of code, covering, for instance, what to do in case of misrecognition, or when to listen for which responses. Other customizable features may be set not by the VoiceXML application itself but by the specific configuration of the voice browser. The key point is that a great deal of functionality is built into the browser, and the developer can use it to standardize as well as speed up the development process.
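
When the browser's default handling of misrecognition is not what the designer wants, the application can override it. The sketch below reuses the field from the example and adds explicit noinput and nomatch handlers. The elements are standard VoiceXML, but the prompt wording and the decision to end the call after a third failure are illustrative assumptions, not part of the Office Depot application.

```xml
<?xml version="1.0"?>
<vxml version="1.0">
<form id="InitialMenu">
    <field name="IniMenuAnswer">
        <grammar src="InitialMenu.grxml"/>
        <prompt>
        Would you like to hear the status of your order, find a location,
        or connect to our order line?
        </prompt>

        <!-- The caller said nothing before the browser's timeout expired -->
        <noinput>
            <prompt>Sorry, I did not hear you.</prompt>
            <reprompt/>
        </noinput>

        <!-- The caller said something the grammar does not cover -->
        <nomatch>
            <prompt>Sorry, I did not understand.</prompt>
            <reprompt/>
        </nomatch>

        <!-- Third nomatch in this field: stop retrying (illustrative choice) -->
        <nomatch count="3">
            <prompt>I am having trouble understanding you. Goodbye.</prompt>
            <exit/>
        </nomatch>

        <filled>
            <prompt>Thank you.</prompt>
        </filled>
    </field>
</form>
</vxml>
```

The count attribute works here as it does on the prompts in the example: the browser selects the handler with the highest count that does not exceed the number of times the event has occurred.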

While this particular example is "static," VoiceXML documents can be generated on the fly to collect or provide whatever pieces of information are relevant to a particular customer, at that particular time. Dynamic VoiceXML is extremely important in Voice Commerce, since it provides real-time access to changing data in dynamic databases. It also provides developers greater flexibility, which results in the following:

a. Greater personalization
b. Reusable code
c. Polite and efficient interactions
d. Higher-quality Natural Language interaction

Dynamic VoiceXML enables applications to respond more intelligently to callers by using information either gathered during the course of a conversation or retrieved at run time from a different source. What would you think of an operator who, despite answering thousands of calls a day, remembers your needs and makes sure you get what you need effortlessly and at your own pace? By focusing on the user's experience, designers can successfully emulate one-to-one personalized interaction between the caller and the system.
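
From the VoiceXML side, a dynamic exchange typically begins with a submit: the document hands what it has collected to a server, and the server replies with a freshly generated VoiceXML document. The sketch below assumes a hypothetical server-side script at example.com and a hypothetical OrderNumber field. Neither is taken from the Office Depot application.

```xml
<?xml version="1.0"?>
<vxml version="1.0">
<form id="OrderLookup">
    <!-- Hypothetical field using the built-in digits grammar -->
    <field name="OrderNumber" type="digits">
        <prompt>Please say your order number.</prompt>
        <filled>
            <!-- Hypothetical URL: the script behind it would look up the order
                 and return a VoiceXML document written for this caller -->
            <submit next="http://example.com/orderstatus"
                    namelist="OrderNumber" method="get"/>
        </filled>
    </field>
</form>
</vxml>
```

The document returned by the server would look like any other VoiceXML page, except that its prompts would be filled in at request time with data for this particular caller.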

The fact that VoiceXML makes it easy to design voice applications does not mean, however, that anybody can write high-quality voice applications. There is much more to designing a voice application than simply putting VoiceXML elements together. Designers must have a deep understanding of how the technology works, its potential as well as its limitations, in order to balance robustness, quality and reliability with efficiency and natural language interaction. It is easy to think that anyone could "design" a dialog, since we all do that naturally every day. However, the challenge comes in trying to anticipate what callers might say when prompted for information: the designer must explicitly account for all possibilities. Well-worded prompts are essential in high-quality applications, and good linguistic skill is needed. In addition, the overall dialog flow, wording and prompt intonation must be skillfully managed for consistency. It is surprisingly easy to generate a logically correct flow that sounds silly, repetitive or even patronizing, resulting in a low-quality application and frustrated users.

There are many factors developers must consider before designing an application. A good example is the fact that people do not interact with computer systems the same way they do with other people. It is also a fact that different demographic populations react differently to automated systems, so applications aimed at different age ranges or different cultural levels would probably differ significantly. These are only examples of factors that must be taken into account when designing an application: some relate to what the technology can and cannot do, and some relate to how people react in a given context. Making the current technology emulate a human-like experience is in itself a science that combines expertise in different disciplines: linguistics, psychology, cognitive and computer science, speech technology and human factors in general. The ultimate goal of user-centered applications is to make the caller feel safe, comfortable, acknowledged, appreciated and in control. This is particularly important in e-commerce, where applications collect and handle sensitive information.

One of the goals of well-crafted VoiceXML applications is to create "personalities" for virtual operators. Such virtual operators would be programmed to always be polite, alert, and efficient while performing the most boring and monotonous tasks!

In closing, we note that the advantages of polite "virtual operators" are especially noticeable during the holidays, when everybody is trying to do their holiday shopping: all the lines are busy, customers must hold for several minutes, operators are stressed and cannot keep up the service level, and the interaction becomes less and less pleasant as everybody tries to cope with everybody else... So for such occasions, we sing the praises of VoiceXML systems for not making us wait, and for addressing us in the same friendly manner, no matter how flustered we are when we call!


Copyright © 2001-2002 VoiceXML Forum. All rights reserved.
The VoiceXML Forum is a program of the
IEEE Industry Standards and Technology Organization (IEEE-ISTO).