Volume 1, Issue 1 - January 2001
   
 

What is VoiceXML?

By Kenneth G. Rehor

(Continued from Part 1)

Figure 2 shows the relationship between a traditional Web application and a voice-enabled Web application.

Figure 2: Relationship Between a Traditional Web Application and a Voice-Enabled Web Application

A Basic Menu Example

To see how VoiceXML works, let's start with a very simple example of a basic menu. Following the architecture shown in Figure 1, a caller dials the telephone number of this simple voice portal. The call is routed to the VoiceXML telephony server. The appropriate VoiceXML page (in this case menu.vxml) is fetched via HTTP from the application (web) server, and interpretation begins.

Example 1: menu.vxml
 
1    <?xml version="1.0"?>
2    <vxml version="1.0">
3
4        <menu>
5            <prompt> Choose from <enumerate/></prompt>
6
7            <choice next="sports.vxml"> sports </choice>
8            <choice next="weather.vxml"> weather </choice>
9            <choice next="news.vxml"> news </choice>
10        </menu>
11
12    </vxml>


The first line of Example 1 is the XML declaration, indicating that the document conforms to W3C's XML version 1.0. Line 2 opens <vxml>, the top-level VoiceXML element, which contains the document's dialogs (<menu> and <form> elements); its version attribute indicates compliance with VoiceXML version 1.0. Lines 4 through 10 contain a menu consisting of a prompt and three choices. The VoiceXML interpreter uses the contents of the <choice> elements to tell the automatic speech recognition (ASR) engine what to listen for, in this case the words sports, weather, or news. The same content is also used to construct the prompt when the <enumerate> element is included. A speech synthesis engine then renders the prompt text as audio.

The user interaction would be as follows:

Computer: Choose from sports, weather, news.
Human: Sports.

The VoiceXML interpreter then fetches the file sports.vxml and the process continues.
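
The prompt heard in this exchange is assembled from the <choice> text by <enumerate/>. As a rough illustration of what that shorthand stands for, the same menu could be written with the list spelled out by hand. This is only a sketch: it assumes the interpreter joins the choice text with commas, as in the exchange above, and the exact rendering may differ from platform to platform.

```xml
<?xml version="1.0"?>
<vxml version="1.0">

    <menu>
        <!-- Hand-written stand-in for the enumerate shorthand:
             the choice text is repeated inside the prompt. -->
        <prompt> Choose from sports, weather, news. </prompt>

        <!-- The choice text still tells the recognizer what to listen for. -->
        <choice next="sports.vxml"> sports </choice>
        <choice next="weather.vxml"> weather </choice>
        <choice next="news.vxml"> news </choice>
    </menu>

</vxml>
```

The drawback of the hand-written form is that the prompt and the choices must be kept in sync manually. With <enumerate/>, adding a fourth <choice> updates the spoken list automatically.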

But what if the user asked for help, didn't say something appropriate, or said nothing at all? VoiceXML has language elements that allow a dialog designer to handle these circumstances. Here's the same menu example embellished to handle "unexpected" responses:

Example 2: menu.vxml (embellished)
 
1    <?xml version="1.0"?>
2    <vxml version="1.0">
3
4        <menu>
5            <prompt> Choose from <enumerate/></prompt>
6
7            <choice next="sports.vxml"> sports </choice>
8            <choice next="weather.vxml"> weather </choice>
9            <choice next="news.vxml"> news </choice>
10        
11            <help>
12                If you would like sports scores, say sports.
13                For local weather reports, say weather, or
14                for the latest news, say news.
15            </help>
16
17            <noinput>You must say something.</noinput>
18
19            <nomatch>Please speak clearly and try again.</nomatch>
20
21        </menu>
22
23    </vxml>

The user interaction might be:

Computer: Choose from sports, weather, news.
Human: (user says nothing)
Computer: You must say something. Choose from sports, weather, news.
Human: Tbilisi
Computer: Please speak clearly and try again. Choose from sports, weather, news.
Human: Help
Computer: If you would like sports scores, say sports. For local weather reports, say weather, or for the latest news, say news.
Human: Sports
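
The <help>, <noinput>, and <nomatch> elements used in this menu are shorthand forms of the more general <catch> element, which names the event it handles in its event attribute. As a sketch of that equivalence (not a required style), the embellished menu could also be written as follows:

```xml
<?xml version="1.0"?>
<vxml version="1.0">

    <menu>
        <prompt> Choose from <enumerate/></prompt>

        <choice next="sports.vxml"> sports </choice>
        <choice next="weather.vxml"> weather </choice>
        <choice next="news.vxml"> news </choice>

        <!-- Long form of the help handler -->
        <catch event="help">
            If you would like sports scores, say sports.
            For local weather reports, say weather, or
            for the latest news, say news.
        </catch>

        <!-- Long form of the noinput handler -->
        <catch event="noinput">You must say something.</catch>

        <!-- Long form of the nomatch handler -->
        <catch event="nomatch">Please speak clearly and try again.</catch>
    </menu>

</vxml>
```

The shorthand elements are usually easier to read. The long form is mainly useful for seeing that all three situations are handled by the same event mechanism.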

Summary

VoiceXML is a powerful yet simple language for building voice dialogs. It leverages web architecture, tools, and technology to enable innovative new telephone applications. Thanks to the standardization efforts of the VoiceXML Forum and the W3C, it is gaining widespread adoption, especially among the 350-plus members of the VoiceXML Forum. New language features in the recently published draft of VoiceXML 2.0, and new call control features currently under development, promise an even richer voice-enabled Web.


Copyright © 2001 VoiceXML Forum. All rights reserved.
The VoiceXML Forum is a program of the
IEEE Industry Standards and Technology Organization (IEEE-ISTO).