VoiceXML Review - Columns - Speak & Listen

Volume 5, Issue 4 - July / Aug 2005

In this monthly column, an industry expert will answer common questions about VoiceXML and related technologies. Readers are encouraged to submit questions about VoiceXML, including development, voice-user interface design, and speech technology in general, or how VoiceXML is being used commercially in the marketplace. If you have a question about VoiceXML, e-mail it to and be sure to read future issues of VoiceXML Review for the answer.

By Matt Oshry

Q: I've been tasked with presenting the user with a list of items. When they hear an item in which they're interested they should be able to say "tell me more" to obtain more information about the item. How might I implement that?

A: There are several ways to implement a 'pick list' in VoiceXML. If you have access to an interpreter that implements the features described in the VoiceXML 2.1 specification, you can use a combination of the <foreach> and <mark> tags to implement a pick list.

The following example declares a list of fruits and vegetables in the ECMAScript array aItems. Each item in the array is an ECMAScript object with three properties: 'id', 'name', and 'detail'. The 'name' and 'detail' properties are used for queuing prompts in the 'picklist' and 'details' dialogs. The 'id' property uniquely identifies the item and is used to name the <mark> that is executed for that item. We also use the id in the <filled> of the 'item' field to determine which item was selected.
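As a minimal sketch of the data structure described above, the array might be declared along these lines. Only the name aItems, the three property names, and the pairing of 'i0' with eggplant come from this column; the second item, the detail strings, and the helper findItemById are hypothetical and exist purely for illustration. The code sticks to older ECMAScript syntax (var, plain for loops) on the assumption that it would run inside a VoiceXML script block.

```javascript
// Hypothetical sample data. Only the 'i0' / eggplant pairing is taken from
// the column; everything else here is placeholder content.
var aItems = [
  { id: 'i0', name: 'eggplant', detail: 'Placeholder detail text for eggplant.' },
  { id: 'i1', name: 'banana',   detail: 'Placeholder detail text for banana.' }
];

// Hypothetical helper: return the item whose 'id' matches sId, or null if
// no item in the list carries that id.
function findItemById(aList, sId) {
  for (var i = 0; i < aList.length; i++) {
    if (aList[i].id === sId) {
      return aList[i];
    }
  }
  return null;
}
```

Given the id recorded for the item that was playing when the user spoke, a call such as findItemById(aItems, 'i0') would hand back the object whose 'detail' property the 'details' dialog needs.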

A pause is included between each item to give the user the opportunity to say the magic phrase, 'tell me more', which causes the interpreter to execute the 'details' dialog.

Q: Your example only shows text-to-speech. What if I want to improve the quality of my application by playing back recorded audio for each item in the list?

A: Because the array aItems consists of objects, you can do one of the following:
a) Extend each object to include properties that store the URIs to the recorded audio for that item.
b) Use the existing 'id' property not only to name the <mark> but also to represent a portion of the URI to the recorded audio files for each item.

Option (b) is appropriate if the recordings share a common location and you have control over how they are named. Let's consider that option in more detail.

Let's say the audio is located at the following base URI: http://audio.acmegrocer.net/produce/

For eggplant, the id is 'i0', and we need two recordings - one for the name and one for the detail. We'll name the corresponding files 'i0_name.wav' and 'i0_detail.wav'.
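The naming convention above can be sketched as a small ECMAScript helper. The variable sAudioBase and the function audioUri are hypothetical names introduced here for illustration; the base URI and the '_name.wav' / '_detail.wav' suffixes are the ones given in the text.

```javascript
// Base URI for the recordings, as given in the column.
var sAudioBase = 'http://audio.acmegrocer.net/produce/';

// Hypothetical helper: build the URI of a recording from an item id and the
// kind of recording wanted ('name' or 'detail').
function audioUri(sId, sKind) {
  return sAudioBase + sId + '_' + sKind + '.wav';
}
```

With this in place, audioUri('i0', 'name') evaluates to the URI of the eggplant name recording, and the same call with 'detail' gives the URI of its detail recording. One possible use is as the value of an expression that supplies the audio source at run time.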

To reference the name recording from the item of the picklist dialog, we augment

The TTS emitted by the execution of the <value> tag is only played if the recording can't be fetched.

Updating the action of the 'details' dialog to use name and details recordings is left as an exercise to the reader. (Or you can cheat by looking at the revised example below.)

Q: How do I allow the user to say 'stop' at any time to terminate playback of the list?

A: You simply extend the itemRules grammar to include another keyword, and add additional code in the

