VoiceXML Review - Columns

Volume 5, Issue 2 - March/April 2005

First Words

Welcome to “First Words” – the VoiceXML Review’s column to teach you about VoiceXML and how you can use it. We hope you enjoy the lesson.

VoiceXML 2.1

In this lesson, we’re going to continue investigating VoiceXML 2.1.

You may recall that as VoiceXML platform vendors and application developers began to deploy VoiceXML applications widely, they identified potential future extensions to the language. The result of this experience is a collection of field-proven features that are candidates for addition to the VoiceXML language, proposed as part of VoiceXML 2.1.

Just as a reminder, VoiceXML 2.1 has been released as a Last Call Working Draft. Here is a pointer:

http://www.w3.org/TR/2004/WD-voicexml21-20040728/

Note: if you’re reading this article after VoiceXML 2.1 has been finalized and published, you should spend a few minutes tracking down the final specification rather than this link, as the specification may have undergone minor changes.

The new features proposed for VoiceXML 2.1 are based on feedback from application developers and VoiceXML platform developers. The features we’ve covered already include:

Referencing Grammars Dynamically – Generation of a grammar URI reference with an expression;
Referencing Scripts Dynamically – Generation of a script URI reference with an expression;
Recording user utterances while attempting recognition – Provides access to the actual caller utterance, for use in the user interface or for submission to the application server;
Adding namelist to <disconnect> – The ability to pass information back to the VoiceXML platform environment (for example, if the application wishes to pass results to a CCXML session related to the call);
Using <mark> to detect barge-in during prompt playback – Placement of ‘bookmarks’ within a prompt stream to identify where a barge-in has occurred.

Here are the links to the previous articles in this series:

https://voicexmlreview.org/Sep2004/columns/sep2004_first_words.html
https://voicexmlreview.org/Nov2004/columns/nov2004_first_words.html
https://voicexmlreview.org/Feb2005/columns/Feb2005_first_words.html

In this issue, we’re going to look at:

Concatenating Prompts Dynamically using <foreach>

The <foreach> Tag

The <foreach> tag is perhaps one of the most widely implemented extensions to VoiceXML 2.0. As a result of its usefulness, it was selected for inclusion in VoiceXML 2.1. This feature adds a looping construct to VoiceXML.

The primary use case is constructing a dynamic list of prompts without requiring a round trip to the application server. This allows wider use of static VoiceXML pages, particularly when combined with some of the other features in VoiceXML 2.1. It can also lead to more efficient applications, with better partitioning between presentation and business logic.

The <foreach> tag takes two attributes, both of which are required; if either is missing, an error.badfetch event is thrown (as is usual) when the page is processed. The two attributes are:

array – an ECMAScript expression that evaluates to an ECMAScript array; the loop is executed once for each element of this array;
item – an ECMAScript variable used as the ‘loop variable’. On each iteration, this variable is set to the array element currently being processed.
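To make the two attributes concrete, here is a rough ECMAScript model of what an interpreter does with a <foreach> element. It is a sketch of our own, not code from the draft or from any platform: runForeach and its scope and body parameters are hypothetical, and for simplicity the array attribute is treated as a variable name rather than an arbitrary expression. We’ve kept to ES3-era constructs (var, plain for loops) to match the ECMAScript of the period.

```javascript
// Hypothetical, simplified model of <foreach array="..." item="...">.
// Not a VoiceXML platform API. "scope" stands in for the current ECMAScript
// scope; "body" stands in for the content of the <foreach> element.
// Assumption: attrs.array names a variable in scope (not a full expression).
function runForeach(attrs, scope, body) {
   // Both attributes are required; a missing one raises error.badfetch.
   if (!attrs || !attrs.array || !attrs.item) {
      throw new Error("error.badfetch");
   }
   var arr = scope[attrs.array];   // non-array values are not handled here
   var output = [];
   for (var i = 0; i < arr.length; i++) {
      // Bind the loop variable to the current element, then run the body.
      scope[attrs.item] = arr[i];
      output.push(body(scope));
   }
   return output;
}
```

In this model the loop variable simply keeps its last value after the loop ends; we haven’t tried to reproduce any particular platform’s scoping rules.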

The VoiceXML 2.1 Last Call Working Draft section on <foreach> has a great selection of example code:

http://www.w3.org/TR/2004/WD-voicexml21-20040728/#sec-foreach

We’re going to have a look at the first example, which should give you a flavor of how <foreach> can be used.

There are two snippets of code in this example – the first is the VoiceXML component.

<?xml version="1.0" encoding="UTF-8"?>

<vxml xmlns="http://www.w3.org/2001/vxml" version="2.1">

   <script src="movies.js"/>

   <form id="pick_movie">
                                           
      <!--
      GetMovieList returns an array of objects
        with properties audio and tts.
        The size of the array is undetermined until runtime.
      -->

      <var name="prompts" expr="GetMovieList()"/>

      <field name="movie">
         <grammar type="application/srgs+xml" src="movie_names.grxml"/>

         <prompt>Say the name of the movie you want.</prompt>

         <prompt count="2">

            When you hear the name of the movie you want,
            just say it.

            <foreach item="thePrompt" array="prompts">
               <audio expr="thePrompt.audio">
                  <value expr="thePrompt.tts"/>
               </audio>
               <break time="300ms"/>
            </foreach>

         </prompt>

         <noinput>
            I'm sorry. I didn't hear you.
            <reprompt/>
         </noinput>

         <nomatch>
            I'm sorry. I didn't get that.
            <reprompt/>
         </nomatch>

      </field>
   </form>
</vxml>

The second code snippet, again straight from the VoiceXML 2.1 working draft, is an ECMAScript function returning an array of the movies that can be requested:

function GetMovieList()
{

   var movies = new Array(3);

   movies[0] = new Object();
   movies[0].audio = "godfather.wav";
   movies[0].tts = "the godfather";

   movies[1] = new Object();
   movies[1].audio = "high_fidelity.wav";
   movies[1].tts = "high fidelity";

   movies[2] = new Object();
   movies[2].audio = "raiders.wav";
   movies[2].tts = "raiders of the lost ark";
        
   return movies;
}

In this function, we’ve created an array of three ECMAScript objects. Each object has two properties: the name of an audio file containing the pre-recorded name of the movie, and a string of text that is spoken (using text-to-speech) if the audio file can’t be fetched.
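The same array can be built more compactly with array and object literals, which ES3-era ECMAScript already supports. The version below is our own rewrite, not the draft’s; it keeps the GetMovieList name so it could stand in for the original, and it returns the same three objects.

```javascript
// Our own compact rewrite of the draft's GetMovieList(), using literals.
// Each object pairs a pre-recorded audio file with fallback TTS text that
// is spoken if the audio file can't be fetched.
function GetMovieList()
{
   return [
      { audio: "godfather.wav",     tts: "the godfather" },
      { audio: "high_fidelity.wav", tts: "high fidelity" },
      { audio: "raiders.wav",       tts: "raiders of the lost ark" }
   ];
}
```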

Here are the interesting bits from the VoiceXML page:

<var name="prompts" expr="GetMovieList()"/>

This variable declaration calls our ECMAScript function, GetMovieList(), generating the array of objects that is used by the <foreach> loop shown below:

<foreach item="thePrompt" array="prompts">
   <audio expr="thePrompt.audio">
      <value expr="thePrompt.tts"/>
   </audio>
   <break time="300ms"/>
</foreach>

If the caller is unsure of what to say (a noinput event) or names a movie that isn’t in our grammar (a nomatch event), the <noinput> or <nomatch> handler issues a <reprompt/>. On this second visit to the field, the <prompt> with count="2" is selected, so the <foreach> loop it encloses is processed. By playing the list only on the second attempt, we let experienced callers move through their task quickly while still providing a fallback for callers who need help.
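Why the count="2" prompt plays only on the second attempt comes down to the VoiceXML 2.0 prompt selection rules: each visit to the field increments its prompt counter, and the interpreter queues the prompts with the highest count that does not exceed the counter (a missing count means 1). The following is a simplified model of that selection step. selectPrompts and the { count, cond, text } prompt objects are hypothetical, and cond is reduced to a boolean rather than an ECMAScript expression.

```javascript
// Simplified model of VoiceXML 2.0 prompt selection (a sketch, not a
// platform API). Each prompt is { count: n, cond: bool, text: "..." };
// a missing count defaults to 1 and a missing cond counts as true.
function selectPrompts(prompts, promptCounter) {
   var i, c;
   var eligible = [];
   // Step 1: keep only prompts whose condition holds.
   for (i = 0; i < prompts.length; i++) {
      if (prompts[i].cond !== false) {
         eligible.push(prompts[i]);
      }
   }
   // Step 2: find the highest count that does not exceed the counter.
   var best = 0;
   for (i = 0; i < eligible.length; i++) {
      c = eligible[i].count || 1;
      if (c <= promptCounter && c > best) {
         best = c;
      }
   }
   // Step 3: queue, in document order, every prompt with that count.
   var selected = [];
   for (i = 0; i < eligible.length; i++) {
      if ((eligible[i].count || 1) === best) {
         selected.push(eligible[i].text);
      }
   }
   return selected;
}
```

With the two prompts from our field, a counter of 1 selects the opening prompt, while 2 (and any higher value, since every further visit to the field increments the counter) selects the prompt containing the movie list.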

In effect, the <foreach> loop queues the following audio for the second recognition attempt:

<audio src="godfather.wav">the godfather</audio>
<break time="300ms"/>
<audio src="high_fidelity.wav">high fidelity</audio>
<break time="300ms"/>
<audio src="raiders.wav">raiders of the lost ark</audio>
<break time="300ms"/>

You’ll note that we’ve queued up the three pre-recorded audio files, along with their alternate text and the embedded pauses.
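If it helps to see the expansion as code, this hypothetical helper, renderForeachMarkup, builds the same markup text from an array shaped like the one GetMovieList() returns. An interpreter doesn’t need to generate markup text at all; this is only a way to check our understanding of what the loop queues, and it does no XML escaping of the tts strings.

```javascript
// Hypothetical helper: renders the markup that the <foreach> loop
// effectively queues, one <audio> plus one <break> per array element.
// For illustration only; no XML escaping is performed.
function renderForeachMarkup(movies, pause) {
   var lines = [];
   for (var i = 0; i < movies.length; i++) {
      lines.push('<audio src="' + movies[i].audio + '">' +
                 movies[i].tts + '</audio>');
      lines.push('<break time="' + pause + '"/>');
   }
   return lines.join("\n");
}
```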

This is a rudimentary example that doesn’t show much advantage over simply naming the prompts in the original page; you may well ask why we didn’t generate the page with this list of files in the first place. But imagine deciding how to generate this list of prompts based on user input received earlier within the same page (“No action movies for me, please”). Or imagine retrieving the movie data as an XML data object from within a static VoiceXML page (using the <data> tag), then constructing and managing the list of prompts on the fly, without going back to the application server.
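Here’s a minimal sketch of the first idea. It assumes each movie object carries an extra genre property, which is not in the draft’s example, and a hypothetical FilterMovies function that drops any genre the caller has ruled out. After recognizing the caller’s preference, the page could reassign the array with something like <assign name="prompts" expr="FilterMovies(prompts, 'action')"/> before the <foreach> prompt is next played.

```javascript
// Hypothetical: assumes each movie object has an extra "genre" property
// (not part of the draft's example). Returns a new array without the
// movies in the excluded genre; the input array is left unchanged.
function FilterMovies(movies, excludedGenre)
{
   var kept = [];
   for (var i = 0; i < movies.length; i++) {
      if (movies[i].genre !== excludedGenre) {
         kept.push(movies[i]);
      }
   }
   return kept;
}
```

Returning a new array, rather than removing elements in place, means the full list is still available if the caller changes their mind.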

The <foreach> tag can be useful on its own, but when combined with <data> it allows the construction of very powerful pages using only static VoiceXML. For a detailed example of this method in action, have a look at the other examples provided in the VoiceXML 2.1 specification for the <foreach> tag.
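For the second idea, the bridge between <data> and <foreach> is a small function that walks the returned DOM and builds the array the loop expects. This is a sketch under several assumptions: the XML layout shown in the comment is invented, MoviesFromDOM is a hypothetical name, and we assume the platform’s DOM supports getElementsByTagName(), NodeList item() and getAttribute(). In the page, it might be used as <data name="movieData" src="movies.xml"/> followed by <var name="prompts" expr="MoviesFromDOM(movieData)"/>, where movies.xml is likewise hypothetical.

```javascript
// Hypothetical sketch: converts the DOM exposed by <data> into the same
// kind of array GetMovieList() returns. Assumes an invented XML layout:
//   <movies>
//     <movie audio="godfather.wav" tts="the godfather"/>
//     ...
//   </movies>
// and DOM support for getElementsByTagName(), item() and getAttribute().
function MoviesFromDOM(doc)
{
   var nodes = doc.getElementsByTagName("movie");
   var movies = [];
   for (var i = 0; i < nodes.length; i++) {
      var node = nodes.item(i);
      movies.push({
         audio: node.getAttribute("audio"),  // pre-recorded file name
         tts: node.getAttribute("tts")       // fallback text
      });
   }
   return movies;
}
```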

Summary

Here is the direct link to the <foreach> section of the draft:

http://www.w3.org/TR/2004/WD-voicexml21-20040728/#sec-foreach

VoiceXML 2.1 proposes some useful additional features for VoiceXML 2.0, based on real-world deployment experience. We’ll continue drilling down into these features in forthcoming issues.

In future issues, we’re going to look at these:

Using <data> to fetch XML without requiring a dialog transition – Retrieval of XML data, and construction of a related DOM object, without requiring a transition to another VoiceXML page;
Adding type to <transfer> – Support for additional transfer flexibility (in particular, a supervised transfer), among other capabilities.

These are features that will likely get a full article each, as they are powerful, and can provide the VoiceXML developer with new ways to build applications. And the astute reader will note that I’ve left the hardest ones for last!

As always, if you have questions or topics for VoiceXML 2.0 or 2.1, drop us a line!
