VoiceXML Review - Columns

Volume 4, Issue 3 - September / October 2004

First Words

Welcome to “First Words” – the VoiceXML Review’s column to teach you about VoiceXML and how you can use it. We hope you enjoy the lesson.

VoiceXML 2.1

As promised last issue, we’re going to start learning about VoiceXML 2.1. You may recall that as VoiceXML platform vendors and application developers began to widely deploy VoiceXML applications, they began to identify potential future extensions to the language. The result of this experience is a collection of field-proven features that are candidates for addition to the VoiceXML language. These features are being proposed as part of VoiceXML 2.1.

For those keeping score, VoiceXML 2.1 has just (as of this writing) been released as a Last Call Working Draft. We encourage you to have a look:

http://www.w3.org/TR/2004/WD-voicexml21-20040728/

To review, the features proposed for VoiceXML 2.1 are based on feedback from application developers and VoiceXML platform developers. They include:

Referencing Grammars Dynamically – Generation of a grammar URI reference with an expression;
Referencing Scripts Dynamically – Generation of a script URI reference with an expression;
Using <mark> to detect barge-in during prompt playback – Placement of ‘bookmarks’ within a prompt stream to identify where a barge-in has occurred;
Using <data> to fetch XML without requiring a dialog transition – Retrieval of XML data, and construction of a related DOM object, without requiring a transition to another VoiceXML page;
Concatenating prompts dynamically using <foreach> – Building of prompt sequences dynamically using ECMAScript;
Recording user utterances while attempting recognition – Access to the actual caller utterance, for use in the user interface or for submission to the application server;
Adding namelist to <disconnect> – The ability to pass information back to the VoiceXML platform environment (for example, if the application wishes to pass results to a CCXML session related to this call);
Adding type to <transfer> – Support for additional transfer flexibility (in particular, a supervised transfer), among other capabilities.

We’re going to peek at the first two in this issue. These, along with several of the other features, make it possible to do more processing within the VoiceXML page itself, rather than having to regenerate a VoiceXML page from the application server. For more information on how to generate dynamic VoiceXML, have a look at the following First Words columns:

https://voicexmlreview.org/Jun2001/columns/Jun2001_first_words.html
https://voicexmlreview.org/Nov2001/columns/Nov2001_first_words.html

In VoiceXML 2.0, grammars and scripts are either placed in-line (within a <grammar> or <script> element, respectively) or referenced by a URL identifying the data to be used for the grammar or script. In both cases the URL is specified with the ‘src’ attribute, which contains a static value. It is useful, however, to be able to select a grammar or script based on the evaluation of an expression. The following example, taken directly from the VoiceXML 2.1 Last Call Working Draft, illustrates this feature for the <grammar> element. Note the two <grammar> elements:


<?xml version="1.0" encoding="UTF-8"?>
<vxml xmlns="http://www.w3.org/2001/vxml" version="2.1">

  <form id="get_address">
    <field name="citystate">
      <grammar type="application/srgs+xml" src="citystate.grxml"/>
      <prompt>Say a city and state.</prompt>
    </field>

    <field name="street">
      <grammar type="application/srgs+xml" srcexpr="citystate + '.grxml'"/>
      <prompt> What street are you looking for? </prompt>
    </field>

    <filled>
      <prompt>
       You chose 
       <value expr="street"/>
       in 
       <value expr="citystate"/> 
      </prompt>
      <exit/>
    </filled>
  </form>

</vxml>

The first field in the example collects a city and state using a conventional static URL reference to a grammar. The grammar is identified using the ‘src’ attribute.

The second field then uses this response in an expression to select its own grammar. The expression simply takes the first recognition result and appends “.grxml” to it. For example, recognition of the utterance “Toronto Ontario” might return the semantic interpretation “TorontoOntario” (I know, it’s not a state, but I’m Canadian). The ECMAScript expression in the ‘srcexpr’ attribute would then evaluate to “TorontoOntario.grxml”, and this URL is used to fetch the grammar for the second field. (You may recall that the “grxml” filename extension is the convention used for SRGS format grammar files.)
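
To make the naming convention concrete, here is a minimal sketch of what citystate.grxml might look like. It is not taken from the Working Draft: the list of cities, the xml:lang value, and the use of the ‘semantics/1.0-literals’ tag format are all assumptions for illustration, and support for semantic interpretation tags differs between platforms.

<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical citystate.grxml: each phrase returns a literal string
     that doubles as the base name of a per-city street grammar. -->
<grammar xmlns="http://www.w3.org/2001/06/grammar" version="1.0"
         xml:lang="en-CA" mode="voice" root="citystate"
         tag-format="semantics/1.0-literals">

  <rule id="citystate" scope="public">
    <one-of>
      <!-- "Toronto Ontario" is returned as "TorontoOntario" -->
      <item>Toronto Ontario <tag>TorontoOntario</tag></item>
      <item>Ottawa Ontario <tag>OttawaOntario</tag></item>
      <item>Buffalo New York <tag>BuffaloNewYork</tag></item>
    </one-of>
  </rule>

</grammar>

With a grammar along these lines, the server would also need a street grammar for each returned value (TorontoOntario.grxml, OttawaOntario.grxml, and so on), located where the relative URL in ‘srcexpr’ resolves.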

A sample of the same technique with the <script> tag is shown below. The ‘user_id’ variable is used in the expression to dynamically construct the URL of the ECMAScript to be loaded. Notice that the URL contains a query component, which the server uses to pass a parameter to the ‘passport’ program (which might be a script, servlet, or some other CGI-compatible program).


<?xml version="1.0" encoding="UTF-8"?>
<vxml xmlns="http://www.w3.org/2001/vxml" version="2.1">
  <form>
    <var name="user_id" expr="12345"/>
    <script srcexpr="'http://example.org/passport?id=' + user_id"/>
  </form>
</vxml>
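
When the value placed in the query component comes from the caller or a database rather than a numeric constant, it may contain characters that are not safe in a URL. A cautious variant, sketched below, escapes the value first. It assumes that the platform’s ECMAScript interpreter provides the standard encodeURIComponent function; the ‘user_name’ variable and its value are hypothetical.

<?xml version="1.0" encoding="UTF-8"?>
<vxml xmlns="http://www.w3.org/2001/vxml" version="2.1">
  <form>
    <!-- Hypothetical value containing a space and an ampersand -->
    <var name="user_name" expr="'Smith &amp; Sons'"/>
    <!-- Escape the value before appending it to the query string -->
    <script srcexpr="'http://example.org/passport?name=' + encodeURIComponent(user_name)"/>
  </form>
</vxml>

Here the expression would evaluate to “http://example.org/passport?name=Smith%20%26%20Sons”, so the ampersand in the value is not mistaken for a parameter separator.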

Error handling is the same for both features. Exactly one of "src", "srcexpr", or an inline grammar or script must be specified; otherwise, an error.badfetch event is thrown.
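
As a rough illustration of how a page might guard against problems with a dynamically selected grammar, the sketch below adds two form-level handlers to the address example. The recovery strategy (apologize and exit) and the prompt wording are assumptions, not part of the Working Draft. The error.badfetch handler also covers the ordinary case where the computed URL cannot be fetched, for instance because no street grammar exists for the recognized city; the error.semantic handler is included on the understanding that the draft specifies this event when a ‘srcexpr’ expression cannot be evaluated.

<?xml version="1.0" encoding="UTF-8"?>
<vxml xmlns="http://www.w3.org/2001/vxml" version="2.1">

  <form id="get_address">
    <!-- The grammar could not be fetched, e.g. no file for this city -->
    <catch event="error.badfetch">
      <prompt>Sorry, street listings are not available for that city.</prompt>
      <exit/>
    </catch>

    <!-- The srcexpr expression itself could not be evaluated -->
    <catch event="error.semantic">
      <prompt>Sorry, we are unable to look up streets right now.</prompt>
      <exit/>
    </catch>

    <field name="citystate">
      <grammar type="application/srgs+xml" src="citystate.grxml"/>
      <prompt>Say a city and state.</prompt>
    </field>

    <field name="street">
      <grammar type="application/srgs+xml" srcexpr="citystate + '.grxml'"/>
      <prompt>What street are you looking for?</prompt>
    </field>
  </form>

</vxml>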

These two new features allow the author to accomplish more dynamically within a VoiceXML page, and help support a development model that relies more on static VoiceXML pages and less on processing in the application server. A number of the other features that we’ll review in the next few issues are directly related to this development model as well, and provide some exciting capabilities to the developer. These features also make the language somewhat more consistent, as URLs can already be generated from expressions in a number of other areas.
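
For comparison, the sketch below shows two VoiceXML 2.0 elements that already accept an expression in place of a static URL: <audio> and <goto>, each through an ‘expr’ attribute. The ‘lang’ variable, the file names, and the directory layout are hypothetical.

<?xml version="1.0" encoding="UTF-8"?>
<vxml xmlns="http://www.w3.org/2001/vxml" version="2.1">
  <!-- Hypothetical language code used to build the URLs below -->
  <var name="lang" expr="'en'"/>

  <form id="welcome">
    <block>
      <prompt>
        <!-- Audio file URL computed from an expression -->
        <audio expr="'prompts/' + lang + '/welcome.wav'">Welcome.</audio>
      </prompt>
      <!-- Next document URL computed from an expression -->
      <goto expr="'menu_' + lang + '.vxml'"/>
    </block>
  </form>
</vxml>

With ‘srcexpr’ added to <grammar> and <script>, the same pattern becomes available for grammars and scripts.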

Here are the direct links to these two new features.

http://www.w3.org/TR/2004/WD-voicexml21-20040728/#sec-grammar_expr
http://www.w3.org/TR/2004/WD-voicexml21-20040728/#sec-script_expr

Watch for more information on VoiceXML 2.1 in our forthcoming issues.

Summary

VoiceXML 2.1 proposes some useful additional features for VoiceXML 2.0, based on real-world deployment experience. We’ll continue drilling down into these features in forthcoming issues. As always, if you have questions or topic suggestions for VoiceXML 2.0 or 2.1, drop us a line!
