VoiceXML Events
Welcome to "First Words" - the VoiceXML Review's column to teach you about VoiceXML and how you can use it. We hope you enjoy the lesson.
Handling Complex Recognition Results
One of the changes in the April release of the VoiceXML 2.0 working draft was the formalization of how recognition results would be made available at the VoiceXML level. We're going to spend some time looking at how this impacts your VoiceXML application, as part of the next few articles.
We've written pages using simple recognition results in the past. These include samples like this:
<field name="command">
<grammar xml:lang = "en-US" version = "1.0" root = "Help">
<rule id = "Help" scope = "public">
<one-of>
<item> help </item>
<item> save me </item>
<item> succour </item>
</one-of>
</rule>
</grammar>
<filled>
You said <value expr="command"/>
</filled>
</field>
In this example, the VoiceXML variable 'command' will receive the raw utterance value - one of the three acceptable utterances in this case, 'help', 'save me' or 'succour'. This is a straightforward example.
More advanced grammars can explicitly fill 'slots' (variables) to return one or more values from a grammar. Here is the previous example modified to return an interpretation of the user utterance in a slot with the name 'returnvalue'.
<field name="command">
<grammar xml:lang = "en-US" version = "1.0" root = "Help">
<rule id = "Help" scope = "public">
<one-of>
<item> <tag> returnvalue="help" </tag> help</item>
<item> <tag> returnvalue="help" </tag> save me</item>
<item> <tag> returnvalue="help" </tag> succour</item>
</one-of>
</rule>
</grammar>
<filled>
You said <value expr="command"/>
</filled>
</field>
In this example, the ECMAScript variable 'returnvalue' receives the value 'help' in all three cases, regardless of which of the three legal user utterances is recognized. Techniques like this can simplify your application, and will in general make your grammars more usable and extensible.
The exact format of what is inside the <tag> element will vary from platform to platform right now (the Semantic Interpretation specification is still being developed, as are techniques for mapping these results into VoiceXML), but the contents will likely be a variant or subset of ECMAScript. Consult your ASR or platform vendor for details.
Before we move into complex combinations, we need to establish how the simple cases work. From the VoiceXML specification:
- If the interpretation is a simple result, this is assigned to the input item variable.
- If the interpretation is a structure and the slot name matches a property, this property is assigned to the input item variable.
- Otherwise, the full semantic result is assigned to the input item variable.
The 'interpretation' is returned to the VoiceXML interpreter from the recognizer. The interpretation allows the recognizer to assign some meaning to the results rather than simply providing the raw utterance to the user of the recognizer. The interpretation is actually provided to the VoiceXML application as an ECMAScript object. This has some implications, as we'll see later.
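Since the interpretation is an ECMAScript object, the three rules above can be sketched in ECMAScript terms. The helper below is purely illustrative - the VoiceXML interpreter performs this assignment internally, and no such API exists:

```javascript
// Illustrative sketch of how an interpreter assigns a recognition
// result to an input item (field) variable -- not a real VoiceXML API.
function assignToFieldVariable(interpretation, slotName) {
  // Rule 1: a simple result (e.g. the raw utterance string) is
  // assigned to the field variable directly.
  if (typeof interpretation !== "object" || interpretation === null) {
    return interpretation;
  }
  // Rule 2: a structured result with a property matching the slot
  // name assigns that property to the field variable.
  if (slotName !== undefined && slotName in interpretation) {
    return interpretation[slotName];
  }
  // Rule 3: otherwise, the full semantic result is assigned.
  return interpretation;
}
```

A field whose name (or slot attribute) matches no property in the interpretation therefore receives the whole object, which is why a property reference is needed to reach an individual slot value.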
In our first example, where we don't fill a slot, the interpretation will be a simple result - just a string representing the utterance. This means that the actual user utterance will be assigned to the field variable 'command'.
In the second example, since we actually return a slot, the interpretation will take the form of an ECMAScript object, something like:
{ returnvalue: "help" }
regardless of which of the three legal utterances was spoken, since the grammar assigns the same value in each case.
According to the second and third rules above, however, we might not get the result we expect at the VoiceXML level. The entire interpretation would be assigned, as an object, to the field variable. (A reasonable alternative would have been to assign the value itself when only a single slot is returned, and this was commonly done before the specification defined this behavior.) To access the value of interest, we need to reference a component of the object, using the ECMAScript convention for referencing an object property:
<filled>
You said <value expr="command.returnvalue"/>
</filled>
Each slot returned by a grammar would be available in this manner.
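For example, a single grammar might fill two slots, both of which would then be available as properties of the field variable. The field name 'order' and the slots 'size' and 'topping' below are invented for illustration, and the exact contents of the <tag> element remain platform-dependent as noted above:

```xml
<field name="order">
<grammar xml:lang = "en-US" version = "1.0" root = "Order">
<rule id = "Order" scope = "public">
<one-of>
<item> small cheese pizza <tag> size="small" topping="cheese" </tag> </item>
<item> large pepperoni pizza <tag> size="large" topping="pepperoni" </tag> </item>
</one-of>
</rule>
</grammar>
<filled>
You ordered a <value expr="order.size"/> <value expr="order.topping"/> pizza
</filled>
</field>
```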
As an alternative, we can specify the slot of interest in the field tag:
<field name="command" slot="returnvalue">
<grammar xml:lang = "en-US" version = "1.0" root = "Help">
<rule id = "Help" scope = "public">
<one-of>
<item> <tag> returnvalue="help" </tag> help</item>
<item> <tag> returnvalue="help" </tag> save me</item>
<item> <tag> returnvalue="help" </tag> succour</item>
</one-of>
</rule>
</grammar>
<filled>
You said <value expr="command"/>
</filled>
</field>
In this case, the value of the interpretation property 'returnvalue' would be assigned to the field variable 'command', because the slot name matches the property name.
We could have achieved the same result by changing the name of the field to match the single slot being returned by the grammar (returnvalue). This can only be done once with the same grammar in the same scope, however.
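For instance, renaming the field in our earlier example to match the slot lets us drop both the slot attribute and the property reference (a sketch; the <tag> contents are platform-dependent as before):

```xml
<field name="returnvalue">
<grammar xml:lang = "en-US" version = "1.0" root = "Help">
<rule id = "Help" scope = "public">
<one-of>
<item> <tag> returnvalue="help" </tag> help</item>
<item> <tag> returnvalue="help" </tag> save me</item>
<item> <tag> returnvalue="help" </tag> succour</item>
</one-of>
</rule>
</grammar>
<filled>
You said <value expr="returnvalue"/>
</filled>
</field>
```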
Summary
We've had a quick look at how recognition results are passed back to your VoiceXML application in different situations, and how you can access them.
Suppose, however, that we have a form-level grammar, which can possibly fill more than one slot (and hence populate multiple fields from a single utterance). How do we map the results from the grammar into VoiceXML variables? There are a number of issues that arise when considering this case, and the authors of the most recent versions of the VoiceXML specification have carefully specified how the results returned from grammars will be used. Next month, we're going to look at more complex results and how they can be used in your application.
Best wishes for a safe and happy holiday season from everyone here at the VoiceXML Review.
VoiceXML Users Group Call for Participation
The VoiceXML Forum is beginning to prepare for the Spring Users Group Meeting, to be held in conjunction with the AVIOS Speech Developers Conference and Expo, from March 31st to April 3rd 2003, at the Fairmont Hotel in San Jose California. The VoiceXML Users Group Meeting will be held on April 3rd.
In the past, the VoiceXML User Group has provided tutorials, technology overviews, and other such features to allow technology leaders to become familiar with speech technologies. VoiceXML is clearly now in the mainstream of speech application development. So for the Spring Meeting, the VoiceXML Forum is looking to the VoiceXML user community to share its experience to date by providing live demos of their VoiceXML technologies and/or sharing practical feedback on their experiences with VoiceXML.
Some possible topics would include live demos and/or experience reports in the areas of:
· Writing Portable VoiceXML Applications;
· Speech Application Development;
· VoiceXML Platforms;
· Speech Application Tuning;
· Deployment concerns;
· Grammar development;
· Systems Integration Issues.
Or pick another topic related to VoiceXML and the real world. Take this opportunity to pass along your successes (and failures!) and help the industry to evolve.
If you would like to participate in this UGM by presenting a demo of your VoiceXML application or related technology, or by sharing your VoiceXML experiences, please submit a short abstract on the topic you would like to present by February 21, 2003.
Copyright © 2001-2002 VoiceXML Forum. All rights reserved.
The VoiceXML Forum is a program of the
IEEE Industry Standards and Technology Organization (IEEE-ISTO).