VoiceXML Review - Columns - Speak & Listen

Volume 2, Issue 2 - Feb./March 2002

By Jeff Kunins

(Continued from Part 1)

Q: When doing recognition in a VoiceXML app, how can I access the recognizer's n-best list?

A: VoiceXML 2.0 lets you examine what is called the "n-best" list. A speech recognition engine works by comparing what the caller said against the active grammars and estimating how confident it is that the caller said each matching value. The recognizer computes this confidence score for several of the most likely possibilities, then selects the one with the highest score as the match, provided that score is not below the current setting of the confidencelevel property.
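
The selection step described above can be sketched in ECMAScript. This is only a rough model, not how any particular recognizer is implemented: the function name pickMatch, the candidate utterances, and the threshold values are all invented for illustration.

```javascript
// Hypothetical model of how a recognizer might choose a match.
// pickMatch, the candidates, and the thresholds are illustrative assumptions.
function pickMatch(candidates, threshold) {
  // Sort a copy so the highest-confidence candidate comes first.
  var ranked = candidates.slice(0).sort(function (a, b) {
    return b.confidence - a.confidence;
  });
  // No candidates, or the best one is below the threshold: no match.
  if (ranked.length === 0 || ranked[0].confidence < threshold) {
    return null;
  }
  return ranked[0];
}

var candidates = [
  { utterance: "Austin", confidence: 0.62 },
  { utterance: "Boston", confidence: 0.81 },
  { utterance: "Houston", confidence: 0.40 }
];

var accepted = pickMatch(candidates, 0.5); // "Boston" clears the threshold
var rejected = pickMatch(candidates, 0.9); // nothing clears 0.9, so null
```

With a threshold of 0.5 the caller is matched to "Boston"; raising it to 0.9 rejects every candidate, which corresponds to the situation where the platform reports that nothing matched.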

For richer programmatic control, you can examine the read-only application.lastresult$ array, which is always available at application scope. Each element of the array describes one of the highest-confidence possible matches for the last attempted recognition. If no recognition has happened yet in the application, or when inside an application root document, its value is ECMAScript undefined.
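
Because the array can be undefined, it is worth checking before reading from it. The sketch below assumes a stand-in application object so that it can run outside a voice browser; in a real VoiceXML document the platform supplies application.lastresult$ and you would not assign to it. The helper name nbestCount and the sample values are hypothetical.

```javascript
// Stand-in for the application scope that a VoiceXML interpreter provides.
// In a real document this object already exists and lastresult$ is read-only.
var application = { lastresult$: undefined };

// Returns the number of n-best entries, or 0 if no recognition has occurred.
function nbestCount(app) {
  var results = app.lastresult$;
  if (typeof results === "undefined") {
    return 0;
  }
  return results.length;
}

var before = nbestCount(application); // 0: nothing recognized yet

// Simulate what a platform might populate after one recognition (made-up data).
application.lastresult$ = [
  { utterance: "Boston", confidence: 0.81, inputmode: "voice" },
  { utterance: "Austin", confidence: 0.62, inputmode: "voice" }
];

var after = nbestCount(application); // 2: two candidates available
```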

Each element of the array contains a set of properties and values, which are as follows:

Variable	Description
application.lastresult$	Application-scope array holding information about the last recognition to occur in the current application. The array contains one element for each possible interpretation whose confidence level is greater than the current setting of the confidencelevel property. If no recognition has happened yet in the application, or when inside an application root document, its value is ECMAScript undefined. Each element application.lastresult$[i] is an ECMAScript object with the set of properties described below. After each recognition attempt, the array is reinitialized and contains n elements (1 <= n <= maxnbest), where maxnbest is the current value of the maxnbest property. The elements are sorted in confidence order from greatest to least. As shorthand, application.lastresult$.confidence and the other properties can be referenced directly in place of application.lastresult$[0].confidence, and so on.
application.lastresult$[i].confidence	Float specifying the recognizer's confidence that the caller actually said this particular match, expressed on a scale from 0.0 (minimum) to 1.0 (maximum).
application.lastresult$[i].utterance	String of words actually said by the caller. The exact spelling is platform-specific (e.g. "five hundred thirty" vs. "5 hundred 30" vs. "530").
application.lastresult$[i].inputmode	String, either "dtmf" or "voice", indicating whether the grammar was matched via touch-tone or spoken input.
application.lastresult$[i].interpretation	ECMAScript representation of the recognizer's interpretation of the utterance as matched by the relevant grammar, including slot information. At this point, the exact method for representing interpretation results in VoiceXML is still under discussion in the specifications being developed by the W3C Voice Browser Working Group. For detailed information on how to use interpretation results within VoiceXML, please see http://www.w3.org/voice and your vendor's system documentation.
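
One way these properties might be used is to decide when to ask the caller for confirmation. The sketch below walks a sample n-best list and collects every utterance whose confidence is close to the top result. The function name closeAlternatives, the margin value, and the sample data (including the string-valued interpretation fields) are assumptions made for illustration; actual values depend on your grammars and platform.

```javascript
// Made-up snapshot of what application.lastresult$ might hold after a
// recognition, already sorted from greatest to least confidence.
var lastresult = [
  { utterance: "Boston", confidence: 0.81, inputmode: "voice", interpretation: "BOS" },
  { utterance: "Austin", confidence: 0.62, inputmode: "voice", interpretation: "AUS" },
  { utterance: "Houston", confidence: 0.40, inputmode: "voice", interpretation: "HOU" }
];

// Hypothetical helper: returns the utterances whose confidence is within
// `margin` of the top entry, so the application can offer them as choices.
function closeAlternatives(results, margin) {
  var picked = [];
  if (typeof results === "undefined" || results.length === 0) {
    return picked;
  }
  var top = results[0].confidence; // element 0 is the best match
  for (var i = 0; i < results.length; i++) {
    if (top - results[i].confidence <= margin) {
      picked.push(results[i].utterance);
    }
  }
  return picked;
}

var toConfirm = closeAlternatives(lastresult, 0.25); // ["Boston", "Austin"]
```

Here "Houston" is dropped because its confidence trails the top result by more than the chosen margin, while "Austin" is close enough that the application could reasonably ask, "Did you say Boston or Austin?"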

Note: This content is excerpted in part from VoiceXML: Strategies and Techniques for Effective Voice Application Development with VoiceXML 2.0 by Jeff Kunins and Chetan Sharma (Wiley & Sons).

More Questions on VoiceXML

By Jeff Kunins