VoiceXML Review - Columns

Volume 1, Issue 11 - December 2001

So, What's New

By Rob Marchand

Welcome to First Words, VoiceXML Review's column that teaches you about VoiceXML and how you can use it. We hope you enjoy the lesson.

Last month we had a close look at the <record> tag. This time, we're going to touch on some of the things you can look for in VoiceXML 2.0, and how they affect some of the VoiceXML pages we've done in the past.

Some History

The VoiceXML Forum founders (AT&T, Motorola, IBM, and Lucent) prepared the original VoiceXML 1.0 Specification, which was then passed to the W3C Voice Browser Working Group to be evolved into VoiceXML 2.0. VoiceXML 2.0 was released as a public working draft on October 23rd of this year, with public comments accepted until November 23rd. From here, the process will (possibly) include additional working drafts, followed by a 'Last Call' working draft. Finally, a 'candidate recommendation' will be made available for final comment, followed by the formalization of VoiceXML 2.0 as a W3C Recommendation. There is still substantial work ahead in moving VoiceXML 2.0 through the W3C process, but the specification itself should now include most of the substantive changes and features that will be considered for the 2.0 recommendation.

So, What's New?

The current working draft of VoiceXML 2.0 improves on the VoiceXML 1.0 specification in a number of ways. If you're developing on any of the publicly available developer systems, you probably already have access to these features, or at least some of them.

The information presented here is pulled from Appendix J of the current working draft, available at http://www.w3c.org/Voice.

Logging

The new <log> tag has been added to allow generation of debugging information. The <log> tag takes a 'label' attribute, specifying the purpose of the log message, and an ECMAScript expression ('expr') attribute, which evaluates to the message to be logged.
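As a minimal sketch of how this might look in a page (the form id, variable name, and label value below are made up for illustration, and where the log output ends up is platform-specific):

```xml
<?xml version="1.0"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form id="order">
    <!-- Hypothetical variable, just so there is something to log -->
    <var name="size" expr="'large'"/>
    <block>
      <!-- 'label' names the purpose of the message;
           'expr' is an ECMAScript expression evaluated to the message text -->
      <log label="pizza.debug" expr="'Selected size: ' + size"/>
      <prompt>Thanks for your order.</prompt>
    </block>
  </form>
</vxml>
```

The caller hears only the prompt; the log message goes wherever your platform sends debugging output.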

<dtmf> tag Superseded

The <dtmf> tag, used for specifying DTMF grammars in VoiceXML 1.0, has been replaced by the <grammar> tag using 'mode="dtmf"'. Other behavior remains the same (although DTMF grammars fall into the realm of the new grammar specification; see below).
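Here is a rough sketch of a field that accepts a single key press using the new form. The field name and prompt are invented, and the inline grammar assumes the XML grammar format from the current grammar drafts, so check your platform's documentation for what it actually accepts:

```xml
<?xml version="1.0"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form id="pick_size">
    <field name="size">
      <prompt>Press 1 for small, 2 for medium, or 3 for large.</prompt>
      <!-- Takes the place of a VoiceXML 1.0 <dtmf> element -->
      <grammar mode="dtmf" version="1.0" root="size">
        <rule id="size">
          <one-of>
            <item>1</item>
            <item>2</item>
            <item>3</item>
          </one-of>
        </rule>
      </grammar>
    </field>
  </form>
</vxml>
```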

Speech Markup

The Speech Markup tags (<emp>, <div>, <pros>, and <sayas>) have now been replaced with equivalent elements defined in a separate W3C specification, the Speech Synthesis Markup Language (SSML) specification. SSML will provide rich, standard support for speech markup, which can be used with VoiceXML or with other environments that need speech markup.
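To give a feel for the change, here is a small prompt written with the replacement elements. This assumes the mapping is <emp> to <emphasis>, <div> to sentence and paragraph elements such as <s>, and <pros> to <prosody>; element and attribute names may still shift as the SSML drafts evolve, and the wording of the prompt is invented:

```xml
<?xml version="1.0"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form id="goodbye">
    <block>
      <prompt>
        <!-- <s> marks a sentence, where 1.0 used <div> -->
        <s>Your pizza is on its way.</s>
        <!-- <emphasis> takes over from <emp> -->
        <s><emphasis>Thank you</emphasis> for calling.</s>
        <!-- <prosody> takes over from <pros> -->
        <prosody rate="slow">Goodbye.</prosody>
      </prompt>
    </block>
  </form>
</vxml>
```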

Grammars

The subject of grammars, left to platform and ASR implementers in VoiceXML 1.0, is now being formalized in the Speech Recognition Grammar specification (the SRGF). This is a very big step towards truly interoperable VoiceXML platforms. The SRGF currently requires support for its XML grammar format, and will recommend support for its ABNF grammar format as well.
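For illustration, here is a tiny standalone grammar in the XML format, with what I'd expect to be the equivalent ABNF form shown in a comment. The rule name and topping list are invented, and the syntax follows the grammar drafts as I understand them, so treat it as a sketch rather than a reference:

```xml
<?xml version="1.0"?>
<grammar version="1.0" xml:lang="en-US" root="topping"
         xmlns="http://www.w3.org/2001/06/grammar">
  <!-- Matches exactly one of three spoken toppings -->
  <rule id="topping" scope="public">
    <one-of>
      <item>pepperoni</item>
      <item>mushrooms</item>
      <item>olives</item>
    </one-of>
  </rule>
</grammar>
<!-- Roughly the same grammar in the ABNF form:

     #ABNF 1.0;
     language en-US;
     root $topping;
     public $topping = pepperoni | mushrooms | olives;
-->
```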

Cache Control

The original cache control attribute 'caching' (set to either 'safe' or 'fast') has been replaced by a pair of new attributes ('maxage' and 'maxstale'). Jeff Kunins provided a detailed description of the caching behavior in last month's Speak & Listen column, so I won't repeat it here. The new attributes give you finer-grained control over caching, and map directly onto HTTP 1.1 caching functionality.
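As a quick sketch, here is how the new attributes might appear on <audio>. The file names are placeholders, the values are in seconds, and treating maxage="0" as the rough equivalent of the old 'safe' setting is my reading rather than something the draft states in those words:

```xml
<?xml version="1.0"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form id="greeting">
    <block>
      <!-- 1.0 style was: <audio src="specials.wav" caching="safe"/> -->
      <!-- maxage="0": do not use a cached copy without revalidating it -->
      <audio src="specials.wav" maxage="0">Today's specials are not available.</audio>
      <!-- Accept a cached copy up to a day old, and tolerate one that
           is up to an hour past its expiry -->
      <audio src="jingle.wav" maxage="86400" maxstale="3600">Welcome.</audio>
    </block>
  </form>
</vxml>
```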

Internationalization

International language and character set support is being brought into line with other XML recommendations. For example, the <vxml> tag now accepts the 'xml:lang' attribute rather than 'lang'. This mechanism is also used to control language support in the Speech Markup and Speech Grammar format specifications.
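A minimal sketch, assuming a platform with both English and French voices installed (the language tags and prompts are just examples):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- xml:lang on <vxml> sets the default language for the document -->
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml" xml:lang="en-CA">
  <form id="welcome">
    <block>
      <prompt>Welcome to the pizza line.</prompt>
      <!-- A single prompt can override the document default -->
      <prompt xml:lang="fr-CA">Bienvenue à la ligne pizza.</prompt>
    </block>
  </form>
</vxml>
```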

Speech Recognition Enhancements

Access to more advanced speech recognition capabilities has been provided:

N-best support is now included;
Weights are supported for grammars;
An application variable 'application.lastresult$' is available to allow access to information about the last recognition;
Different types of barge-in detection are supported.
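The features in the list above might be combined roughly as follows. The property name 'maxnbest', the 'weight' and 'bargeintype' attributes, and the 'utterance' and 'confidence' fields of 'application.lastresult$' are the names I understand the 2.0 syntax to use, so check them against your platform; the grammar file name is a placeholder:

```xml
<?xml version="1.0"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <!-- Ask the recognizer for up to three candidate results (N-best) -->
  <property name="maxnbest" value="3"/>
  <form id="toppings">
    <field name="topping">
      <!-- Hotword barge-in: only a grammar match interrupts the prompt -->
      <prompt bargeintype="hotword">Which topping would you like?</prompt>
      <!-- Weight this grammar relative to other active grammars -->
      <grammar src="toppings.grxml" weight="2.0"/>
      <filled>
        <!-- lastresult$ is an array; element 0 is the best match -->
        <log label="reco"
             expr="'Heard ' + application.lastresult$[0].utterance +
                   ' with confidence ' + application.lastresult$[0].confidence"/>
      </filled>
    </field>
  </form>
</vxml>
```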

Telephony Changes

There have been a few changes to the telephony model:

session.uui has been changed to session.telephone.uui (the original name was a typo);
session.telephone.rdnis and session.telephone.redirect_reason have been added;
Transfers can include a 'transferaudio' attribute, which names audio to play until the call is answered.
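Here is a sketch that uses the new session variables and 'transferaudio' together. The variable names are the ones given in this draft, the phone number and audio file are placeholders, and the 'bridge' attribute and 'tel:' destination are assumptions about how your platform expects a transfer to be written:

```xml
<?xml version="1.0"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form id="handoff">
    <block>
      <!-- Record how the call reached us, if it was redirected -->
      <log label="telephony"
           expr="'Redirected from ' + session.telephone.rdnis +
                 ' because ' + session.telephone.redirect_reason"/>
    </block>
    <!-- hold-music.wav plays to the caller until the far end answers -->
    <transfer name="store" dest="tel:+15555550100" bridge="true"
              transferaudio="hold-music.wav">
      <prompt>Connecting you to the store.</prompt>
    </transfer>
  </form>
</vxml>
```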

A future revision may abstract telephony from VoiceXML entirely.

Event Features

Events can now pass a message from the point where the event is thrown to the event handler. A <throw> can specify a message (using either 'message' or 'messageexpr'), and the <catch> handler can read that message from the variable '_message', alongside '_event', which holds the name of the event.
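A minimal sketch of a message travelling from a <throw> to its handler. The event name and message text are invented, and the handler here assumes the 2.0 syntax as I understand it, where the message text arrives in '_message' and the event name in '_event':

```xml
<?xml version="1.0"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form id="checkout">
    <!-- Handler in scope for anything thrown inside this form -->
    <catch event="com.example.pizza.outofstock">
      <log label="events" expr="_event + ': ' + _message"/>
      <prompt>Sorry. <value expr="_message"/></prompt>
    </catch>
    <block>
      <!-- 'message' is a literal string; 'messageexpr' would be
           an ECMAScript expression evaluated at throw time -->
      <throw event="com.example.pizza.outofstock"
             message="We are out of anchovies today."/>
    </block>
  </form>
</vxml>
```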

Miscellaneous Changes

Some additional changes include:

'modal' has been removed from <subdialog>;
'class', 'mode', and 'recsrc' have been removed from <value>;
'accept' has been added to <menu> and <choice>;
TTS and grammar specific data have been extracted to their respective specifications;
'expr' is now officially part of <audio>;
Better error reporting, including HTTP error codes, telephony protocol errors, and others.
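Two of the changes above, 'accept' on <menu> and <choice> and 'expr' on <audio>, can be sketched in one small page. The values 'approximate' and 'exact' are the ones I understand 'accept' to take; the menu wording, form ids, and audio file name are invented:

```xml
<?xml version="1.0"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <!-- Hypothetical variable holding the name of an audio file -->
  <var name="jingle" expr="'jingle.wav'"/>
  <!-- 'approximate' lets a caller say part of a choice phrase -->
  <menu id="main" accept="approximate">
    <prompt>
      <!-- 'expr' computes the audio URI at run time -->
      <audio expr="jingle">Welcome.</audio>
      Say order a pizza, or store hours.
    </prompt>
    <choice next="#order">order a pizza</choice>
    <!-- This choice overrides the menu and needs the exact phrase -->
    <choice next="#hours" accept="exact">store hours</choice>
  </menu>
  <form id="order">
    <block><prompt>Let's start your order.</prompt></block>
  </form>
  <form id="hours">
    <block><prompt>We are open from noon until midnight.</prompt></block>
  </form>
</vxml>
```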

There have been many clarifications (this was one of the major goals of 2.0). These are too numerous to list here, but are detailed in Appendix J of the VoiceXML 2.0 working draft.

We haven't covered all the changes here, but this should give you an idea of what features and changes are being included.

What's Next?

Next month, we'll try to cover some additional features of VoiceXML, and get back to our thriving pizza franchise.
