First Words
Welcome to “First Words” – the VoiceXML Review’s column to teach you about VoiceXML and how you can use it. We hope you enjoy the lesson.
VoiceXML 2.1
In this lesson, we’re going to continue investigating VoiceXML 2.1. As I write this, the Voice Browser Working Group of the W3C is working hard to finalize the VoiceXML 2.1 specification as part of its face-to-face meeting in Turin, Italy.
You may recall that as VoiceXML platform vendors and application developers began to widely deploy VoiceXML applications, they began to identify potential future extensions to the language. The result of this experience is a collection of field-proven features that are candidates for addition to the VoiceXML language. These features are being proposed as part of VoiceXML 2.1.
Just as a reminder, VoiceXML 2.1 has been released as a Last Call Working Draft. Here is a pointer:
http://www.w3.org/TR/2004/WD-voicexml21-20040728/
Note: if you’re reading this article after VoiceXML 2.1 has been finalized and published, you should spend a few minutes tracking down the final specification rather than this link, as the specification may have undergone minor changes.
The new features proposed for VoiceXML 2.1 are based on feedback from application developers and VoiceXML platform developers.
The features we looked at last issue were:
- Referencing Grammars Dynamically – Generation of a grammar URI reference with an expression;
- Referencing Scripts Dynamically – Generation of a script URI reference with an expression;
Here is a link to the article:
https://voicexmlreview.org/Sep2004/columns/sep2004_first_words.html
In future issues, we’re going to look at these:
- Using <mark> to detect barge-in during prompt playback – Placement of ‘bookmarks’ within a prompt stream to identify where a barge-in has occurred;
- Using <data> to fetch XML without requiring a dialog transition – Retrieval of XML data, and construction of a related DOM object, without requiring a transition to another VoiceXML page;
- Concatenating prompts dynamically using <foreach> – Building of prompt sequences dynamically using ECMAScript;
- Adding type to <transfer> – Support for additional transfer flexibility (in particular, a supervised transfer), among other capabilities.
This issue, we’re going to look at:
- Recording user utterances while attempting recognition – Access to the actual caller utterance, for use in the user interface or for submission to the application server;
- Adding namelist to <disconnect> – The ability to pass information back to the VoiceXML platform environment (for example, if the application wishes to pass results to a CCXML session related to the call).
Recording User Utterances
Collection of user utterances can be useful in a number of ways. This feature allows the application to request that the platform collect these utterances for application use.
Utterance recording is enabled with the ‘recordutterance’ property. When set to ‘true’, the platform will set three shadow variables as part of any input collection:
- recording – a reference to the recorded audio;
- recordingsize – the size of the recording in bytes;
- recordingduration – the duration of the recording in milliseconds.
After any successful input collection, these shadow variables are set on the form item variable (for example, fieldname$.recording); they are also always set on the application.lastresult$ object.
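As a sketch of how these shadow variables might be read (the field name and grammar URI here are illustrative, and the ‘recordutterance’ property is assumed to be enabled in an enclosing scope), a <filled> block could log the recording details:

<field name="city_state">
  <grammar type="application/srgs+xml" src="citystate.grxml"/>
  <filled>
    <!-- city_state$.recording holds a reference to the captured audio -->
    <log>Utterance recorded:
      <value expr="city_state$.recordingsize"/> bytes,
      <value expr="city_state$.recordingduration"/> ms</log>
  </filled>
</field>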
Note that support for this feature on the <record> and <transfer> elements is optional (as is speech recognition support when processing these elements).
Here is an example from the VoiceXML 2.1 Last Call Working Draft.
<?xml version="1.0" encoding="UTF-8"?>
<vxml xmlns="http://www.w3.org/2001/vxml" version="2.1">
<form>
<property name="recordutterance" value="true"/>
<field name="city_state">
<prompt>
Say a city and state.
</prompt>
<grammar type="application/srgs+xml" src="citystate.grxml"/>
<nomatch>
I'm sorry. I didn't get that.
<reprompt/>
</nomatch>
<nomatch count="3">
<var name="the_recording" expr="lastresult$.recording"/>
<submit method="post"
enctype="multipart/form-data"
next="upload.cgi"
namelist="the_recording"/>
</nomatch>
</field>
</form>
</vxml>
In this example, utterance recording is enabled within the scope of the form, by setting the ‘recordutterance’ property to ‘true’. The form is attempting to collect a city and state from the user.
If three ‘nomatch’ events occur in a row, the matching event handler above (<nomatch count="3">) will be triggered. This event handler submits the utterance to the application server, where it would presumably be stored in a file or database.
This example captures user utterances that are problematic. Note, though, that only the last utterance in the sequence of three nomatch events is saved in this case. Note also the use of the ‘multipart/form-data’ encoding in the submission – this encoding is required by VoiceXML 2.1 when submitting a recorded utterance.
There is an additional related property – ‘recordutterancetype’ – which can be used to define the media type to be used for recording the utterance. Should the requested type not be supported by the platform, an error.unsupported.format event will be thrown.
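As a sketch (the ‘audio/x-wav’ media type here is illustrative – supported types vary by platform), the two properties might be combined with a handler for the unsupported-format case:

<property name="recordutterance" value="true"/>
<property name="recordutterancetype" value="audio/x-wav"/>
<catch event="error.unsupported.format">
  <!-- Thrown if the platform cannot record in the requested media type -->
  <prompt>Sorry, a system error has occurred.</prompt>
  <exit/>
</catch>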
This capability enables a number of uses, including application tuning, application-server-tier speaker verification, and confirmation of caller input for regulatory purposes.
Passing Data Using Disconnect
VoiceXML 2.0 allows an application to return data to the VoiceXML interpreter context using the ‘namelist’ attribute of the <exit> element. This can be useful when one wishes to pass data to other network elements. Depending upon the platform in use, this might include Computer Telephony Integration (CTI) subsystems, Call Control XML (CCXML) interpreters, or other components.
For more information on CCXML, you might want to have a look at:
http://www.w3.org/TR/2004/WD-ccxml-20040430/
As the CCXML specification nears completion, this feature will provide an additional mechanism for communication between the CCXML interpreter and VoiceXML dialogs under its control.
VoiceXML 2.1 adds this capability to the <disconnect> element. This is useful for particular applications and provides a consistent mechanism for providing this data to the interpreter context.
The use of the ‘namelist’ attribute with <disconnect> is very straightforward:
<disconnect namelist="accountNumber accountType transactionType"/>
In this example, the ECMAScript variables listed in the ‘namelist’ attribute will be passed to the interpreter context. Both the <exit> and <disconnect> elements can carry a ‘namelist’ in a single document; in that case, the values from each are passed to the interpreter context for further processing.
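A minimal sketch of a complete document that reports results and hangs up (the variable names and values are illustrative) might look like this:

<?xml version="1.0" encoding="UTF-8"?>
<vxml xmlns="http://www.w3.org/2001/vxml" version="2.1">
  <var name="accountNumber" expr="'1234567890'"/>
  <var name="accountType" expr="'checking'"/>
  <var name="transactionType" expr="'balance'"/>
  <form>
    <block>
      <prompt>Thank you. Goodbye.</prompt>
      <!-- Pass the results to the interpreter context on hangup -->
      <disconnect namelist="accountNumber accountType transactionType"/>
    </block>
  </form>
</vxml>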
Summary
Here are the direct links to these two new features.
http://www.w3.org/TR/2004/WD-voicexml21-20040728/#sec-disconnect
http://www.w3.org/TR/2004/WD-voicexml21-20040728/#sec-reco_reco
Watch for more information on VoiceXML 2.1 in our forthcoming issues.
VoiceXML 2.1 proposes some useful additional features for VoiceXML 2.0, based on real-world deployment experience. We’re going to continue drilling down into these features in forthcoming issues. As always, if you have questions or topics for VoiceXML 2.0 or 2.1, drop us a line!
Copyright © 2001-2004 VoiceXML Forum. All rights reserved.
The VoiceXML Forum is a program of the
IEEE Industry Standards and Technology Organization (IEEE-ISTO).