Volume 1, Issue 10 - November 2001
   
 

The Record Tag

By Rob Marchand

Welcome to First Words, VoiceXML Review's column that teaches you about VoiceXML and how you can use it. We hope you enjoy the lesson.

Last month we talked about using the <transfer> tag to connect your callers to other services or people. This month, we're going to do a little call screening, and try out the <record> tag.

Of course, whenever a call takes place, you have access to telephony-related information about the call. This information includes the dialed number (session.telephone.dnis), the calling number (session.telephone.ani), if available, and possibly additional network information (user-to-user information, or UUI, as session.telephone.uui, and information digits as session.telephone.iidigits). This telephony-related information allows you to tailor your application based on who is calling whom, where the call is made from, and so on. UUI can be used as required by the application, and the information digits describe the originating line (for example, whether the call comes from a pay phone).
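
As a rough sketch of how these session variables might be used for simple call screening, the following document greets one known caller differently from everyone else. The phone number and the prompt wording are made up for illustration, and whether session.telephone.ani is populated depends on your platform and network.

<?xml version="1.0"?>
<vxml version="1.0">
  <form>
    <block>
      <!-- '5551234567' is a hypothetical calling number; if no ANI is
           available the comparison is simply false -->
      <if cond="session.telephone.ani == '5551234567'">
        <prompt>Welcome back. Connecting you now.</prompt>
      <else/>
        <prompt>Thanks for calling. Please leave a message.</prompt>
      </if>
    </block>
  </form>
</vxml>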

The <record> tag allows you to record what the caller is saying, and then to make use of that recorded data. The <record> tag is used as a form item for collecting input within a form, and shares a number of characteristics with other form items such as <field>. The recorded data is available as the form item variable associated with the <record> tag.

This data can then be used as you would expect; it can be played back to a caller, and it can be submitted to a web server. The <record> tag can be useful in many types of applications, as I'm sure you can imagine. Some examples might include voicemail systems, e-mail by phone, collecting general comments or requests, and so on.

So how do we use this wondrous capability that the <record> tag gives us? Here is a simple example.

<?xml version="1.0"?>
<vxml version="1.0">
  <form>
    <record name="recorded_message" type="audio/wav" maxtime="30s"
            dtmfterm="true">
      <prompt>
        Please record something. Press any key to stop recording.
      </prompt>
      <filled>
        <prompt>
          You have recorded your message. Here is what it sounds like.
          <value expr="recorded_message" />
        </prompt>
      </filled>
    </record>
  </form>
</vxml>


This example prompts the caller, records their input (up to thirty seconds' worth), and then plays it back as part of another prompt. If the caller wishes to terminate recording, they can press a DTMF key. Although it doesn't matter in this example, we have specified that the file should be saved in WAVE format, as indicated by the content type 'audio/wav'.

As with other form items, the <record> element contains other elements that define the behavior while collecting the data from the user. In this example, we have:

  • <prompt>, which defines the prompt to be played prior to initiating recording;
  • <filled>, which defines the actions to undertake after a successful recording.

The <record> tag in this example also has some interesting attributes:

  • name - the ECMAScript variable by which we can later refer to this recorded data;
  • type - the MIME type for the recorded data;
  • maxtime - the maximum recording time acceptable to our application;
  • dtmfterm - whether or not DTMF key presses can be used to terminate the recording.

The <record> tag can accept some additional attributes as well:

  • expr - As usual, an ECMAScript expression which, when evaluated, is used to initialize the field variable. Note that unless the variable is cleared, the <record> field will never be visited.
  • cond - Again, one of our regular field variable attributes; this is an ECMAScript expression that must evaluate to true in order for this form item to be visited;
  • modal - This attribute determines whether speech grammars are enabled during recording or not. The default value of modal is 'true', which indicates that the grammars are not enabled (like a modal dialog in a windowing system, you interact only with this element until the data input is complete);
  • beep - A Boolean ECMAScript expression which determines whether or not a tone is played to the caller just prior to the start of recording (just like your answering machine!);
  • finalsilence - A time interval specifying the amount of silence that indicates the end of a recording.
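
To see several of these attributes working together, here is a minimal sketch, not a definitive pattern. The field names wants_message and keep_it are invented for this illustration, and playback follows this column's convention of using <value>. The recording is only attempted if the caller asks for it (cond), a beep precedes it, and three seconds of silence ends it. If the caller rejects the recording, clearing the form item variable makes the <record> item eligible to be visited again.

<?xml version="1.0"?>
<vxml version="2.0">
  <form>
    <!-- Hypothetical yes/no question that gates the recording -->
    <field name="wants_message" type="boolean">
      <prompt>Would you like to leave a message?</prompt>
    </field>

    <!-- Visited only when wants_message is true -->
    <record name="recorded_message" cond="wants_message" beep="true"
            modal="true" finalsilence="3s" maxtime="30s" dtmfterm="true">
      <prompt>Please record your message after the beep.</prompt>
    </record>

    <!-- Hypothetical confirmation step; clearing recorded_message
         sends the caller back to the <record> item -->
    <field name="keep_it" type="boolean" cond="wants_message">
      <prompt>
        Here is your message. <value expr="recorded_message"/>
        Would you like to keep it?
      </prompt>
      <filled>
        <if cond="!keep_it">
          <clear namelist="recorded_message keep_it"/>
        </if>
      </filled>
    </field>
  </form>
</vxml>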

This set of attributes allows us to control the collection of our audio data. When the recording has been made, a number of shadow variables are defined:

  • name$.duration - The length of the recording in milliseconds;
  • name$.size - The length of the recording in bytes;
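  • name$.termchar - If the user terminates recording with a DTMF key press (assuming, of course, that this has been enabled with the 'dtmfterm' attribute), then this shadow variable contains the DTMF key that the caller used to end the recording;
  • name$.maxtime - Indicates whether the recording was terminated because the maximum recording time ('maxtime') was reached.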
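
The shadow variables can be read from ECMAScript expressions once the recording is filled. The following is a small sketch under the assumption that your platform populates them as described above; the prompt wording is only illustrative.

<?xml version="1.0"?>
<vxml version="2.0">
  <form>
    <record name="recorded_message" type="audio/wav" maxtime="30s"
            dtmfterm="true">
      <prompt>
        Please record something. Press any key to stop recording.
      </prompt>
      <filled>
        <prompt>
          Your message is about
          <!-- duration is in milliseconds, so convert to whole seconds -->
          <value expr="Math.round(recorded_message$.duration / 1000)"/>
          seconds long.
        </prompt>
        <!-- termchar is only set when a key press ended the recording -->
        <if cond="recorded_message$.termchar == '#'">
          <prompt>You ended the recording with the pound key.</prompt>
        </if>
      </filled>
    </record>
  </form>
</vxml>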

Here is a slightly more comprehensive example using the <record> tag:

<?xml version="1.0"?>
<vxml version="2.0">
    <form>
        <record beep="true" name="recorded_message" type="audio/wav" maxtime="30s"
                finalsilence="4s" dtmfterm="true">
            <prompt>
                Please record something. Press any key to stop recording.
            </prompt>
            <noinput>
                You really should say something.
            </noinput>
            <filled>
                <prompt>
                    You have recorded your message. Here is what it sounds like.
                    <value expr="recorded_message" />
                </prompt>
                <if cond="recorded_message$.maxtime == 'true'">
                    <prompt>
                        You talk too much! Your message was truncated.
                    </prompt>
                </if>
                <submit next="/cgi-bin/record.pl" method="post" />
            </filled>
            <catch event="telephone.disconnect.hangup">
                <submit next="/cgi-bin/record.pl" method="post" />
            </catch>
        </record>
    </form>
</vxml>


Here's what's different:

  • We've added a finalsilence attribute of four seconds; this means that the recording will automatically terminate after four seconds of silence;
  • We've added a <noinput> event handler; note that if only silence is heard, the platform may throw a noinput event;
  • We're using one of the shadow variables (maxtime) to tell the user they're being a little too chatty;
  • We're actually submitting the recorded data to a web server, using the <submit> tag;
  • We're catching the telephone.disconnect.hangup event to make sure that we can submit the data even if the caller hangs up. This one is important, as many callers will hang up after having left a message, so we must deal with this situation in our application.
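
When submitting recorded audio, it can help to be explicit about what is sent and how it is encoded. The sketch below is a variation on the submit step, not the only way to do it: it names the variable to send with the namelist attribute and requests multipart/form-data encoding with enctype, which is generally the appropriate encoding for posting audio data. It assumes the same /cgi-bin/record.pl script as above, and that the script can parse a multipart upload.

<?xml version="1.0"?>
<vxml version="2.0">
  <form>
    <record name="recorded_message" type="audio/wav" maxtime="30s"
            dtmfterm="true">
      <prompt>
        Please record something. Press any key to stop recording.
      </prompt>
      <filled>
        <!-- Send only the recording, encoded as a multipart upload -->
        <submit next="/cgi-bin/record.pl" method="post"
                enctype="multipart/form-data"
                namelist="recorded_message" />
      </filled>
      <catch event="telephone.disconnect.hangup">
        <!-- Same submission if the caller hangs up after speaking -->
        <submit next="/cgi-bin/record.pl" method="post"
                enctype="multipart/form-data"
                namelist="recorded_message" />
      </catch>
    </record>
  </form>
</vxml>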


Copyright © 2001 VoiceXML Forum. All rights reserved.
The VoiceXML Forum is a program of the
IEEE Industry Standards and Technology Organization (IEEE-ISTO).