VoiceXML Review - Columns

In the first section, we mentioned that VoiceXML brings together some interesting technologies. Let's bring a few more of these components into the mix. In Example 3, we'll actually prompt the user, collect some information, play it back to them, and then submit it to a Web server for further processing.

Example 3: Advanced Hello World
 
<?xml version="1.0"?>
<vxml version="1.0">
      <!--Example 3 for VoiceXML Review -->
      <form>
            <block>
                  <prompt>
                        <audio src="http://www.voicexml.org/audio/helloworld.wav">
                        Hello, World!
                        </audio>
                  </prompt>
            </block>

            <field name="greeting">

                  <prompt>
                        What say you?
                  </prompt>

                  <grammar>
                        hello | howdy | greetings | hey | password
                  </grammar>

                  <help>
                        You can say hello, howdy, greetings, hey, or
                        password
                  </help>

                  <filled>
                        You said <value expr="greeting"/>
                  </filled>

            </field>

            <block>
                  <!--Decide whether to continue talking to this caller -->
                  <submit next="http://www.voicexml.org/cgi-bin/friend_or_foe.cgi"/>
            </block>

      </form>
</vxml>

A number of new elements appear in this example. The form now contains a field, an item of data to be collected from the user. This field has a number of components:

Prompt elements can contain text, or URIs that refer to pre-recorded audio (among other things), to be played to the user to indicate that input is required.

Grammar elements constrain what the user can say. In Example 3, acceptable input consists of one of the words "hello", "howdy", "greetings", "hey", or "password". Example 3 defines no handlers of its own for silence or unrecognized speech, so if the user does not provide one of these inputs, the platform plays a system-dependent message indicating that they did not respond, or that their response was not understood. The user is then re-prompted and given another chance to provide input. Handling of conditions such as no input can be customized in many different ways. If the user says "help", the content of the help element is played.
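
As one illustration of customizing no-input and no-match handling, the field from Example 3 could carry its own noinput and nomatch handlers. The following is a minimal sketch rather than part of the original example: the handler wording and the use of the count attribute are assumptions chosen for illustration, and the inline grammar format remains platform-dependent, as in Example 3.

<?xml version="1.0"?>
<vxml version="1.0">
      <!-- Sketch: the field from Example 3 with custom handlers (wording is assumed) -->
      <form>
            <field name="greeting">

                  <prompt>
                        What say you?
                  </prompt>

                  <grammar>
                        hello | howdy | greetings | hey | password
                  </grammar>

                  <!-- Runs when the caller says nothing -->
                  <noinput>
                        I did not hear anything.
                        <reprompt/>
                  </noinput>

                  <!-- Runs on the first response that does not match the grammar -->
                  <nomatch>
                        I did not understand that.
                        <reprompt/>
                  </nomatch>

                  <!-- Runs from the second unmatched response onward -->
                  <nomatch count="2">
                        Please say hello, howdy, greetings, hey, or password.
                        <reprompt/>
                  </nomatch>

            </field>
      </form>
</vxml>

With handlers like these in place, the caller hears the application's own wording instead of the system-dependent default, and the second failed attempt offers more guidance than the first.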

Once input gathering is successful, the actions specified in the <filled> element are processed. In the case of Example 3, the user is told what he or she has said. The processing of this field is now complete (as is collection of input for this form). The order in which elements within a form are processed is defined by the VoiceXML Form Interpretation Algorithm (FIA).
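
The actions in a <filled> element need not be limited to playback. As a hedged sketch (not part of the original example), the field could branch on the collected value. This assumes the platform stores the recognized word itself, in lower case, in the greeting variable, which Example 3 does not guarantee; the response wording is also an assumption.

<?xml version="1.0"?>
<vxml version="1.0">
      <!-- Sketch: branching inside filled (assumes greeting holds the spoken word) -->
      <form>
            <field name="greeting">

                  <prompt>
                        What say you?
                  </prompt>

                  <grammar>
                        hello | howdy | greetings | hey | password
                  </grammar>

                  <filled>
                        <if cond="greeting == 'password'">
                              <prompt>
                                    That sounds like a password, not a greeting.
                              </prompt>
                        <else/>
                              <prompt>
                                    You said <value expr="greeting"/>
                              </prompt>
                        </if>
                  </filled>

            </field>
      </form>
</vxml>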

The form element contains two block sub-elements in addition to the field. The first plays the opening greeting before the field is visited. The final block, the submit block, packages the collected data and submits it to a Web server for further processing. The underlying mechanism for this is exactly the same as submitting an HTML form from a visual Web browser. The invoked CGI program would presumably decide how to proceed based on the user input thus far. A sample conversation might go as follows:

Computer: Hello world!
Computer: What say you?
Human: Foo
Computer: I do not understand
Computer: What say you?
Human: Help
Computer: You can say 'hello', 'howdy', 'greetings', 'hey', or 'password'.
Computer: What say you?
Human: Howdy
Computer: You said Howdy
[...form is submitted to Web Server...]
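
The submission step at the end of this conversation can be spelled out more explicitly. The sketch below is an illustration, not part of the original example: it names the variable to send and the HTTP method, and the namelist and method values shown are assumptions about what friend_or_foe.cgi expects.

<?xml version="1.0"?>
<vxml version="1.0">
      <!-- Sketch: explicit submit (namelist and method values are assumed) -->
      <form>
            <field name="greeting">

                  <prompt>
                        What say you?
                  </prompt>

                  <grammar>
                        hello | howdy | greetings | hey | password
                  </grammar>

            </field>

            <block>
                  <!-- Send only the greeting variable, as an HTTP POST -->
                  <submit next="http://www.voicexml.org/cgi-bin/friend_or_foe.cgi"
                          namelist="greeting" method="post"/>
            </block>
      </form>
</vxml>

If the method attribute is omitted, the data is sent with an HTTP GET, so the greeting travels in the URL's query string, much as it would for an HTML form submitted with GET.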

VoiceXML Developer Resources

The VoiceXML Forum has developed a number of resources that will allow those new to VoiceXML to get started. Here are a few pointers:

A number of VoiceXML Forum Members provide access to developer sites and tool kits that will allow you to try out VoiceXML for yourself. A few of these are:

VoiceXML: Where Speech Meets the Web

By Rob Marchand
