VoiceXML Review - Columns - Speak & Listen

Volume 4, Issue 2 - March/April 2004

In this monthly column, an industry expert will answer common questions about VoiceXML and related technologies. Readers are encouraged to submit questions about VoiceXML, including development, voice-user interface design, and speech technology in general, or how VoiceXML is being used commercially in the marketplace. If you have a question about VoiceXML, e-mail it to and be sure to read future issues of VoiceXML Review for the answer.

Q. I notice that VoiceXML 2.1 specifies support for the Document Object Model (DOM) via the new <data> tag. As a VoiceXML programmer, the DOM is completely new to me. What's the best way for me to ramp up?

A. Since all DOM activity is managed by the W3C, the primary resource for the DOM is http://www.w3.org/DOM/. To learn the DOM API, the "source of truth" is the DOM specification, also located on the W3C site: http://www.w3.org/DOM/DOMTR. You'll notice that the DOM has gone through several iterations since its inception:

Level 1, W3C Recommendation, October 1998:
http://www.w3.org/TR/1998/REC-DOM-Level-1-19981001/
Level 2, W3C Recommendation, November 2000:
http://www.w3.org/TR/2000/REC-DOM-Level-2-Core-20001113/
Level 3, W3C Recommendation, April 2004:
http://www.w3.org/TR/2004/REC-DOM-Level-3-Core-20040407/

For the purposes of VoiceXML, you'll want to focus on DOM Level 2, which contains a number of improvements over Level 1, such as XML namespace support (http://www.w3.org/TR/REC-xml-names/). Further, you can limit your study to the read-only subset of methods and properties enumerated in Appendix D of VoiceXML 2.1.

Fortunately for you (and me), the DOM has been around since 1998, shortly after XML itself became a full W3C recommendation. If the DOM Level 2 specification seems a little intimidating, there's a wealth of resources, including tutorials, tools, and sample code, available all over the Web to get you started programming the DOM. After all, the best way to learn the DOM is to start writing real code that exercises the DOM.

You'll find DOM implementations for most popular programming languages and platforms, including Java, C/C++, C#, Perl, and JavaScript. Since VoiceXML and ECMAScript are tightly integrated, VoiceXML programmers will get the most use out of implementations of the DOM exposed through JavaScript. Here are two:

Microsoft MSXML
Microsoft implements a set of XML services including the DOM via the Component Object Model (COM). If you're running a reasonably recent version of Microsoft Windows, and you have Internet Explorer 5 or later installed, you've already got MSXML. If not, you can download it from the following URL: http://msdn.microsoft.com/XML/XMLDownloads/default.aspx.
Once installed, you can create an instance of the DOM within an HTML page rendered in IE 5 or later using the <xml> tag (http://msdn.microsoft.com/workshop/author/dhtml/reference/objects/xml.asp).
You can also create an instance of the DOM independent of IE using the Microsoft Windows Scripting Host (WSH). I'll show you how below.

Mozilla.org
Mozilla.org is a software foundation that fosters the open source development of a number of Internet-powered products and technologies including the Mozilla Web Browser, Firefox (http://www.mozilla.org/products/firefox/).
Firefox integrates a DOM implementation which can be used not only to manipulate the current HTML document from JavaScript but also to manipulate XML documents you create on the fly or load via a URI. I'll provide you with a quick and dirty example below.

If you want to jump right into DOM programming in a VoiceXML environment, here are a couple of voice browser implementations that support the DOM via the <data> tag as extensions to their VoiceXML 2.0 implementations:

Tellme Networks (http://studio.tellme.com/dom/howto/using_data.html)
BeVocal (http://cafe.bevocal.com/docs/vxml/data.html)

You can sign up for a developer account on Tellme Studio or BeVocal Cafe to test your voice applications. All you'll need is a Web server with a public interface on which to host your VoiceXML content and XML data and access to a telephone.

Using MSXML from WSH
If your desktop machine is running a reasonably recent version of Microsoft Windows, you probably already have MSXML on your machine. If not, download and install it from the link above.

Copy and paste the following code into a text file and call it msft_dom.js:


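// the URI to load is the first command-line argument
if (WScript.Arguments.length < 1) {
  Log("usage: cscript msft_dom.js <uri>");
  WScript.Quit(1);
}
var oDOM = BootStrap(WScript.Arguments(0));
if (!oDOM) {
  WScript.Quit(1);
}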

// now that we have a DOM, let's start coding...

// get the root document element
var oRoot = oDOM.documentElement;
Log("root nodename=" + oRoot.nodeName);
Log("version=" + oRoot.getAttribute("version"));
try {
  Log(GetChannelTitle(oDOM));
}
catch(e) {
  Log(e.description);
}

function GetChannelTitle(dom) {
  // retrieve "/rss/channel/title" (see XPath spec for this notation)
  var sTitle = "";
  var oRoot = dom.documentElement;
  if (oRoot.nodeName == "rss") {
    var oChan = GetFirstChildNamed(oRoot, "channel");
    if (oChan != null) {
      var oTitle = GetFirstChildNamed(oChan, "title");
      // an empty <title/> has no text node, so check firstChild too
      if (oTitle != null && oTitle.firstChild != null) {
        sTitle = oTitle.firstChild.data;
      }
    }
  }
  return sTitle;
}

// do a shallow traversal of oParent to find the first node named sName
function GetFirstChildNamed(oParent, sName) {
  var oNode = null;
  if (oParent != null) {
    for (var i = 0; i < oParent.childNodes.length; i++) {
      var oChild = oParent.childNodes.item(i);
      if (oChild.nodeName == sName) {
        oNode = oChild;
        break;
      }
    }
  }
  return oNode;
}

// create a DOM instance, load the specified URI, 
// and return the DOM if successful
function BootStrap(uri) {
  var oDOM = WScript.CreateObject("MSXML.DOMDocument");
  oDOM.async = false; // let's keep it simple, shall we
  oDOM.validateOnParse = false; // disable validation for speed
  oDOM.load(uri);
  if (oDOM.parseError.errorCode != 0) {
    Log(oDOM.parseError.line + ", " + oDOM.parseError.reason);
    return null;
  }
  else { 
    return oDOM;
  }
}

function Log(s) {
  WScript.Echo(s);
}

The BootStrap function encapsulates the Microsoft COM-based approach to creating an XML parser, using it to load an XML document from a URI, and returning a reference to the DOM exposed by the parser. The URI passed to the function is taken from the command line. WSH implements the WScript object and exposes the command-line arguments you pass in via the Arguments collection.

Once the XML document is loaded, the rest of the code is generic DOM manipulation. The code above shows you how to do the following:

1) Retrieve a DOM node representing the root document element
2) Retrieve the name of a DOM node
3) Retrieve the value of an attribute
4) Traverse the child elements of a DOM node
5) Retrieve the text contained within a DOM node

Here's how to invoke the code:

cscript msft_dom.js http://news.com.com/2547-1_3-0-5.xml

Here's the expected output:

root nodename=rss
version=2.0
CNET News.com

When the script executes, it fetches a public RSS (Really Simple Syndication) feed from the news.com site which contains the five most recent news stories. Lots of other XML data sources are freely available on the Web, and you can certainly create your own on your own hard disk or Web server.
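
The same node members used in the script (nodeName, childNodes, firstChild, data) are enough to read the individual headlines as well. What follows is a minimal sketch, assuming an RSS 2.0 layout in which each story is an item element under channel with its own title child. The MakeText, MakeElement, and MakeList helpers are hypothetical stand-ins that build a tiny tree from plain objects so the sketch runs without a parser; with MSXML you would pass the real channel node instead.

```javascript
// Stand-in node builders (NOT part of any DOM API); they mimic only
// the members the traversal below relies on.
function MakeList(a) {
  return { length: a.length, item: function (i) { return a[i]; } };
}
function MakeText(s) {
  return { nodeName: "#text", data: s, childNodes: MakeList([]), firstChild: null };
}
function MakeElement(sName, aKids) {
  return {
    nodeName: sName,
    childNodes: MakeList(aKids),
    firstChild: aKids.length > 0 ? aKids[0] : null
  };
}

// collect the text of the <title> child of every <item> directly under oChan
function GetItemTitles(oChan) {
  var aTitles = [];
  for (var i = 0; i < oChan.childNodes.length; i++) {
    var oItem = oChan.childNodes.item(i);
    if (oItem.nodeName != "item") {
      continue; // skips the channel's own <title>, whitespace text, etc.
    }
    for (var j = 0; j < oItem.childNodes.length; j++) {
      var oKid = oItem.childNodes.item(j);
      if (oKid.nodeName == "title" && oKid.firstChild != null) {
        aTitles.push(oKid.firstChild.data);
      }
    }
  }
  return aTitles;
}

// a made-up channel with two stories
var oChannel = MakeElement("channel", [
  MakeElement("title", [MakeText("Example Feed")]),
  MakeElement("item", [MakeElement("title", [MakeText("First story")])]),
  MakeElement("item", [MakeElement("title", [MakeText("Second story")])])
]);
var aHeadlines = GetItemTitles(oChannel);
```

In the WSH script, a loop over the returned array calling Log on each entry would print one headline per line.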

To learn more about the WSH environment, see http://msdn.microsoft.com/library/en-us/script56/html/wsoriWindowsScriptHost.asp. To learn the particulars of Microsoft's implementation of the DOM, download the MSXML SDK.

Using Mozilla
The Mozilla Browser supports numerous ways to parse an XML document and expose it via the DOM. The following HTML document demonstrates three of these mechanisms:


<html>
<head>
<title>XML Test</title>
<style>
  body {font-size: 9pt; font-family: verdana;}
</style>
<script>
var doc = null;

// xml http error callback
function handle_error() {
  alert("error");
}

// return the uri in the textbox for parsing
function GetURI() {
  var uri = "";
  try {
    uri = document.getElementById("txt1").value;
  }
  catch(e) {
    Log("Couldn't get URI to load");
  }
  return uri;
}

// return the content contained in the textarea for parsing
function GetAreaContent() {
  return document.getElementById("area1").value;
}

function test_load() {
  doc = document.implementation.createDocument("","",null);
  doc.async = false;
  try {
    if (doc.load(GetURI())) {  
      UseDOM(doc);
    }
    else {
      Log("Unable to load " + GetURI());
    }
  }
  catch(e) {
    Log("Unable to load " + GetURI());
  }
}

function test_xmlhttp() {
  var oReq = new XMLHttpRequest();
  oReq.onerror = handle_error;
  try {
    oReq.open("GET", GetURI(), false, "", "");
    oReq.send("");
    doc = oReq.responseXML;
    UseDOM(doc);
  }
  catch(e) {
    Log("Unable to load " + GetURI());
  }
}

function test_domparser() {
  var parser = new DOMParser();
  doc = parser.parseFromString(GetAreaContent(), "text/xml");
  UseDOM(doc);
}

var xmp = null;
function init() {
  xmp = document.getElementById("xmp1");
}

// customize this function to play with the DOM
function UseDOM(dom) {
  if (dom != null) {
    Log(dom.documentElement.nodeName);
  }
  else {
    Log("Don't have a DOM");
  }
}

function Log(s) {
  if (xmp) {
    xmp.innerHTML += "<br />" + s;
  }
}
</script>
</head>
<body onload="init()">
URI: <input id="txt1" type="text" value="file://c:/junk/fruit.xml"/>
<br />
<button onclick="test_load()">createDocument</button>
<button onclick="test_xmlhttp()">XMLHTTP</button>
<br/>
<textarea id="area1" rows="20" cols="50">
<items><item>banana</item></items>
</textarea>
<br/>
<button onclick="test_domparser()">DOMParser</button>

<fieldset>
<legend>Log</legend>
<div id="xmp1"></div>
</fieldset>
</body>
</html>

The test_load function creates a new DOM and then uses the proprietary load method to fetch an existing XML document from a URI. You specify the URI in the textbox (txt1). The test_xmlhttp function uses the XMLHttpRequest object to make an HTTP request for the same URI used by the test_load function described above. Note that, due to cross-domain security restrictions, the external XML document must reside in the same domain as the HTML page. If you load the HTML page from your local hard disk, the XML documents you can load are limited to URLs accessed via the "file" protocol.

The test_domparser function uses the DOMParser object to load an XML document from a string. The string is extracted from the textarea (area1).
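
One thing to watch for when parsing from a string: Mozilla's parser is understood to report malformed XML not by throwing an exception but by returning a document whose root element is named parsererror. A minimal sketch of a guard, assuming that behavior; IsParseError is a hypothetical helper name, and the two plain objects below only stand in for parsed documents so the sketch runs outside a browser.

```javascript
// returns true when there is no document, or when the document is
// the error report Mozilla is assumed to return for malformed XML
function IsParseError(dom) {
  if (dom == null || dom.documentElement == null) {
    return true;
  }
  return dom.documentElement.nodeName == "parsererror";
}

// stand-ins for the two outcomes of parseFromString
var oGood = { documentElement: { nodeName: "items" } };
var oBad = { documentElement: { nodeName: "parsererror" } };

var bGoodIsError = IsParseError(oGood); // false
var bBadIsError = IsParseError(oBad);   // true
```

Calling such a check at the top of UseDOM keeps the rest of the function from walking an error report as if it were your data.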

You can learn more about the DOMParser and createDocument interfaces at the following URL:
http://www.xulplanet.com/tutorials/mozsdk/xmlparse.php
You can learn more about the XMLHttpRequest interface at
http://www.xulplanet.com/references/elemref/ref_XMLHttpRequest.html.

Using a VoiceXML interpreter that supports <data>
If you have access to a Web server with a public interface, you can use Tellme and BeVocal's implementation of the DOM by creating an XML document such as the following:

                                          
<?xml version="1.0"?>
<?access-control allow="*"?>
<list>
  <item>apples</item>
  <item>oranges</item>
  <item>bananas</item>
</list>

Publish it to your Web server as "fruit.xml". Next, author a VoiceXML document including a <data> tag that references the XML document:


<vxml version="2.0"
  xmlns="http://www.w3.org/2001/vxml">
  <catch event="">
    <log>catch-all caught 
     <value expr="_event"/>
    </log>
  </catch>

  <form>
    <block>
      <data name="dom1" src="fruit.xml"/>
      <!-- now that the DOM is loaded, exercise it -->
      <prompt>
        <value expr="dom1.documentElement.nodeName"/>
      </prompt>
    </block>
  </form>
</vxml>

Publish this document to your Web server in the same directory as the XML data document, or adjust the value of the <data> tag's src attribute accordingly. Configure your Tellme Studio or BeVocal Cafe account to point to the URL corresponding to the VoiceXML document, and call the access number provided by Tellme or BeVocal to run your application.
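
Once the document's root node name plays back correctly, a natural next step is to speak the items themselves. The following is a minimal sketch, assuming the fruit.xml layout shown earlier; GetItemTexts and ToSpokenList are hypothetical helper names, and the Make* functions are stand-ins that build the tree from plain objects (including the whitespace text nodes a real parser would typically report between elements) so the sketch runs without an interpreter.

```javascript
// Stand-in node builders (NOT part of any DOM API)
function MakeList(a) {
  return { length: a.length, item: function (i) { return a[i]; } };
}
function MakeText(s) {
  return { nodeName: "#text", data: s, childNodes: MakeList([]), firstChild: null };
}
function MakeElement(sName, aKids) {
  return {
    nodeName: sName,
    childNodes: MakeList(aKids),
    firstChild: aKids.length > 0 ? aKids[0] : null
  };
}

// gather the text of each <item> directly under oList
function GetItemTexts(oList) {
  var aTexts = [];
  for (var i = 0; i < oList.childNodes.length; i++) {
    var oChild = oList.childNodes.item(i);
    if (oChild.nodeName == "item" && oChild.firstChild != null) {
      aTexts.push(oChild.firstChild.data);
    }
  }
  return aTexts;
}

// join words the way a person would read a list aloud
function ToSpokenList(aWords) {
  if (aWords.length == 0) return "";
  if (aWords.length == 1) return aWords[0];
  if (aWords.length == 2) return aWords[0] + " and " + aWords[1];
  return aWords.slice(0, -1).join(", ") + ", and " + aWords[aWords.length - 1];
}

// the fruit.xml content, rebuilt by hand
var oList = MakeElement("list", [
  MakeText("\n  "), MakeElement("item", [MakeText("apples")]),
  MakeText("\n  "), MakeElement("item", [MakeText("oranges")]),
  MakeText("\n  "), MakeElement("item", [MakeText("bananas")]),
  MakeText("\n")
]);
var sSpoken = ToSpokenList(GetItemTexts(oList)); // "apples, oranges, and bananas"
```

If the two functions live in a script that the VoiceXML document loads, the prompt could use an expression such as ToSpokenList(GetItemTexts(dom1.documentElement)) in place of the node name.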

Microsoft, Mozilla.org, Tellme Networks, and BeVocal provide four of the numerous implementations of the DOM. The fidelity of each implementation with the official W3C specification varies.

One of the goals of VoiceXML 2.1 is to standardize the DOM implementation supported by all voice browsers so that you can easily port your voice applications from one VoiceXML platform to another.

Q. RSS (http://blogs.law.harvard.edu/tech/rss) is all the rage, and I'd like to expose RSS feeds via a voice application. RSS is expressed in XML, so use of the <data> tag to retrieve an RSS feed seems like a natural fit, but when I attempt to use the data tag to fetch an RSS feed directly, the interpreter throws an "error.noauthorization" event to my application.

A. According to section 5 of the VoiceXML 2.1 Working Draft, an XML document retrieved by the interpreter via the <data> tag must contain an "access-control" processing instruction (PI) indicating the hosts and/or domains that are allowed to access the data. This mechanism is in place to protect data providers from having their data exposed by an interpreter they trust to an untrusted application. I'll go into that in more detail in another column.

In Appendix E of VoiceXML 2.1, the "access-control" PI is described in detail. The last example demonstrates how to indicate to an interpreter that any application retrieved from any host or domain should be allowed to access the data:

<?access-control allow="*"?>

But how do you get that PI into an RSS feed that you don't publish? That's going to require a little server-side magic to proxy requests from your voice application to the desired RSS feed. Fortunately, you only have to write the proxy once, and you'll be able to use it again for any public data feed - not just RSS. Here's a sample CGI implementation in Perl that uses the LWP::UserAgent module:


#!/usr/local/bin/perl -w
use strict;
use LWP::UserAgent;
use CGI qw(param);

sub write_access;
sub Log;

#http://news.com.com/2547-1_3-0-5.xml
my $url = param("url");

# don't compromise your file system
# up to you to support other protocols (e.g. HTTPS)
if (!defined($url) || $url !~ /^http:\/\//) {
  print "Status: 400 Baaad Request\n\n";
  exit;
}

# enable autoflush
my $old = select STDOUT; $| = 1; select $old;

my $ua = new LWP::UserAgent;
$ua->agent('My-RSS-Proxy/0.1');
$ua->timeout(10);
#BUGBUG: If you use a proxy to access the Internet, set this
#$ua->proxy("http", ""); 

my $req = new HTTP::Request("GET", $url);
my $resp = $ua->request($req);

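# forward the upstream status and HTTP response headers;
# drop Content-Length because the appended PI changes the body size
print "Status: " . $resp->status_line . "\n";
$resp->remove_header("Content-Length");
my $headers = $resp->headers_as_string;
my $crlfs = "";
if ($headers !~ /\n{2}$/) {
  Log("Adding CRLFs to headers");
  if ($headers eq "" || $headers =~ /\n$/) {
    # only the blank line that ends the header block is missing
    $crlfs = "\n";
  }
  else {
    $crlfs = "\n\n";
  }
}
print "$headers$crlfs";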

if ($resp->is_success) {
  print $resp->content;
  write_access;
}
else {
  Log("Badness: " . $resp->status_line);
}

sub write_access {
  print qq{<?access-control allow="*"?>\n};
}

sub Log
{
  my($s) = @_;
  print STDERR "$s\n";
}

If you're not familiar with Perl, here's the basic idea: the CGI takes a single request parameter, "url", which is the URL to the RSS feed or any other publicly available data you want to retrieve. The script performs some basic sanity checking on the value of this parameter for the safety and security of the server that's hosting the CGI, and then uses the LWP::UserAgent module to perform a simple HTTP GET request for that URL. If the request is successful, the script prints the HTTP response headers and the content followed by the "access-control" PI. Otherwise, the script just prints the headers, which will include the HTTP status code indicating why the request wasn't successful. I leave it as an exercise to the reader to write the equivalent code in your server-side language of choice.
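
For readers porting the proxy, the decision logic described above can be separated from the network fetch. Here is a minimal sketch in JavaScript, not a full server: IsAllowedFeedUrl and BuildProxyReply are hypothetical names, and oUpstream is an assumed plain object holding the status code and body that whatever HTTP client you use returned.

```javascript
var ACCESS_PI = '<?access-control allow="*"?>\n';

// accept only absolute http:// URLs, mirroring the check in the Perl script
function IsAllowedFeedUrl(sUrl) {
  return typeof sUrl == "string" && /^http:\/\//.test(sUrl);
}

// decide what the proxy should send back;
// oUpstream is a hypothetical { status: number, body: string } fetch result
function BuildProxyReply(sUrl, oUpstream) {
  if (!IsAllowedFeedUrl(sUrl)) {
    return { status: 400, body: "" }; // never touch the file system
  }
  if (oUpstream.status >= 200 && oUpstream.status < 300) {
    // success: original content followed by the access-control PI
    return { status: oUpstream.status, body: oUpstream.body + ACCESS_PI };
  }
  // failure: pass the upstream status through with no body
  return { status: oUpstream.status, body: "" };
}

var oOk = BuildProxyReply("http://example.com/rss.xml",
                          { status: 200, body: '<rss version="2.0"/>\n' });
var oBlocked = BuildProxyReply("file:///etc/passwd", { status: 200, body: "x" });
var oMissing = BuildProxyReply("http://example.com/gone.xml",
                               { status: 404, body: "Not Found" });
```

Keeping the check and the reply assembly in pure functions makes them easy to test before wiring them to a real request handler.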

Why put the PI at the end of the XML document?

Processing instructions are discussed in 2.6 of the XML specification (http://www.w3.org/TR/2004/REC-xml-20040204/#sec-pi), and there's nothing in the spec that forbids one from putting the PI at the end of the XML document.
Furthermore, there's nothing in the VoiceXML 2.1 spec that forbids that either. We can't put it at the beginning of the document because 2.8 of the XML specification is explicit about where the XML declaration must occur if it is present.

Violation of this rule will cause most XML parsers to throw an exception or return an error, which translates into an error.badfetch thrown by a VoiceXML interpreter. Since we don't control the RSS feeds, or whether they actually include an XML declaration, it's simply safest to stick the PI at the end of the document.
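
The placement argument can be checked with plain string handling. The following is a minimal sketch, with AppendAccessControl and StartsWithXmlDecl as hypothetical helper names: appending leaves any XML declaration at the very start of the text, while prepending pushes it out of first position.

```javascript
var ACCESS_PI = '<?access-control allow="*"?>';

// trim trailing whitespace, then put the PI after the document element
function AppendAccessControl(sXml) {
  var sBody = sXml.replace(/\s+$/, "");
  return sBody + "\n" + ACCESS_PI + "\n";
}

// true when the text begins with an XML declaration
function StartsWithXmlDecl(sXml) {
  return /^<\?xml\s/.test(sXml);
}

var sFeed = '<?xml version="1.0"?>\n<rss version="2.0"><channel/></rss>\n';

var sAppended = AppendAccessControl(sFeed);
var sPrepended = ACCESS_PI + "\n" + sFeed;

var bAppendedOk = StartsWithXmlDecl(sAppended);   // true: declaration still first
var bPrependedOk = StartsWithXmlDecl(sPrepended); // false: declaration displaced
```

A feed with no XML declaration would tolerate either position, but since the proxy cannot know in advance, appending is the one choice that works for both cases.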

Here's what a request through our proxy from a VoiceXML application might look like. It fetches the Apple iTunes Music Store's RSS feed for the five newest releases.


<vxml version="2.1"
  xmlns="http://www.w3.org/2001/vxml">
  <var name="feed_proxy" expr="'data_proxy.cgi'"/>
  <script src="rsshelpers.js"/>
  <form>
    <!-- catch is declared at form level; it is not valid inside block -->
    <catch event="error">
      <log>feed fetch or access caused <value expr="_event"/></log>
      Sorry. The requested information is unavailable. Please try again later.
    </catch>
    <block>
      <var name="feed"
        expr="'http://ax.phobos.apple.com.edgesuite.net/WebObjects/MZStore.woa/wpa/MRSS/newreleases/limit=5/rss.xml'"/>
      <data name="oFeed" expr="feed_proxy + '?url=' + feed"/>
      <prompt><value expr="GetChannelTitle(oFeed)"/></prompt>
      <exit/>
    </block>
  </form>
</vxml>
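
Note that the data tag's expr pastes the feed URL into the query string unescaped. That works for the sample feed, but a feed URL carrying its own "?" or "&" would be split up when the CGI parses its parameters. A minimal sketch of a safer way to build the request follows, assuming the interpreter's ECMAScript environment provides encodeURIComponent (it is part of ECMA-262, 3rd edition); BuildProxyUrl is a hypothetical helper name and the example.com feed is a placeholder.

```javascript
// escape the feed URL so it survives as a single "url" parameter
function BuildProxyUrl(sProxy, sFeed) {
  return sProxy + "?url=" + encodeURIComponent(sFeed);
}

var sRequest = BuildProxyUrl(
  "data_proxy.cgi",
  "http://example.com/feeds/rss.xml?limit=5&lang=en"
);
// sRequest is now
// "data_proxy.cgi?url=http%3A%2F%2Fexample.com%2Ffeeds%2Frss.xml%3Flimit%3D5%26lang%3Den"
```

If the function is added to rsshelpers.js, the expr could read BuildProxyUrl(feed_proxy, feed). On the server side, CGI's param function decodes the percent escapes before the script sees the value.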

Here's the content of rsshelpers.js:


function GetChannelTitle(dom) {
  // retrieve "/rss/channel/title" (see XPath spec for this notation)
  var sTitle = "";
  var oRoot = dom.documentElement;
  if (oRoot.nodeName == "rss") {
    var oChan = GetFirstChildNamed(oRoot, "channel");
    if (oChan != null) {
      var oTitle = GetFirstChildNamed(oChan, "title");
      // an empty <title/> has no text node, so check firstChild too
      if (oTitle != null && oTitle.firstChild != null) {
        sTitle = oTitle.firstChild.data;
      }
    }
  }
  return sTitle;
}

// do a shallow traversal of oParent to find the first node named sName
function GetFirstChildNamed(oParent, sName) {
  var oNode = null;
  if (oParent != null) {
    for (var i = 0; i < oParent.childNodes.length; i++) {
      var oChild = oParent.childNodes.item(i);
      if (oChild.nodeName == sName) {
        oNode = oChild;
        break;
      }
    }
  }
  return oNode;
}

Although the DOM Document and Element objects expose a getElementsByTagName method, the GetFirstChildNamed function is more efficient since it only does a shallow traversal of the nodes in the DOM tree. It's sufficient given the structure of the XML document and the elements we're trying to retrieve. The GetChannelTitle function leverages GetFirstChildNamed to dig the RSS channel title out of the DOM corresponding to the RSS feed.
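
The difference between the two traversals is easy to see on a small tree. In the sketch below, CollectDescendantsNamed is a hypothetical stand-in for the deep, document-order search that getElementsByTagName performs, and the Make* helpers build a made-up channel from plain objects so the code runs without a parser. The deep search finds every title in the subtree, including those belonging to items, while the shallow search stops at the channel's own title.

```javascript
// Stand-in node builders (NOT part of any DOM API)
function MakeList(a) {
  return { length: a.length, item: function (i) { return a[i]; } };
}
function MakeText(s) {
  return { nodeName: "#text", data: s, childNodes: MakeList([]), firstChild: null };
}
function MakeElement(sName, aKids) {
  return {
    nodeName: sName,
    childNodes: MakeList(aKids),
    firstChild: aKids.length > 0 ? aKids[0] : null
  };
}

// deep: visit every descendant of oParent, collecting nodes named sName
function CollectDescendantsNamed(oParent, sName, aOut) {
  for (var i = 0; i < oParent.childNodes.length; i++) {
    var oChild = oParent.childNodes.item(i);
    if (oChild.nodeName == sName) {
      aOut.push(oChild);
    }
    CollectDescendantsNamed(oChild, sName, aOut);
  }
  return aOut;
}

// shallow: direct children only, as in rsshelpers.js
function GetFirstChildNamed(oParent, sName) {
  for (var i = 0; i < oParent.childNodes.length; i++) {
    var oChild = oParent.childNodes.item(i);
    if (oChild.nodeName == sName) {
      return oChild;
    }
  }
  return null;
}

var oChannel = MakeElement("channel", [
  MakeElement("title", [MakeText("Feed title")]),
  MakeElement("item", [MakeElement("title", [MakeText("Story one")])]),
  MakeElement("item", [MakeElement("title", [MakeText("Story two")])])
]);

var aDeep = CollectDescendantsNamed(oChannel, "title", []); // three title nodes
var oShallow = GetFirstChildNamed(oChannel, "title");       // the channel's title
```

Besides touching fewer nodes, the shallow version removes any doubt about which title was matched.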
