VoiceXML Review - Feature Articles

Volume 3, Issue 2 - March/April 2003

OpenVXI: Fostering VoiceXML via Open Source

By Brian Eberman

Introduction

SpeechWorks has provided an open source VoiceXML interpreter since April 2001 to lower the barriers for developers considering VoiceXML solutions. SpeechWorks partnered with Carnegie Mellon University (CMU) to make the OpenVXI software available from the CMU site (http://fife.speech.cs.cmu.edu/openvxi/index.html) and to host and archive a mailing list. Since its inception, OpenVXI has seen almost 1,500 individual downloads. The mailing list has also seen significant activity, with the open source community helping to resolve technical issues and offering comments and suggestions for enhancements.

OpenVXI was designed from the outset to be vendor agnostic and technology independent. This required a design with clear functional boundaries, so that each component could be replaced or implemented against different ASR or TTS technologies, or even extended to support a broader range of speech applications such as multi-modal implementations. Results to date include the adoption of OpenVXI as the VoiceXML interpreter within several major IVR platforms, multi-modal platforms, and research systems. OpenVXI has also been integrated with a wide range of ASR, telephony, and management systems.

1. DESIGN

Figure 1: OpenVXI Abstract Distributed System Model

Figure 1 shows an abstract distributed reference model for a VoiceXML gateway and component servers. SpeechWorks used this reference model in designing OpenVXI. There are several important points to this architecture.

First, OpenVXI is only one component of an overall platform. OpenVXI is designed exclusively to provide VoiceXML interpretation. Integrators then combine OpenVXI with other components to build a VoiceXML gateway. In this abstract model, the other components of the platform architecture are:

A telephony services layer which terminates the call, including signaling, from a switch or the PSTN;
A call control agent that manages the call, via the mediating telephony services layer;
The ASR and TTS resources that may or may not be distributed into different processes or network servers;
The platform integration that mediates between the OpenVXI and the ASR and TTS resources;
The application server that consists of one or more web servers or application servers, and contains the application logic, backend connectivity, grammars, and prompts.

Component separation means that call control operates independently of the OpenVXI interpreter. When a call is received, the telephony services component passes it to the call control layer, which then determines the treatment of the call. The call control agent can choose to handle the call with VoiceXML, in which case it brings OpenVXI into the call. The call control agent can also force the termination of a call by communicating directly with the platform integration or telephony services component. Lastly, the call control agent can invoke the OpenVXI interpreter's main execution function as a subroutine call. A number of implementers have chosen to invoke the OpenVXI interpreter one VoiceXML page at a time. This technique gives tighter call control, because execution can escape back to a call handling agent or interact with a previously defined IVR development environment.
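The page-by-page invocation described above can be sketched as follows. This is a minimal illustration in Python, not OpenVXI code: the names `PageResult`, `CallControlAgent`, `handle_call`, and `fake_interpreter`, the page URLs, and the `max_pages` limit are all hypothetical, and the interpreter is reduced to a function that takes a page URL and reports which page the dialog wants next.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class PageResult:
    """Outcome of interpreting one VoiceXML page (hypothetical type)."""
    next_url: Optional[str]  # None means the dialog has finished


@dataclass
class CallControlAgent:
    """Hypothetical call control agent that owns the call's treatment."""
    # Stand-in for the interpreter's main execution function.
    interpret_page: Callable[[str], PageResult]
    max_pages: int = 10
    log: List[str] = field(default_factory=list)

    def handle_call(self, start_url: Optional[str]) -> str:
        """Decide how to treat a call handed up by telephony services."""
        if start_url is None:
            # The agent chose not to bring the interpreter into the call.
            self.log.append("non-voicexml treatment")
            return "handled-without-voicexml"

        url = start_url
        pages = 0
        while url is not None:
            if pages >= self.max_pages:
                # Control returns to the agent after every page, so it can
                # end the call without the interpreter's cooperation.
                self.log.append("forced termination")
                return "terminated-by-call-control"
            self.log.append(f"interpret {url}")
            url = self.interpret_page(url).next_url
            pages += 1
        return "completed"


def fake_interpreter(url: str) -> PageResult:
    """Toy interpreter: a two-page dialog with made-up page names."""
    site = {"welcome.vxml": "menu.vxml", "menu.vxml": None}
    return PageResult(next_url=site[url])


agent = CallControlAgent(fake_interpreter)
print(agent.handle_call("welcome.vxml"))
print(agent.log)
```

Because the loop lives in the call control agent rather than in the interpreter, the agent is the natural place to apply policy (page limits here, but equally a hand-off to another call handling environment) between pages.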

The reference architecture shows that ASR and TTS technologies can receive their audio directly from the underlying telephony services, without having it pass through the platform integration code. This is a platform implementation decision left entirely to the developer who is incorporating OpenVXI. OpenVXI defines a set of abstract platform interfaces (VXIrec, VXIprompt, and VXItel), which the developer must implement for the particular speech and telephony technologies being incorporated into the platform. These are shown in detail in Figure 2.
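To illustrate the idea of abstract platform interfaces, here is a minimal sketch in Python. It is not the OpenVXI API: the class names `Recognizer`, `Prompter`, and `Telephony` merely play the roles of VXIrec, VXIprompt, and VXItel, and every method name and signature below is an assumption made for the example. The point is only that the interpreter-side code (`run_field`) depends on the abstract interfaces, while the integrator supplies the concrete classes.

```python
from abc import ABC, abstractmethod
from typing import List


class Recognizer(ABC):
    """Plays the role of VXIrec; the method is hypothetical."""
    @abstractmethod
    def recognize(self, grammar_uri: str) -> str: ...


class Prompter(ABC):
    """Plays the role of VXIprompt; the method is hypothetical."""
    @abstractmethod
    def play(self, ssml: str) -> None: ...


class Telephony(ABC):
    """Plays the role of VXItel; the method is hypothetical."""
    @abstractmethod
    def disconnect(self) -> None: ...


def run_field(rec: Recognizer, prompt: Prompter, tel: Telephony,
              ssml: str, grammar_uri: str) -> str:
    """Interpreter-side logic written only against the abstract interfaces."""
    prompt.play(ssml)
    answer = rec.recognize(grammar_uri)
    tel.disconnect()
    return answer


# Integrator-side stubs; a real platform would wrap its ASR, TTS, and
# telephony technologies here instead.
class ScriptedRecognizer(Recognizer):
    def __init__(self, answer: str) -> None:
        self.answer = answer

    def recognize(self, grammar_uri: str) -> str:
        return self.answer


class RecordingPrompter(Prompter):
    def __init__(self) -> None:
        self.played: List[str] = []

    def play(self, ssml: str) -> None:
        self.played.append(ssml)


class RecordingTelephony(Telephony):
    def __init__(self) -> None:
        self.connected = True

    def disconnect(self) -> None:
        self.connected = False


prompter = RecordingPrompter()
line = RecordingTelephony()
result = run_field(ScriptedRecognizer("boston"), prompter, line,
                   "<speak>Which city?</speak>", "city.grxml")
print(result, prompter.played, line.connected)
```

Swapping `ScriptedRecognizer` for a class that wraps a different recognition engine would leave `run_field` unchanged, which is the kind of replaceability the abstract interfaces are meant to provide.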

In order to make these interfaces generic, OpenVXI assumes that the platform interfaces can be implemented to support the base W3C speech services specifications: namely, that an implementation of the VXIrec interface can support the W3C Speech Recognition Grammar Specification (SRGS), and that an implementation of the VXIprompt interface can support the W3C Speech Synthesis Markup Language (SSML) specification. OpenVXI therefore assumes that all recognition, grammar management, prompting (including the TTS markup), and parsing of this information can be delegated entirely to these interfaces. Consequently, an implementation of these interfaces must be able to support HTTP retrieval of information from the application server. This separation of services and the requirement to support HTTP are directly supported by implementations that make use of servers that support the Media Resource Control Protocol (MRCP, http://www.ietf.org).
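The delegation described above, where the interpreter hands a grammar reference to the recognition interface and the interface implementation performs the HTTP retrieval, might look like the following Python sketch. It is an illustration under stated assumptions, not OpenVXI code: `FetchingRecognizer`, `load_grammar`, `activate_grammar`, and the example URL are hypothetical, and the simple cache is an illustrative choice rather than documented behavior. The fetch function is injectable so the example runs without a network connection.

```python
import urllib.request
from typing import Callable, Dict, List


def http_fetch(uri: str) -> bytes:
    """Retrieve a resource over HTTP using the standard library."""
    with urllib.request.urlopen(uri, timeout=5) as response:
        return response.read()


class FetchingRecognizer:
    """Hypothetical recognition-interface implementation that owns retrieval."""

    def __init__(self, fetch: Callable[[str], bytes] = http_fetch) -> None:
        self._fetch = fetch
        self._grammars: Dict[str, bytes] = {}

    def load_grammar(self, uri: str) -> int:
        """Fetch the grammar if needed and return its size in bytes."""
        if uri not in self._grammars:
            # Caching is an illustrative choice, not documented behavior.
            self._grammars[uri] = self._fetch(uri)
        return len(self._grammars[uri])


def activate_grammar(recognizer: FetchingRecognizer, src: str) -> int:
    """Interpreter side: forward the reference unchanged.

    The interpreter neither opens a connection nor parses the grammar;
    both jobs belong to the interface implementation.
    """
    return recognizer.load_grammar(src)


# Offline demonstration with a fake fetcher and a made-up URL.
requested: List[str] = []


def fake_fetch(uri: str) -> bytes:
    requested.append(uri)
    return b"<grammar/>"


rec = FetchingRecognizer(fetch=fake_fetch)
size = activate_grammar(rec, "http://apps.example.com/grammars/city.grxml")
print(size, requested)
```

A prompt-side implementation could follow the same shape, receiving SSML or an audio reference from the interpreter and resolving any URLs itself.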

Continued...

Copyright © 2001-2003 VoiceXML Forum. All rights reserved.
The VoiceXML Forum is a program of the
IEEE Industry Standards and Technology Organization (IEEE-ISTO).