Volume 3, Issue 4 - July/August 2003
 
   

The results indicate that Java technology has made major advances since 1999.  The old Microsoft 1.1 JVM did not implement threading efficiently, so it scales poorly in this test as the number of concurrent channels increases.  The recent Sun and IBM JVMs, by contrast, run efficiently and scale very well thanks to advances in garbage collection, threading, and just-in-time compilation.  In fact, the IBM 1.3 JVM handles one dialog every three seconds on each of 200 channels, five or six times the speed required by the temperature conversion dialog.  If throughput scales roughly linearly, this suggests that the desktop could handle over 1,000 channels of this artificial test.  Java performance is therefore probably not far from C++ performance.  We can also conclude that VoiceXML interpretation by itself can be very efficient.
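The channel estimate is simple arithmetic, sketched below.  It assumes that capacity scales linearly with the measured speed margin, which the test itself does not prove; the class and method names are ours, chosen for illustration.

```java
class ChannelEstimate {
    // Channels sustainable at the required pace, assuming linear scaling:
    // measured channels multiplied by how many times faster than required they ran.
    static int estimateChannels(int measuredChannels, int speedMargin) {
        return measuredChannels * speedMargin;
    }

    public static void main(String[] args) {
        System.out.println(estimateChannels(200, 5)); // prints 1000
        System.out.println(estimateChannels(200, 6)); // prints 1200
    }
}
```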

Therefore, we endorse the use of Java for writing voice browsers.  If you already use a Java-based voice browser, do experiment with different JVMs to see which works best for your system.  Other data from our browser suggest that the Sun JVM is as fast as the IBM JVM.  Other tests indicate that the BEA JRockit JVM is significantly faster than the IBM JVM, but sadly has stability issues.  You might also experiment with the IBM Jikes Java compiler and see if it helps.

Methodology

Our team used an agile methodology, and when Kent Beck's Extreme Programming Explained came out in September 1999, we quickly agreed with most of his ideas.  The VoxGateway was a good candidate for an agile methodology: the area of voice browsing was new and innovative; we had a small, highly experienced team; and the requirements changed rapidly as VoiceXML 1.0 took form.  So it made good sense to be agile.

We built the system in small increments of functionality, always tried to have a working system, and tested every change before committing it to the project source control system.  We continually refactored our code.  Although we intellectually assented to pair programming, we never had the nerve to shift to it.  Refactoring was essential to deal with the huge changes between VoiceXML 0.9 and 1.0, while testing each change before inclusion into the project source tree gave us the courage to refactor.  I would recommend an agile approach to anyone developing their own VoiceXML system.

Architecture

The key architectural decision was to use the Factory pattern (see Design Patterns by Gamma, Helm, Johnson, and Vlissides, Addison-Wesley, 1995).  In this pattern a dimension of variability is identified and then captured in an abstract superclass.  The concrete subclasses of this class then represent variations on this dimension.  A Factory object is the only place where the subclasses are referenced: the rest of the system only sees the abstraction.  For example, in our system we need abstract "URLFetcher" objects to go off and get web pages.  Our system shouldn't care what particular URLFetcher is used.  So in our Factory object we have a method called newURLFetcher() which returns a URLFetcher whose actual subclass can be a plain vanilla JavaURLFetcher, a Win32URLFetcher that uses the Microsoft Wininet DLL used by Internet Explorer, or a JigsawURLFetcher that uses the W3C Jigsaw client to fetch web pages.
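A minimal sketch of this arrangement follows.  Only the names URLFetcher, JavaURLFetcher, Factory, and newURLFetcher() come from the description above; the fetch() signature, the CannedURLFetcher, and the two concrete factories are assumptions made for illustration and are not the VoxGateway's actual code.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.nio.charset.StandardCharsets;

class FactoryDemo {
    public static void main(String[] args) throws IOException {
        // Client code names only the abstractions Factory and URLFetcher.
        Factory factory = new OfflineFactory();
        URLFetcher fetcher = factory.newURLFetcher();
        byte[] page = fetcher.fetch("http://example.com/start.vxml");
        System.out.println(new String(page, StandardCharsets.UTF_8));
    }
}

// The abstraction: the only fetcher type the rest of the system sees.
abstract class URLFetcher {
    abstract byte[] fetch(String url) throws IOException; // assumed signature
}

// One variation: fetches with the standard java.net classes.
class JavaURLFetcher extends URLFetcher {
    @Override
    byte[] fetch(String url) throws IOException {
        try (InputStream in = URI.create(url).toURL().openStream()) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buffer = new byte[4096];
            int n;
            while ((n = in.read(buffer)) != -1) {
                out.write(buffer, 0, n);
            }
            return out.toByteArray();
        }
    }
}

// Hypothetical variation that returns a fixed page, so the demo needs no network.
class CannedURLFetcher extends URLFetcher {
    @Override
    byte[] fetch(String url) {
        return "<vxml version=\"1.0\"/>".getBytes(StandardCharsets.UTF_8);
    }
}

// The Factory is the only place where concrete subclasses are named.
abstract class Factory {
    abstract URLFetcher newURLFetcher();
}

class PlainJavaFactory extends Factory {
    @Override
    URLFetcher newURLFetcher() { return new JavaURLFetcher(); }
}

class OfflineFactory extends Factory {
    @Override
    URLFetcher newURLFetcher() { return new CannedURLFetcher(); }
}
```

Swapping OfflineFactory for PlainJavaFactory in main() changes how pages are fetched without touching any other client code.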

The Factory is itself an abstract superclass, so that different subclasses of Factory can define different configurations of the VoxGateway.  For instance, the FlexibleFactory is a very generic subclass that determines the configuration settings from a properties file.  One property is the name of the URLFetcher class to use.
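One way a properties-driven factory could work is to load the named class by reflection, as in the sketch below.  The property key "urlFetcher.class", the describe() method, and the constructor taking a Properties object are hypothetical; the real FlexibleFactory may read its settings quite differently.

```java
import java.util.Properties;

class FlexibleFactoryDemo {
    public static void main(String[] args) {
        Properties config = new Properties();
        // In practice this would be loaded from a properties file.
        config.setProperty("urlFetcher.class", JavaURLFetcher.class.getName());
        Factory factory = new FlexibleFactory(config);
        System.out.println(factory.newURLFetcher().describe());
    }
}

abstract class URLFetcher {
    abstract String describe(); // stand-in for the real fetching methods
}

class JavaURLFetcher extends URLFetcher {
    @Override
    String describe() { return "plain java.net fetcher"; }
}

abstract class Factory {
    abstract URLFetcher newURLFetcher();
}

class FlexibleFactory extends Factory {
    private final Properties config;

    FlexibleFactory(Properties config) {
        this.config = config;
    }

    @Override
    URLFetcher newURLFetcher() {
        // Hypothetical property key; falls back to the plain Java fetcher.
        String className = config.getProperty(
                "urlFetcher.class", JavaURLFetcher.class.getName());
        try {
            // Instantiate the configured class through its no-argument constructor.
            return (URLFetcher) Class.forName(className)
                    .getDeclaredConstructor()
                    .newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(
                    "Cannot create URLFetcher " + className, e);
        }
    }
}
```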

The Factory pattern is exploited repeatedly.  To protect our code from knowing which particular XML parser is being used, the Factory's newXMLParser() method returns an object of type XMLParser.  This allowed us to shift from the IBM XML Parser for Java to the Xerces XML Parser with very little effort.  Likewise, to protect our code from knowing which ECMAScript interpreter is being used (currently Rhino), the Factory's newECMAScript() method returns an ECMAScript object.

Another dimension of variability is the particular language a voice markup page is written in.  We handle this by defining a "markup language compiler" object (an MLCompiler), and then subclasses of this for each language we support (VoxMLCompiler for VoxML and VoiceXMLCompiler for VoiceXML 1.0 and 2.0).  The Factory's newMLCompiler() method looks at a byte array returned by the fetching subsystem and returns an MLCompiler based on its content.
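Content-based selection of this kind might look like the following sketch.  The detection rule is an assumption: it looks for the root element near the start of the page, taking <vxml> for VoiceXML and <DIALOG> for VoxML.  The language() method and the CompilerFactory class are hypothetical, and the real newMLCompiler() may inspect the page differently.

```java
import java.nio.charset.StandardCharsets;
import java.util.Locale;

class MLCompilerDemo {
    public static void main(String[] args) {
        CompilerFactory factory = new CompilerFactory();
        byte[] vxmlPage = "<?xml version=\"1.0\"?><vxml version=\"2.0\"></vxml>"
                .getBytes(StandardCharsets.UTF_8);
        byte[] voxmlPage = "<?xml version=\"1.0\"?><DIALOG></DIALOG>"
                .getBytes(StandardCharsets.UTF_8);
        System.out.println(factory.newMLCompiler(vxmlPage).language());
        System.out.println(factory.newMLCompiler(voxmlPage).language());
    }
}

abstract class MLCompiler {
    abstract String language(); // stand-in for the real compile step
}

class VoxMLCompiler extends MLCompiler {
    @Override
    String language() { return "VoxML"; }
}

class VoiceXMLCompiler extends MLCompiler {
    @Override
    String language() { return "VoiceXML"; }
}

class CompilerFactory {
    MLCompiler newMLCompiler(byte[] page) {
        // Only the head of the page is needed to find the root element.
        int length = Math.min(page.length, 1024);
        String head = new String(page, 0, length, StandardCharsets.ISO_8859_1)
                .toLowerCase(Locale.ROOT);
        if (head.contains("<vxml")) {
            return new VoiceXMLCompiler();
        }
        if (head.contains("<dialog")) { // assumed VoxML root element
            return new VoxMLCompiler();
        }
        throw new IllegalArgumentException("Unrecognized markup language");
    }
}
```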

Various operations, administration, and maintenance (OA&M) subsystems are plugged in the same way: logging, billing, user definition, etc.  The OutputFilter is an abstract superclass that defines a hierarchy of classes that filter TTS prompts based on the speech synthesis markup language supported by the text-to-speech system.

But the major decision made by the Factory is which speech and telephony resources to use.  The speech and telephony API is called the ExecutionContext, and it has very abstract methods like speak() for prompting, setGrammar() for conveying speech recognition grammars, record() for recording, transfer() for transferring calls, and listen() for performing speech recognition.  The Factory's newExecutionContext() method can return different subclasses of ExecutionContext, including one that implements a text-only interface for batch regression testing, a JSCExecutionContext for integrating with the Nuance Java Speech Channel, a MIXExecutionContext to fit it into the MIX Vlet environment, and so forth.  Licensees have implemented various other ExecutionContexts based on their needs.
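To illustrate why a text-only ExecutionContext is useful for regression testing, here is a rough sketch.  Only the method names speak(), setGrammar(), and listen() come from the description above; their signatures, the TextExecutionContext class, and the small temperature dialog are assumptions, and record() and transfer() are omitted for brevity.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

class ExecutionContextDemo {
    public static void main(String[] args) {
        // Scripted caller input replaces the speech recognizer.
        TextExecutionContext context = new TextExecutionContext("212");
        TemperatureDialog.run(context);
        for (String line : context.transcript) {
            System.out.println(line);
        }
    }
}

// Assumed signatures; the real API is richer.
abstract class ExecutionContext {
    abstract void speak(String prompt);
    abstract void setGrammar(String grammar);
    abstract String listen();
}

// Hypothetical text-only context: records prompts and replays scripted input.
class TextExecutionContext extends ExecutionContext {
    private final Deque<String> scriptedInput = new ArrayDeque<>();
    final List<String> transcript = new ArrayList<>();

    TextExecutionContext(String... inputs) {
        for (String input : inputs) {
            scriptedInput.add(input);
        }
    }

    @Override
    void speak(String prompt) { transcript.add("SPEAK: " + prompt); }

    @Override
    void setGrammar(String grammar) { transcript.add("GRAMMAR: " + grammar); }

    @Override
    String listen() {
        // An exhausted script is treated as silence (empty string).
        String heard = scriptedInput.isEmpty() ? "" : scriptedInput.poll();
        transcript.add("HEARD: " + heard);
        return heard;
    }
}

// Dialog logic written purely against the abstraction.
class TemperatureDialog {
    static void run(ExecutionContext context) {
        context.setGrammar("number");
        context.speak("What is the temperature in Fahrenheit?");
        int fahrenheit = Integer.parseInt(context.listen());
        int celsius = Math.round((fahrenheit - 32) * 5 / 9.0f);
        context.speak(fahrenheit + " degrees Fahrenheit is "
                + celsius + " degrees Celsius.");
    }
}
```

Because the dialog touches only ExecutionContext, the same logic could run against a telephony-backed subclass, while a batch test compares the recorded transcript with an expected one.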

This architecture has given rise to the terms core interpreter and framework.  The core interpreter (or just "core") consists of all the classes that are used in every configuration of the VoxGateway, whereas the framework consists of those classes that can be used or not used based on the configuration specified by the Factory in use.


Figure 2: The VoxGateway's dialog processing cycle.


 

Copyright © 2001-2003 VoiceXML Forum. All rights reserved.
The VoiceXML Forum is a program of the
IEEE Industry Standards and Technology Organization (IEEE-ISTO).