Volume 3, Issue 2 - March/April 2003
   
 

Elvira - a VoiceXML Platform for Research

By Pavel Cenek

Continued from page 1...

This feature is crucial for researchers. It lets them extend practically any aspect of VoiceXML by invoking a custom function, and perform virtually any operation with little effort. External functions integrate seamlessly into the system and can be easily reused. Moreover, external functions can also be called from VoiceXML tag.

External functions can perform any task beyond the scope of VoiceXML. They are often used for connecting to a database and retrieving data that are then accessible from ECMAScript within VoiceXML. However, the spectrum of their use is much broader. We will mention some possibilities in the following section.
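The database use case can be illustrated with a minimal sketch. It is written in Python purely for illustration; this section does not describe Elvira's actual external function interface, so the function name `lookup_departures`, the `departures` table, and the sample rows are all hypothetical. The point is only that the function returns plain values (a list of records) that a script inside the dialogue could then read.

```python
import sqlite3

def lookup_departures(conn, origin, destination):
    """Hypothetical external function: fetch departures as plain records."""
    rows = conn.execute(
        "SELECT departs, platform FROM departures "
        "WHERE origin = ? AND destination = ? ORDER BY departs",
        (origin, destination),
    ).fetchall()
    # Return simple dictionaries so the caller needs no database knowledge.
    return [{"departs": departs, "platform": platform} for departs, platform in rows]

# In-memory demo database with made-up data (assumption, not from the article).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE departures "
    "(origin TEXT, destination TEXT, departs TEXT, platform INTEGER)"
)
conn.executemany(
    "INSERT INTO departures VALUES (?, ?, ?, ?)",
    [
        ("Brno", "Praha", "08:15", 2),
        ("Brno", "Praha", "06:40", 1),
        ("Brno", "Wien", "07:05", 3),
    ],
)

result = lookup_departures(conn, "Brno", "Praha")
```

In a real deployment the platform would be responsible for exposing the returned records to ECMAScript; how that binding works is not covered here.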

Research Scenarios

This section describes some tasks handled by researchers in the field of dialogue systems and presents Elvira as a base tool for their solution.

Statistical Processing of Dialogues

Statistical methods play a key role in human language technologies. Huge collections of statistical data extracted from dialogues are analyzed and models describing various dialogue properties are deduced from them.

Elvira is well suited to collecting such statistical data. Logging can be done in three different ways:

  1. Using the VoiceXML <log> tag - this is the standard way, allowing one to log e.g. the dialogue flow
  2. Logging within an external function - can be used e.g. for logging database queries
  3. Logging within a component - this is typically used by input and output components for fine-grained logging of speech synthesis and recognition related events.
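As a rough illustration of how events from these three sources could be gathered for later statistical processing, the sketch below stores every event as one JSON line tagged with its source. The class name `DialogueLog`, the source labels, and the event names are assumptions made for this example; they are not Elvira's log format.

```python
import json
from collections import Counter

class DialogueLog:
    """Hypothetical collector that keeps one JSON line per logged event."""

    def __init__(self):
        self.lines = []

    def record(self, source, event, **details):
        # 'source' says which of the three logging paths produced the event.
        entry = {"source": source, "event": event, **details}
        self.lines.append(json.dumps(entry, sort_keys=True))

    def count_by(self, key):
        # Simple statistic: how many events share each value of 'key'.
        return Counter(json.loads(line).get(key) for line in self.lines)

log = DialogueLog()
log.record("vxml", "form_entered", form="booking")                    # dialogue flow
log.record("external_function", "db_query", table="departures", rows=2)
log.record("component", "asr_result", confidence=0.82)                # recognizer
log.record("component", "tts_started")                                # synthesizer

by_source = log.count_by("source")
```

Keeping all three sources in one uniform format makes it straightforward to compute counts or other statistics over whole dialogue collections.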

Research in the Field of Dialogue Strategies

A dialogue strategy determines the next step of the dialogue for every dialogue state. The dialogue strategy of VoiceXML is described by the form interpretation algorithm (FIA). However, the FIA is not always powerful enough or suitable for all dialogue models. Let us name at least two such situations:

  1. The FIA does not support repeating a prompt when the user asks "what did you say". (If an event is generated to catch the phrase, the prompt counters are increased and a different prompt may be played.)
  2. There is no way for the FIA to determine where the user interrupted a spoken prompt. This information is needed to implement intelligent tapered prompts.
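The first limitation can be made concrete with a small simulation of count-based prompt selection, where the prompt with the highest count not exceeding the current prompt counter is chosen. This is a simplified Python sketch, not an implementation of the FIA; the prompt texts and the function name `select_prompt` are invented for the example.

```python
def select_prompt(prompts, counter):
    """Pick the prompt with the highest count that is <= the prompt counter."""
    eligible = [count for count in prompts if count <= counter]
    if not eligible:
        return None
    return prompts[max(eligible)]

# Tapered prompts keyed by their count (made-up wording).
TAPERED = {
    1: "Where do you want to travel?",
    2: "Please say the name of your destination city.",
    3: "Say a city name, for example Brno or Praha.",
}

counter = 1
first = select_prompt(TAPERED, counter)

# The user asks "what did you say". Handling this through an event leads to
# the item being prompted again with an increased prompt counter.
counter += 1
repeated = select_prompt(TAPERED, counter)
```

Here `repeated` is the second prompt rather than a repetition of the first one, which is exactly the behaviour the user did not ask for.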

Our VoiceXML platform makes it possible to replace the FIA with an external decision mechanism. The idea is quite simple. Each VoiceXML form item has a cond attribute, which is an ECMAScript expression, so an external function can be called within the condition. The function can simply enable or disable the items as needed, and it can base its decision on information that is not accessible in VoiceXML. This is not a real replacement of the FIA; rather, it restricts the FIA to a single possibility.
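The mechanism can be sketched as follows. The simulation below reduces the FIA's selection step to its core rule (visit the first form item, in document order, whose variable is still unset and whose cond is true) and then shows how an external decision function used inside cond leaves the FIA only one eligible item. It is a simplified Python model; `external_decision`, the item names, and the chosen strategy are hypothetical.

```python
def select_next_item(items, variables, cond):
    """Simplified FIA select step: first unfilled item whose cond holds."""
    for name in items:
        if variables.get(name) is None and cond(name, variables):
            return name
    return None

def external_decision(variables):
    """Hypothetical external strategy: ask for the date before the cities."""
    for name in ("date", "destination", "origin"):
        if variables.get(name) is None:
            return name
    return None

def restricted_cond(name, variables):
    # Equivalent of a cond expression that calls the external function:
    # only the item chosen by the external mechanism is enabled.
    return external_decision(variables) == name

items = ["origin", "destination", "date"]  # document order of the form items
variables = {"origin": None, "destination": None, "date": None}

# With cond always true, the FIA follows document order.
default_choice = select_next_item(items, variables, lambda name, vs: True)

# With the external function in cond, exactly one item is eligible.
restricted_choice = select_next_item(items, variables, restricted_cond)
```

Because the external function may consult any data source, the order of visiting items can depend on information the VoiceXML document itself never sees.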

Wizard of Oz

Wizard of Oz (WOZ) is a technique used for dialogue design. It helps to find out how people are likely to interact with a system before the system is finished, or even before its design has begun. When using this technique, the user interacts with what appears to be a computer system but is in fact a simulation provided by either a human (called the wizard) or a combination of a human and a computer.

An environment for WOZ simulations was built upon the Elvira VoiceXML platform. It demonstrates the capabilities of the platform well.

As mentioned above, a WOZ simulation requires a wizard who is able to inspect every step of the dialogue and influence the subsequent dialogue flow if needed. A web-based user interface was created for the wizard in this case. The interface is depicted in the following figure.


Figure: User interface of the Wizard of Oz application. The wizard can see all values already specified by the user for the current frame and the values specified in the last dialogue step. The wizard can change the values as needed and classify the last speech act. The next dialogue step is then performed accordingly.

A technique similar to the external decision mechanism described above is used in this WOZ system. After each dialogue step, an external function logs all important information about the dialogue step into a database and waits until a response is stored in the database. A PHP script waits for the information stored by the external function and regenerates the web page for the wizard. When the wizard submits the form, the script stores the wizard's corrections in the database. The external function then returns information about the corrections to VoiceXML and the dialogue is modified accordingly.
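The exchange through the database can be sketched as a simple write-then-poll handshake. The Python example below is an illustration under stated assumptions: the `steps` table, its columns, and all function names are invented, and the wizard's PHP script is replaced by a stand-in function that writes the correction during the first polling pause.

```python
import json
import sqlite3
import time

SCHEMA = """
CREATE TABLE steps (
    id INTEGER PRIMARY KEY,
    recognized TEXT NOT NULL,  -- JSON: values collected in this dialogue step
    corrected TEXT             -- JSON written by the wizard, NULL until then
)
"""

def log_step(conn, recognized):
    """External function, part 1: store the dialogue step for the wizard."""
    cur = conn.execute(
        "INSERT INTO steps (recognized) VALUES (?)", (json.dumps(recognized),)
    )
    conn.commit()
    return cur.lastrowid

def wait_for_wizard(conn, step_id, timeout=30.0, interval=0.5, sleep=time.sleep):
    """External function, part 2: poll until the wizard's response appears."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        row = conn.execute(
            "SELECT corrected FROM steps WHERE id = ?", (step_id,)
        ).fetchone()
        if row and row[0] is not None:
            return json.loads(row[0])
        sleep(interval)
    raise TimeoutError(f"no wizard response for step {step_id}")

def wizard_submit(conn, step_id, corrected):
    """Stand-in for the wizard's web form handler (a PHP script in the article)."""
    conn.execute(
        "UPDATE steps SET corrected = ? WHERE id = ?",
        (json.dumps(corrected), step_id),
    )
    conn.commit()

conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)

step_id = log_step(conn, {"destination": "Brno"})
# For the demo, the "wizard" answers during the first polling pause.
corrections = wait_for_wizard(
    conn,
    step_id,
    sleep=lambda _seconds: wizard_submit(conn, step_id, {"destination": "Praha"}),
)
```

Polling keeps both sides decoupled: the dialogue side and the wizard's web interface share nothing but the database. The timeout is an addition of this sketch so that a dialogue cannot block forever if the wizard never responds.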

The data in the database are analyzed after the experiment and an improved version of the dialogue is created. This is where VoiceXML shows its strength: modifications of the dialogue can usually be made easily and quickly.

Continued...


Copyright © 2001-2003 VoiceXML Forum. All rights reserved.
The VoiceXML Forum is a program of the
IEEE Industry Standards and Technology Organization (IEEE-ISTO).