VoiceXML Review - Feature Articles

Volume 3, Issue 2 - March/April 2003

Elvira - a VoiceXML Platform for Research

By Pavel Cenek

Multimodal Interfaces

Although VoiceXML was not designed for multimodal interaction, Elvira can be used to test applications with multimodal interfaces, subject to some limitations.

VoiceXML has at least two conceptual limitations for building multimodal applications: it defines no means for synchronizing modalities, and it defines output in the form of prompts rather than in a semantic form. Both problems can be more or less overcome with Elvira. Currently, all sources of input have to be controlled by one input component, and that component is responsible for their synchronization. Because the output has no semantic representation, using devices such as graphical displays can be more complicated. However, external functions can be used to gain full control of such devices.
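
The idea of a single input component that synchronizes several modalities can be illustrated with a minimal sketch. It is written in Python for brevity and does not use Elvira's actual component interface; the names InputEvent and synchronize, the 1.5-second window, and the conflict rule are all assumptions made for illustration only.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class InputEvent:
    """One piece of user input (hypothetical structure, not Elvira's API)."""
    modality: str     # e.g. "voice" or "mouse"
    value: str        # the value the event proposes for the current field
    timestamp: float  # arrival time in seconds


def synchronize(events: List[InputEvent], window: float = 1.5) -> Optional[str]:
    """Merge events from all modalities into one result for the interpreter.

    Events that arrive within `window` seconds of the latest event are
    treated as one user turn. If they agree, their value is returned;
    if they conflict, None is returned so the dialogue can re-prompt.
    """
    if not events:
        return None
    ordered = sorted(events, key=lambda e: e.timestamp)
    latest = ordered[-1]
    # Keep only the events that belong to the same turn as the latest one.
    turn = [e for e in ordered if latest.timestamp - e.timestamp <= window]
    values = {e.value for e in turn}
    return latest.value if len(values) == 1 else None
```

In this sketch the interpreter sees only the single merged result, which mirrors the constraint that one input component controls every source of input.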

Elvira's suitability for multimodal interfaces can be illustrated by two projects. In both, Elvira serves as the dialogue manager responsible for controlling the interaction with the user and processing the collected information.

The first project, named Inspire (http://www.inspire-project.org/), is a "project contributing to the creation of smart home environments. Its main objective is the integration of a multilingual, interactive, natural, speech dialogue-based assistant for wireless command and control of home appliances."

The other one, called Multimodal Dialogue Management (http://issco-www.unige.ch/projects/im2/mdm/), "conducts research on multimodal human computer interaction, with a view towards the application of results to enhance the user-friendliness of the information society."

The projects' web pages provide more detailed information. We present only two screenshots from the projects here.

A sample from the Inspire project. The user can control appliances in a room through a voice dialogue.

A sample from the Multimodal Dialogue Management project. The user can get information about the people shown in the picture. The person the user wants to focus on can be specified by voice or by a mouse click in the marked area. Anaphoric phrases can be handled (e.g. "give me the request from him" refers to the person the user talked about earlier).
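
The anaphora handling described in this caption can be pictured as keeping track of the person currently in focus. The following Python sketch is only an illustration under stated assumptions: the FocusTracker class, its pronoun list, and the placeholder name "Person A" are hypothetical and are not taken from the project.

```python
from typing import Optional

# Pronouns treated as references to the focused person (illustrative list).
PRONOUNS = {"him", "her", "them"}


class FocusTracker:
    """Remembers the last selected person (hypothetical helper)."""

    def __init__(self) -> None:
        self.focus: Optional[str] = None

    def select(self, person: str) -> None:
        """Record a selection made by voice or by mouse click."""
        self.focus = person

    def resolve(self, utterance: str) -> Optional[str]:
        """Return the focused person if the utterance contains a pronoun."""
        words = (w.strip('.,!?"') for w in utterance.lower().split())
        if any(w in PRONOUNS for w in words):
            return self.focus
        return None
```

A selection from either modality updates the same focus, so a later phrase such as "give me the request from him" resolves to whoever was selected last, or to nothing if no one has been selected yet.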

Conclusion

Elvira currently supports most of the features required by the VoiceXML 2.0 specification and can already be used for many applications. Besides its use in research, Elvira can also serve as a core for speech interfaces of desktop applications or for building voice browsers. Interested readers can consult Elvira Usage Scenarios (http://gin2.itek.norut.no/elvira/_elvira.php?p=scenarios) for other ideas.

Work on Elvira continues, and we plan to fully support the VoiceXML 2.0 specification in the future. Work on other languages from the W3C Speech Interface Framework is in progress as well. New versions will also bring features that make the platform even more flexible.

The utility of the platform also depends on the availability of various components. The standard distribution of Elvira already offers input, output and grammar components that meet the needs of most users. The distribution also contains a simple test application, so Elvira can be used to interpret VoiceXML documents immediately, without any programming. The source code of some components is also available and can serve as a base for developing other components.

We expect that the repository of available components for Elvira will grow in the future and that a repository of third-party components will be created. We have already received the first contributions.

Acknowledgement

The author is grateful to David Portabella from the Artificial Intelligence Laboratory at the Swiss Federal Institute of Technology in Lausanne (http://liawww.epfl.ch/) and to Miroslav Melichar from the Laboratory of Speech and Dialogue (http://www.fi.muni.cz/lsd/) at the Faculty of Informatics, Masaryk University, Brno, Czech Republic, for information about the projects mentioned in this article and for the screenshots.

Copyright © 2001-2003 VoiceXML Forum. All rights reserved.
The VoiceXML Forum is a program of the
IEEE Industry Standards and Technology Organization (IEEE-ISTO).