Volume 7, Issue 1 - April/May 2007
 
   
 

Standards for Multimodal Applications: Recent Activities in the World Wide Web Consortium Multimodal Interaction Working Group

By Deborah Dahl, Conversational Technologies, W3C MMI Working Group Chair

Multimodal interaction allows users to interact with applications through a combination of voice, graphics, stylus and other inputs appropriate for the application, the user and the device. It is especially valuable on small devices such as cell phones, whose keypads are small and difficult to use. The World Wide Web Consortium Multimodal Interaction Working Group is developing standards that provide the basis for interoperable multimodal applications. There are two aspects of this task. First, the working group describes how current standards, such as VoiceXML for voice and XHTML for graphical interaction, can be integrated to create multimodal applications. Second, the group provides new standards that complement current standards as needed.

The Multimodal Interaction Working Group is part of the W3C Ubiquitous Web Domain, which focuses on technologies to enable Web access for anyone, anywhere, anytime, using any device. The Ubiquitous Web Domain also includes other groups whose activities support this goal, such as the Voice Browser Working Group, the Mobile Web Initiative and the Ubiquitous Web Working Group.

One of the MMI Working Group’s most important activities is the MMI Architecture. This is a flexible architecture that can integrate a wide range of modalities (the ways users can provide input), whether local or distributed. Modalities might include, for example, speech, GUI, and stylus, all coordinated by an interaction manager. The interaction manager communicates with the modalities entirely through a small set of life-cycle events (there are currently seventeen of these) defined in the MMI Architecture. For example, if voice interaction is part of a multimodal application, it could be implemented with an encapsulated VoiceXML interpreter that communicates with the interaction manager via the life-cycle events. An interaction manager might be implemented by means of tools such as scripting, server programs, or SCXML (State Chart XML, being developed by the Voice Browser Working Group). Figure 1 is an example of the MMI Architecture, showing the interaction manager and several modalities.


Figure 1: Example of the MMI Architecture
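
The coordination pattern described above can be illustrated with a minimal Python sketch. It is not taken from the MMI Architecture specification: the class names, the event names ("start", "done", "error") and the canned input are all hypothetical placeholders, chosen only to show an interaction manager that talks to modalities exclusively through event messages.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class LifeCycleEvent:
    """A message passed between the interaction manager and a modality.

    The event names used in this sketch are illustrative placeholders,
    not the names defined in the MMI Architecture.
    """
    name: str
    source: str
    target: str
    data: dict = field(default_factory=dict)


class Modality:
    """A modality component that is driven only by life-cycle events."""

    def __init__(self, name: str, canned_input: dict):
        self.name = name
        # Stand-in for real user input (speech, ink, GUI, ...).
        self._canned_input = canned_input

    def handle(self, event: LifeCycleEvent) -> LifeCycleEvent:
        if event.name == "start":
            # Pretend the user responded, then report completion.
            return LifeCycleEvent("done", self.name, event.source,
                                  dict(self._canned_input))
        return LifeCycleEvent("error", self.name, event.source,
                              {"reason": f"unsupported event: {event.name}"})


class InteractionManager:
    """Coordinates modalities without knowing how they work internally."""

    def __init__(self) -> None:
        self.modalities: Dict[str, Modality] = {}
        self.log: List[str] = []

    def register(self, modality: Modality) -> None:
        self.modalities[modality.name] = modality

    def run(self, modality_name: str) -> dict:
        request = LifeCycleEvent("start", "im", modality_name)
        self.log.append(f"{request.name}->{modality_name}")
        response = self.modalities[modality_name].handle(request)
        self.log.append(f"{response.name}<-{modality_name}")
        return response.data


if __name__ == "__main__":
    im = InteractionManager()
    im.register(Modality("voice", {"city": "Boston"}))
    im.register(Modality("ink", {"city": "Austin"}))
    print(im.run("voice"))
    print(im.log)
```

Because the interaction manager sees only events, the voice modality in this sketch could be replaced by an encapsulated VoiceXML interpreter, or moved to another machine, without changing the manager's logic.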

Another important activity of the MMI Working Group is a standard for representing user inputs, Extensible MultiModal Annotation (EMMA). In the MMI Architecture, user inputs represented in EMMA are sent from modalities to the interaction manager, which uses them to determine the next step in the interaction.
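
As a rough illustration of how an interaction manager might consume such a result, the following Python sketch parses a small EMMA-style document and picks the interpretation with the highest confidence. The sample document, the application-specific `destination` element and the function name `best_interpretation` are assumptions made for this example; the EMMA specification remains the authority on the element and attribute names.

```python
import xml.etree.ElementTree as ET

# Namespace used by EMMA documents; check the current specification.
EMMA_NS = "http://www.w3.org/2003/04/emma"

# Hypothetical recognizer output with two competing interpretations.
SAMPLE_EMMA = """\
<emma:emma version="1.0" xmlns:emma="http://www.w3.org/2003/04/emma">
  <emma:one-of id="r1">
    <emma:interpretation id="int1" emma:confidence="0.75" emma:mode="voice">
      <destination>Boston</destination>
    </emma:interpretation>
    <emma:interpretation id="int2" emma:confidence="0.20" emma:mode="voice">
      <destination>Austin</destination>
    </emma:interpretation>
  </emma:one-of>
</emma:emma>
"""


def best_interpretation(emma_xml: str) -> dict:
    """Return the id, confidence and slots of the top interpretation."""
    root = ET.fromstring(emma_xml)
    candidates = list(root.iter(f"{{{EMMA_NS}}}interpretation"))
    if not candidates:
        raise ValueError("no emma:interpretation elements found")

    def confidence(node: ET.Element) -> float:
        # Treat a missing confidence annotation as 0.0 in this sketch.
        return float(node.get(f"{{{EMMA_NS}}}confidence", "0"))

    best = max(candidates, key=confidence)
    return {
        "id": best.get("id"),
        "confidence": confidence(best),
        # Child elements carry the application-specific payload.
        "slots": {child.tag: (child.text or "").strip() for child in best},
    }


if __name__ == "__main__":
    print(best_interpretation(SAMPLE_EMMA))
```

An interaction manager could use the returned slots to decide the next step, for instance asking for confirmation when the confidence falls below a threshold it chooses.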

A third MMI specification is InkML, which describes an XML-based representation of ink or stylus input. For example, a handwriting or sketch recognition modality could make use of input represented in InkML.
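
To give a sense of what a recognition modality might do with such input, here is a small Python sketch that reads pen traces from an InkML-style document and computes their bounding box. It assumes the simplest trace form only, with points separated by commas and absolute X and Y values separated by spaces; the sample data and the helper names `read_traces` and `bounding_box` are invented for illustration.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

# Namespace used by InkML documents; check the current specification.
INKML_NS = "http://www.w3.org/2003/InkML"

# Hypothetical ink capture: two pen strokes with absolute X Y points.
SAMPLE_INK = """\
<ink xmlns="http://www.w3.org/2003/InkML">
  <trace>10 0, 9 14, 8 28, 7 42</trace>
  <trace>25 13, 37 19, 49 25</trace>
</ink>
"""

Point = Tuple[float, ...]


def read_traces(ink_xml: str) -> List[List[Point]]:
    """Parse each trace into a list of points (simple absolute form only)."""
    root = ET.fromstring(ink_xml)
    traces = []
    for trace in root.iter(f"{{{INKML_NS}}}trace"):
        points = []
        for chunk in (trace.text or "").split(","):
            values = chunk.split()
            if values:
                points.append(tuple(float(v) for v in values))
        traces.append(points)
    return traces


def bounding_box(traces: List[List[Point]]) -> Tuple[float, float, float, float]:
    """Return (min_x, min_y, max_x, max_y) over all points in all traces."""
    xs = [p[0] for t in traces for p in t]
    ys = [p[1] for t in traces for p in t]
    return (min(xs), min(ys), max(xs), max(ys))


if __name__ == "__main__":
    strokes = read_traces(SAMPLE_INK)
    print(len(strokes), "traces")
    print(bounding_box(strokes))
```

A handwriting recognizer would typically normalize the strokes using a bounding box like this before matching them against character or shape models.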

The group is currently working on examples of how MMI applications can be authored using the MMI Architecture and languages such as VoiceXML and SCXML. The goals of this exercise are to provide examples of standards-based multimodal applications and to identify architectural issues in the MMI Architecture. Current authoring topics include data synchronization, focus synchronization, accessing and manipulating EMMA results, canonicalizing data representations, handling media streams, and possible MMI extensions to SCXML. The group plans to publish these examples as a W3C Note.

The group has recently been rechartered, so this is an excellent time for new participants who want to contribute to this activity to join. The group also very much welcomes comments on any of the current specifications.

For more information:
Multimodal Interaction Working Group Home Page: http://www.w3.org/2002/mmi/
MMI Working Group Charter: http://www.w3.org/2004/03/mmi-charter.html
MMI Architecture: http://www.w3.org/TR/2006/WD-mmi-arch-20061211/
EMMA: http://www.w3.org/TR/emma/
InkML: http://www.w3.org/TR/2006/WD-InkML-20061023/




Copyright © 2001-2007 VoiceXML Forum. All rights reserved.
The VoiceXML Forum is a program of the
IEEE Industry Standards and Technology Organization (IEEE-ISTO).