Volume 5, Issue 2 - March/April 2005
 
   
 

Opera 8 Ships One Million Browsers with X+V Multimodal Technology

By Igor Jablokov

Opera Software ASA (http://www.opera.com) recently announced that version 8.0 of its browser was downloaded more than one million times within four days of release. The Norwegian software vendor has created a fast, standards-compliant Web experience. While that figure would be commendable for any product introduction, rivaling even Mozilla’s Firefox, it is also a milestone for the multimodal and voice standards community: Opera has included a feature that could usher in the age of human-computer interaction long predicted by science fiction writers.

The Windows version of the browser now has an option that enables voice interaction. This functionality is provided by the IBM® Multimodal Runtime Environment, which connects the Opera browser to IBM Embedded ViaVoice® (the same technology currently shipping in certain automotive navigation systems). Users can control the entire browser interface by voice (e.g., by saying “browser go home” or “browser fullscreen”), and they can also run applications written in the XHTML+Voice (X+V for short) markup language. X+V lets developers write and deploy multimodal Web applications, which allow users to interact through sight, sound and speech. The language was co-authored by IBM, Motorola and Opera and is under consideration by the W3C standards body.

While modern-day VoiceXML applications require specialized skills, X+V applications more closely resemble standard Web applications. This breaks the current speech development paradigm and allows the large body of Web developers to add voice interaction to existing Web applications. For instance, one of IBM’s customers augmented an existing enterprise-focused application and moved it into production within a month, without prior experience in X+V or in speech development generally.
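To give a sense of how closely an X+V page resembles an ordinary Web page, here is a minimal sketch. It assumes the namespace declarations and the XML Events attributes (ev:event, ev:handler) as recalled from the X+V 1.2 specification; the page title, the "sayHello" id and the prompt text are hypothetical, and the document type declaration is omitted. Treat it as an illustration to check against the specification rather than as tested markup.

```xml
<?xml version="1.0"?>
<!-- Illustrative sketch only: ids and text are hypothetical,
     and the DOCTYPE declaration is intentionally left out. -->
<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:vxml="http://www.w3.org/2001/vxml"
      xmlns:ev="http://www.w3.org/2001/xml-events">
  <head>
    <title>Hello, multimodal world</title>

    <!-- The voice part: a VoiceXML form placed in the XHTML head -->
    <vxml:form id="sayHello">
      <vxml:block>Hello, and welcome to this multimodal page.</vxml:block>
    </vxml:form>
  </head>
  <body>
    <h1>Hello, multimodal world</h1>

    <!-- The visual part: ordinary XHTML. The ev: attributes tie a
         visual event (a click) to the voice form declared above. -->
    <p ev:event="click" ev:handler="#sayHello">
      Click this paragraph to hear the greeting.
    </p>
  </body>
</html>
```

The visual content stays plain XHTML, and the voice behavior is added as a separate fragment that is attached to the page through an event. That separation is what would let a Web developer layer speech onto an existing page instead of rewriting it as a standalone voice application.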

Imagine the potential use cases for this type of interaction. In any environment where “hands-free” is not just a buzzword but a necessity, such as healthcare, warehousing or enterprise applications, the value of this system becomes obvious. Doctors can ask for patient status by name or be alerted to changes in medical conditions through the natural-sounding voice output (CTTS) that is included with the browser. In warehouses, companies can increase worker productivity by having the system communicate new orders to employees, leaving their hands free to fulfill them. Also consider insurance adjusters, who could speak into complex forms and record accident information while staying focused on investigating a scene.

But multimodal is not just for workplace activities. With the explosion of media content available to consumers, this interface is a natural fit for entertainment devices, such as set-top boxes and digital video recorders. Instead of hunting through several menus, users could find a song simply by speaking its title into their remotes, and then play it instantly. Or think of how often you use the browser on your mobile phone; would you use it more if you could simply speak into it, asking for the latest news or weather? Service providers could then use this same framework to generate additional revenue by reducing the distance between their customers and licensed content, such as ring tones, or by selling value-added services such as multimodal email and calendaring.

Opera 8 for Windows offers users a gateway into the multimodal experience. IBM looks forward to developers’ creativity in leveraging these standards-based technologies to augment existing Web applications for increased end user productivity.

More information on the X+V specification is available here: http://www.voicexml.org/specs/multimodal/x+v/12/

More information on IBM’s multimodal implementations and toolkits is available here: http://www.ibm.com/pvc/multimodal

Igor Jablokov is Program Director of IBM Software Group’s Multimodal & Voice Portal Initiatives and currently serves as a VoiceXML Forum Director.

IBM and ViaVoice are trademarks or registered trademarks of International Business Machines Corporation in the United States, other countries or both.

Windows is a trademark of Microsoft Corporation in the United States, other countries or both.

Other company, product and service names may be trademarks or service marks of others.

Actual results may vary from any performance data contained herein. Users of this document should verify the applicable data for their specific environment.




Copyright © 2001-2005 VoiceXML Forum. All rights reserved.
The VoiceXML Forum is a program of the IEEE Industry Standards and Technology Organization (IEEE-ISTO).