Volume 2, Issue 1 - January 2002
   
 

Some Thoughts on Speech Grammar

In this monthly column, an industry expert will answer common questions about VoiceXML and related technologies. Readers are encouraged to submit questions about VoiceXML, including development, voice-user interface design, and speech technology in general, or how VoiceXML is being used commercially in the marketplace. If you have a question about VoiceXML, e-mail it to and be sure to read future issues of VoiceXML Review for the answer.

Q: I'm writing a VoiceXML application that I want to be as portable as possible. I realize W3C speech grammar markup will be the ultimate choice eventually, but it seems that a lot of platforms don't support it yet, or if they do only partially. Any thoughts as to what a good interim strategy is wrt grammars? Are there other areas in VoiceXML additional to grammars where there are portability issues I should be aware of?

A: Great question. In working with VoiceXML (and any healthy standard), you will always face the choice of how you want to prioritize cross-platform compatibility vs. the newest/most advanced features. As any healthy standard evolves, two things happen.

First, leading vendors continually introduce new features in response to client demand. At first, these are of course "proprietary" extensions to the standard. Vendors who are committed to the standards process take these innovations and actively promote them within the public standards process, hoping to get them folded in over time. Along the way, other vendors may even adopt these extensions as a "de facto" standard before final "true" standardization is completed. Exact syntax and features may shift somewhat during the standardization process, of course, and the original vendors are then on the hook to update their platforms to comply with the final version.

Second, platform vendors can't all turn on a dime, and some will always lag somewhat in adopting complete implementations of the standard, especially as it grows and evolves over time (W3C speech grammar markup is a great example of this). Once again, leading vendors will stay fairly well in sync within a reasonable time frame; that's what makes them leading vendors.

Given this healthy and innovative (but imperfect) environment, you always have the choice of which features to take advantage of when building applications. What's the right strategy? The answer is, like it or not, "it depends". You need to take a look at the following things:

1) Examine all the platform vendors you're interested in and see which features (including grammar formats) they currently support. Talk to them to find out their track record and their ongoing philosophy on keeping in step with the standard as it evolves.

2) Think hard about where your priorities are -- do you really intend to deploy on multiple platforms? How many different ones do you *really* care about?

3) Specifically for grammar formats, do the vendors you care about support them well enough today to get *your* applications done? Some features may not be supported, but what matters is whether the features you really need are supported.

4) If there are features that you want to use that aren't quite perfectly cross-platform compatible today, what will it really cost you in development time to make the necessary changes should you choose to switch?
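One way to make points 1 through 3 concrete is to write down the features your application needs and compare them against what each candidate platform supports. The sketch below is only an illustration: the platform names, the feature labels, and both helper functions are hypothetical placeholders, not statements about any real vendor.

```python
# Hypothetical support table. Platform names and feature labels are
# placeholders chosen for illustration, not claims about real vendors.
PLATFORM_SUPPORT = {
    "platform_a": {"w3c_xml_grammar", "builtin_grammars", "dtmf_grammar"},
    "platform_b": {"builtin_grammars", "dtmf_grammar", "vendor_grammar_format"},
}

# The features *your* application actually needs (also hypothetical).
REQUIRED_FEATURES = {"w3c_xml_grammar", "dtmf_grammar"}


def missing_features(required, support):
    """Return, for each platform, the required features it does not support."""
    return {name: sorted(required - feats) for name, feats in support.items()}


def portable_platforms(required, support):
    """Return the platforms that support every required feature."""
    return sorted(name for name, feats in support.items() if required <= feats)


for platform, gaps in missing_features(REQUIRED_FEATURES, PLATFORM_SUPPORT).items():
    print(platform, "is missing:", ", ".join(gaps) if gaps else "nothing")
```

The list of gaps per platform is a rough starting point for estimating the cost described in point 4: each missing feature is something you would have to rework if you switched.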

Remember, millions of people made the decision to write slightly different versions of their Web sites for IE vs. Navigator to optimize performance on both. In my opinion, VoiceXML is already far superior to HTML in this regard (e.g., fewer inconsistencies across implementations), but you have to make your own decision specific to your business objectives and needs.

Q: Given the current state of speech recognition technology, when writing speech grammars for VoiceXML apps, is it best to write small, compact grammars with a very narrow set of possible utterances, or is it better to write larger, wide-open grammars?

A: It's most important to write your grammars to closely match what your callers are actually saying. Having too much coverage (too many phrases in the grammar, especially ones that are confusable with one another) is just as bad as having too little (many things missing that your callers regularly say). Optimizing this balance through a combination of great grammar design and great UI design that carefully guides callers to "say the right things" without frustrating them is the fine art of voice application design.
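As a rough illustration of checking that balance, the sketch below compares a grammar, simplified here to a flat list of phrases, against transcribed caller utterances. It reports how often callers stayed in grammar, which utterances are missing, and which grammar phrases nobody used. The function names and the sample phrases are assumptions made for the example. A real grammar is a set of rules rather than a phrase list, and this does not measure how confusable phrases are with one another.

```python
from collections import Counter


def normalize(text):
    """Lowercase and collapse whitespace so trivially different strings match."""
    return " ".join(text.lower().split())


def coverage_report(grammar_phrases, caller_utterances):
    """Compare a flat phrase list against what callers actually said."""
    grammar = {normalize(p) for p in grammar_phrases}
    counts = Counter(normalize(u) for u in caller_utterances)
    total = sum(counts.values())
    covered = sum(n for u, n in counts.items() if u in grammar)
    return {
        # Fraction of utterances the grammar would have accepted.
        "in_grammar_rate": covered / total if total else 0.0,
        # Things callers said that the grammar lacks (too little coverage).
        "missing": sorted(u for u in counts if u not in grammar),
        # Grammar phrases nobody said (candidates for pruning).
        "unused": sorted(grammar - set(counts)),
    }


# Hypothetical sample data for illustration only.
grammar = ["checking", "savings", "money market", "certificate of deposit"]
utterances = ["checking", "Savings", "my checking account", "checking"]
report = coverage_report(grammar, utterances)
print(report)
```

Phrases in "missing" suggest either extending the grammar or rewording the prompt so callers stop saying them; phrases in "unused" are worth a second look, since each one adds a chance of misrecognition without helping any observed caller.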


Copyright © 2001-2002 VoiceXML Forum. All rights reserved.
The VoiceXML Forum is a program of the
IEEE Industry Standards and Technology Organization (IEEE-ISTO).