VoiceXML Review - Columns - Speak & Listen

Volume 2, Issue 1 - January 2002

Some Thoughts on Speech Grammar

In this monthly column, an industry expert will answer common questions about VoiceXML and related technologies. Readers are encouraged to submit questions about VoiceXML, including development, voice-user interface design, and speech technology in general, or how VoiceXML is being used commercially in the marketplace. If you have a question about VoiceXML, e-mail it to and be sure to read future issues of VoiceXML Review for the answer.

Q: How do I use data from a Microsoft Access database in a VoiceXML application?

A: To use data from any DBMS in a VoiceXML application, you'll need to extract the data, format it in a syntax that a VoiceXML interpreter can parse and execute, and place it in a location from which the interpreter can fetch it. You have a number of options, including the following:

Periodically export the data from the DBMS into VoiceXML or JavaScript or some intermediary format that can be further transformed, and place the exported data on a Web server accessible to the VoiceXML interpreter.
Use an API to extract and format the data directly from the DBMS on demand. The VoiceXML interpreter makes an HTTP request to a server-side script that in turn fetches the data and formats it for use in your voice application.
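To make the on-demand option more concrete, here is a minimal sketch of a server-side script written as a Python WSGI application. Everything in it is an assumption made for illustration: the `name` query parameter, the `fetch_phone` helper, and the hard-coded `_DIRECTORY` dictionary, which stands in for a real DBMS query. It shows only the general shape of the approach, not a production implementation.

```python
from urllib.parse import parse_qs
from xml.sax.saxutils import escape

# Hypothetical stand-in for a DBMS lookup; a real script would query the database here.
_DIRECTORY = {"alice": "5550100"}


def fetch_phone(name):
    """Return the phone number stored for `name`, or None if there is no match."""
    return _DIRECTORY.get(name.strip().lower())


def render_vxml(prompt_text):
    """Wrap a sentence in a minimal VoiceXML document that speaks it to the caller."""
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<vxml version="1.0">\n'
        '  <form>\n'
        '    <block><prompt>' + escape(prompt_text) + '</prompt></block>\n'
        '  </form>\n'
        '</vxml>\n'
    )


def app(environ, start_response):
    """WSGI entry point: read ?name=..., look it up, and answer with VoiceXML."""
    query = parse_qs(environ.get("QUERY_STRING", ""))
    name = query.get("name", [""])[0]
    phone = fetch_phone(name)
    if phone is None:
        text = "Sorry, I could not find that name."
    else:
        # Space the digits out so a speech synthesizer reads them one at a time.
        text = "The number for %s is %s." % (name, " ".join(phone))
    body = render_vxml(text).encode("utf-8")
    start_response("200 OK", [
        ("Content-Type", "text/xml; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    return [body]
```

For local experimentation, `app` can be served with `wsgiref.simple_server.make_server` from the Python standard library. The text is passed through `escape` so that data containing characters such as `&` or `<` does not produce a malformed document.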

When deciding between these two options, consider the following:

How frequently does the data used by your voice application change?
Is it vital that users of your voice application have access to the most up-to-date information?
Does your DBMS scale to handle the additional demand from users of your voice application?
Are you prepared to secure the data in your DBMS from hackers?

If your answers to these questions are "Infrequently", "No", "No", and "No", then the first option is probably good enough.
For the purposes of this article, I'll assume that's the case. In a future column, I'll tackle the second option.

Some sample data

To help put the solution into perspective, let's define two schemas. The first describes a simple employee table:

Field Name	Type	Description
emp_id	AutoNumber	The employee's unique id (primary key)
ssn	Text (9)	Social Security Number (also unique)
fname	Text (50)	first name
lname	Text (50)	last name
phone	Text (15)	telephone number
dept_id	Integer	foreign key into the department table
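As a rough illustration of the periodic-export option applied to this table, the sketch below writes the employee rows out as a JavaScript data file. It rests on several assumptions: Python's built-in `sqlite3` module stands in for Access (which would normally be reached through an ODBC driver), the sample rows are invented, and the names `make_sample_db`, `export_employees_js`, and the `employees` variable are hypothetical.

```python
import json
import sqlite3

# Columns that are safe to publish; ssn is deliberately left out of the export.
EXPORT_COLUMNS = ("emp_id", "fname", "lname", "phone", "dept_id")


def make_sample_db():
    """Build an in-memory database that mirrors the employee schema above."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """CREATE TABLE employee (
               emp_id  INTEGER PRIMARY KEY AUTOINCREMENT,
               ssn     TEXT UNIQUE,
               fname   TEXT,
               lname   TEXT,
               phone   TEXT,
               dept_id INTEGER)"""
    )
    # Invented sample rows, for illustration only.
    conn.executemany(
        "INSERT INTO employee (ssn, fname, lname, phone, dept_id) VALUES (?, ?, ?, ?, ?)",
        [
            ("000000001", "Ann", "Lee", "5550101", 1),
            ("000000002", "Bob", "Ray", "5550102", 2),
        ],
    )
    conn.commit()
    return conn


def export_employees_js(conn):
    """Return JavaScript source that defines an `employees` array of records."""
    sql = "SELECT %s FROM employee ORDER BY emp_id" % ", ".join(EXPORT_COLUMNS)
    records = [dict(zip(EXPORT_COLUMNS, row)) for row in conn.execute(sql)]
    return "var employees = " + json.dumps(records, indent=2) + ";\n"


if __name__ == "__main__":
    # A scheduled job would write this to a file on the Web server instead.
    print(export_employees_js(make_sample_db()))
```

Leaving `ssn` out of the exported file is a deliberate choice in this sketch: anything placed on the Web server for the interpreter to fetch should be limited to what the voice application actually needs.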

4) If there are features that you want to use that aren't quite perfectly cross-platform compatible today, what will it really cost you in development time to make the necessary changes should you choose to switch?

Remember, millions of people made the decision to write slightly different versions of their Web sites for IE vs. Navigator to optimize performance on both. In my opinion, VoiceXML is already far superior to HTML in this regard (e.g., fewer inconsistencies across implementations), but you have to make your own decision specific to your business objectives and needs.

Q: Given the current state of speech recognition technology, when writing speech grammars for VoiceXML apps, is it best to write small, compact grammars with a very narrow set of possible utterances, or is it better to write larger, wide-open grammars?

A: It's most important to write your grammars to closely match what your callers are actually saying. Having too much coverage (too many phrases in the grammar, especially ones that are confusable with one another) is as bad as having too little (missing many things that your callers regularly say). Optimizing this balance through a combination of great grammar design and great UI design that carefully guides callers to "say the right things" without frustrating them is the fine art of voice application design.
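One way to check this balance is to compare the grammar against transcriptions of what callers actually said. The sketch below is a simplification under stated assumptions: it treats the grammar as a flat list of phrases (real grammars are usually rule-based), it assumes you have transcribed utterances available, and the `coverage_report` function and sample phrases are hypothetical.

```python
from collections import Counter


def normalize(utterance):
    """Lower-case an utterance and collapse runs of whitespace."""
    return " ".join(utterance.lower().split())


def coverage_report(grammar_phrases, observed_utterances):
    """Compare the phrases a grammar accepts with what callers actually said."""
    grammar = {normalize(p) for p in grammar_phrases}
    counts = Counter(normalize(u) for u in observed_utterances)
    total = sum(counts.values())
    in_grammar = sum(n for u, n in counts.items() if u in grammar)
    return {
        # Share of caller utterances that the grammar covers.
        "in_grammar_rate": in_grammar / total if total else 0.0,
        # Things callers said that the grammar lacks (too little coverage).
        "missing": sorted(u for u in counts if u not in grammar),
        # Phrases nobody said (candidates for removal: too much coverage).
        "unused": sorted(grammar - set(counts)),
    }


if __name__ == "__main__":
    grammar = ["checking", "savings", "money market", "certificate of deposit"]
    heard = ["checking", "Savings", "checking account", "savings"]
    print(coverage_report(grammar, heard))
```

Phrases in `missing` suggest the grammar or the prompt needs work, while phrases in `unused` may only be adding confusability. The report says nothing about which phrases sound alike, so it complements recognition testing rather than replacing it.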
