Volume 1, Issue 5 - May 2001
   
 

City CarShare Reservation System: A VoiceXML Case Study

By Rachel McConnell and Bryan Michael

City CarShare is a San Francisco nonprofit company whose mission is to reduce the number of cars in the city through shared ownership. Members can reserve a car through the website or by calling the office. Because City CarShare has limited resources and needs to serve its members around the clock, a VoiceXML application is the perfect solution for them. Using BeVocal Café, indigo egg recently completed such a reservation system for City CarShare.

VoiceXML applications offer instant access to information and automated services from any telephone. Before VoiceXML, some of these services were available through costly human-operated call centers or cumbersome DTMF touch-tone menu trees implemented on proprietary IVR platforms. VoiceXML reduces many of these problems and adds the convenience of using voice to complete the task at hand. Many of the VoiceXML applications being developed on BeVocal's platform (http://cafe.bevocal.com) fall into one of these general classes:

  • Targeted applications most useful when traveling or away from the office: sales force automation and customer relationship management applications.

  • Cost reduction and improved customer service: making reservations and checking availability without waiting on hold -- also getting more information like driving directions to business locations.

  • Employee productivity improvements achieved by streamlining work flow and task completion: expense approvals and quality assurance applications.

City CarShare falls squarely into the second category. Their executives have great expertise in their own domain but, as with most companies, have less of an understanding of what a voice application can do. Indigo egg's first task with City CarShare was to give them a realistic idea of VoiceXML's capabilities. Together we decided upon an application that would accept, list, and cancel reservations, and, for callers who are not members, play an informative message about City CarShare. In this article we will explore some of the usability and technical issues that arose during the development of this application.

Usability

Speech is the most natural form of communication, and people have high but often unconscious expectations associated with the medium. All successful application deployments therefore demand a high degree of usability. The two main factors that affect the usability of VoiceXML applications are dialog design and speech recognition accuracy. Accuracy depends in large part on making sure the application listens for utterances that people will actually say. Both of these concerns must be addressed in order to provide an application that people will want to use.

The first decision to be made is the characterization of the application. Talking on the phone is carrying on a conversation, and people have many subconscious expectations about a conversation's form and content. Creating an actual personality for the application allows callers to anthropomorphize it, and this brings several benefits. First, the application is sticky: the caller feels friendly towards the system and is more likely to call again. Second, the caller is more forgiving of recognition 'mistakes'. Perhaps most importantly, the caller places greater trust in the application and has a correspondingly greater desire to cooperate with it. For City CarShare, indigo egg chose a friendly, informal college student character, to blend with their 'green' aesthetic.

The next usability hurdle arose in porting their authentication procedure to voice. The passwords that City CarShare had been issuing to their members were alphanumeric rather than solely numeric. This allows passwords of the sort 'PAss+w0rd', which mix upper- and lower-case letters with numerals and punctuation. Even if such a password could be pronounced, the recognition engine cannot recover its spelling. To solve this problem, indigo egg recommended that City CarShare issue four-digit PINs instead of passwords. After we educated them on the problems inherent in translating written text to spoken language, City CarShare agreed. This decision was a difficult one, as it affected all of City CarShare's members, and it illustrates the kind of tradeoff that must often be made. (A full discussion of alphanumeric passwords in ASR requires an article of its own.)
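
To illustrate why a four-digit PIN is easier to handle than a free-form password, here is a minimal Python sketch of turning a recognized utterance into a PIN. It is not the code used in the City CarShare application: the function name `parse_spoken_pin`, the `DIGIT_WORDS` table, and the assumption that the recognizer hands back a plain string of digit words are all hypothetical and chosen only for illustration.

```python
from typing import Optional

# Hypothetical mapping from spoken digit words to numerals. "oh" is a
# common spoken form of zero. A real deployment would define the accepted
# words in the recognizer's grammar rather than in application code.
DIGIT_WORDS = {
    "zero": "0", "oh": "0", "one": "1", "two": "2", "three": "3",
    "four": "4", "five": "5", "six": "6", "seven": "7", "eight": "8",
    "nine": "9",
}

PIN_LENGTH = 4  # the four-digit PIN described in the article


def parse_spoken_pin(utterance: str) -> Optional[str]:
    """Return the PIN as a digit string, or None if the utterance is not
    exactly PIN_LENGTH recognizable digit words."""
    words = utterance.lower().split()
    if len(words) != PIN_LENGTH:
        return None
    digits = [DIGIT_WORDS.get(word) for word in words]
    if None in digits:
        # At least one word was not a digit word.
        return None
    return "".join(digits)
```

Under these assumptions, `parse_spoken_pin("four oh seven two")` yields a four-character digit string, while anything that is not four digit words is rejected, so the caller can be asked again. The vocabulary is eleven words, which is the point of the tradeoff: a small, closed set of utterances is far easier to recognize than arbitrary mixed-case text.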

Dialog Design

Conversations contain subtle but important cues, called discourse markers, which convey meta-level information about the conversation. These are usually small words or choices of phrasing that may seem meaningless when written but play an important part in keeping two speakers in sync. A voice application dialog should use discourse markers in appropriate places to keep the caller grounded in the application and increase comfort and usability. For example, it is easy to write a dialog such as the following:

Application: What location would you like to pick up the car from?
Caller: Downtown.
Application: What day would you like to pick up the car?
Caller: Next Thursday.
Application: What time would you like to pick up the car?
Caller: Noon.

This dialog is unnatural and discontinuous. Indigo egg's dialogs use pronouns extensively to refer to the car under discussion, to create and reinforce continuity. When it has recognized the response, the application briefly repeats the caller's answer with a confirmatory discourse marker such as 'okay', 'got it', or 'mm-hmm', so that the caller knows the application has heard them. These discourse markers are chosen randomly for a more natural-sounding response. Compare the following:

Application: What location do you want to pick up the car from?
Caller: Downtown.
Application: Downtown, got it. And what day do you need it?
Caller: Next Thursday.
Application: Okay, Thursday, March 12th. At what time?
Caller: Noon.

Because this conversation works with the caller's subconscious expectations rather than ignoring them, it is much more comfortable to engage in. The first prompt deserves further discussion. It would be more natural to say, "Where do you want to pick up the car from?" but this is not the best choice in a voice application. The word 'where' leaves too much latitude in the caller's answer, inviting responses that may not be included in the grammar. The need for a more directed request for the location makes discourse markers all the more important in striking a balance between natural and directed prompts.
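
The confirm-then-ask pattern described above, with a randomly chosen discourse marker, can be sketched in a few lines of Python. This is an illustrative sketch only, not indigo egg's implementation: the function name `confirm_and_ask` and the fixed "answer, marker, next question" word order are assumptions (the sample dialog also places the marker before the answer).

```python
import random
from typing import Optional

# Confirmatory discourse markers mentioned in the article.
MARKERS = ["okay", "got it", "mm-hmm"]


def confirm_and_ask(recognized: str, next_question: str,
                    rng: Optional[random.Random] = None) -> str:
    """Echo the caller's answer with a random discourse marker, then ask
    the next question in the same prompt."""
    chooser = rng if rng is not None else random
    marker = chooser.choice(MARKERS)
    # Upper-case only the first character so that names inside the answer
    # (for example a month) keep their own capitalization.
    echoed = recognized[:1].upper() + recognized[1:]
    return f"{echoed}, {marker}. {next_question}"


if __name__ == "__main__":
    print(confirm_and_ask("downtown", "And what day do you need it?"))
```

Passing a seeded `random.Random` instance as `rng` makes the choice repeatable, which is convenient when testing prompts. In production the default, unseeded choice gives the varied, more natural-sounding responses the article describes.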

In general, some important considerations in dialog design include:

  • Create a consistent character that engages the listener in an appropriate style for a particular application.

  • Write dialogs as conversations, which are very different from written text. Writing is generally more formal and impersonal than speech.

  • Use discourse markers in prompts to help ground the caller; work with their subconscious expectations of how conversations happen rather than against them.


Copyright © 2001 VoiceXML Forum. All rights reserved.
The VoiceXML Forum is a program of the IEEE Industry Standards and Technology Organization (IEEE-ISTO).