VoiceXML Review - Feature Articles

Volume 2, Issue 7 - November/December 2002

Enhancing VoiceXML Application Performance By Caching

By Dave Burke,

Introduction
The VoiceXML architectural model partitions application hosting from application rendering (figure 1). Specifically, the application is served from a Web Server and is typically created dynamically within the framework of an Application Server or equivalent. The VoiceXML Interpreter renders the resultant VoiceXML document, transmitted across a network by HTTP, into a series of instructions interpreted by the Implementation Platform. Implied in this model is a geographical distribution of the application hosting environment and the VoiceXML platform, and thus the incurring of network latencies. An application might make many successive requests for new VoiceXML documents during its lifetime, so these latencies can have considerable adverse effects on performance. In this article we discuss how caching can be used to enhance the performance of VoiceXML applications. Caching is a strategy for storing temporary 'objects' (e.g. VoiceXML resources) local to the VoiceXML Interpreter, which the application developer can employ to reduce these latencies. In what follows we use the phrase 'origin server' to denote the application hosting environment, and 'user agent' to refer to the VoiceXML Interpreter and Implementation Platform.

Figure 1: The VoiceXML architecture model

Why bother with Caching?
Probably the most pertinent reason for caching is to maximise customer satisfaction via improved performance. Since VoiceXML applications conduct audio dialogues with humans, they should endeavour to respond within the timing boundaries that humans expect of a conversation. There are also considerable technical advantages to employing a good caching strategy. Load on web servers (and hence on the corresponding application servers and databases) is reduced, which facilitates savings on scaling costs. Network load is also reduced and, since most Internet hosting companies charge for different levels of IP connectivity, it makes financial sense to conserve bandwidth.

HTTP Caching Mechanisms
The purpose of HTTP caching is twofold:
i. to actually avoid the need to make requests to the origin server in many cases, and
ii. to eliminate the need to send full responses in many other cases.
These two goals correspond to the concepts of expiration and validation, respectively. A local copy of a document that has not expired may be used without requiring a costly fetch from the server. An expired document that is validated against the server may not require a full re-transmission of the document to the platform. Specifying the expiration times is the responsibility of the application developer, and choosing them well is the key to creating high performance applications.
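
The distinction between expiration and validation can be sketched as a small decision function. This is only an illustration of the two concepts described above, not any platform's actual implementation; the `CachedEntry` type, its field names and the returned labels are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CachedEntry:
    """Hypothetical record describing a locally cached document."""
    age_seconds: int           # time elapsed since the copy left the origin server
    freshness_lifetime: int    # how long the origin server said the copy stays fresh
    validator: Optional[str]   # e.g. an ETag or Last-Modified value, if one was sent


def next_step(entry: Optional[CachedEntry]) -> str:
    """Decide how the user agent could obtain a document."""
    if entry is None:
        return "fetch"        # nothing cached: a full request is unavoidable
    if entry.age_seconds < entry.freshness_lifetime:
        return "use-cached"   # expiration: no request to the origin server at all
    if entry.validator is not None:
        return "validate"     # validation: ask the server whether the copy is still good
    return "fetch"            # expired and nothing to validate with: full request
```

A copy that is one hour old with a one-day lifetime yields "use-cached"; the same copy after two days yields "validate" if a validator is available.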

A VoiceXML platform's caching mechanism is usually similar to that of traditional (visual) browser environments that implement multi-tier strategies. An instructive way of understanding how caching works for a caller of a VoiceXML application is by analogy with a person using a computer in an Internet café: the chosen computer has a local cache that has been populated by previous users and may or may not already contain information required by the current user. Since there is a reasonable likelihood that another person, albeit at a different computer, has fetched the same resource before, a network-level proxy cache additionally stores resources for all users. A local cache will give a better response time than a proxy cache, which in turn will yield better performance than requiring the user agent to make requests to the origin server for all resources. Figure 2 illustrates a standard multi-tier cache architecture.

Figure 2: Multi-tier cache architecture
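
The lookup order in the Internet café analogy can be sketched as follows. This is a simplified illustration that assumes plain dictionaries for the two cache tiers and deliberately ignores expiration; the function name `lookup` and its parameters are hypothetical.

```python
from typing import Callable, Dict, Tuple


def lookup(url: str,
           local_cache: Dict[str, bytes],
           proxy_cache: Dict[str, bytes],
           fetch_from_origin: Callable[[str], bytes]) -> Tuple[bytes, str]:
    """Return the resource and the tier that supplied it."""
    # Tier 1: the cache local to this user agent (fastest).
    if url in local_cache:
        return local_cache[url], "local"
    # Tier 2: the proxy cache shared by all user agents on the network.
    if url in proxy_cache:
        body = proxy_cache[url]
        local_cache[url] = body      # keep a local copy for next time
        return body, "proxy"
    # Tier 3: the origin server (slowest); populate both caches on the way back.
    body = fetch_from_origin(url)
    proxy_cache[url] = body
    local_cache[url] = body
    return body, "origin"
```

With two user agents sharing one proxy cache, the first request for a resource reaches the origin server, a repeat request from the same user agent is served locally, and a first request from the second user agent is served by the proxy.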

The architecture in figure 2 is easily extended to a hierarchy of caches as the platform scales. Happily, from the perspective of the application developer, the platform's implementation of the caching architecture is largely transparent: the methods for using it are the same however the caches are arranged, and we discuss these next.

Controlling the HTTP Caching Policy
The caching policy can be controlled by the application developer by specifying attributes in the VoiceXML document [2], [3] and/or by using HTTP header values [1] set on the origin server. Generally it is preferable to use the HTTP headers to control the caching policy, but this may not always be possible (for instance, if the web server is not under the control of the application developer). The VoiceXML attributes can also be used by the user agent for finer-grained control, e.g. forcing a refresh of content or allowing stale content to be used for an extended period of time.
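
As a rough sketch of how document-level attributes can tighten or relax the policy set by the HTTP headers, the function below assumes that the attributes in question are the `maxage` and `maxstale` attributes of VoiceXML 2.0, and it follows one reading of their semantics. The function name and parameters are hypothetical, and a real platform may behave differently.

```python
from typing import Optional


def may_use_cached(age: int,
                   freshness_lifetime: int,
                   maxage: Optional[int] = None,
                   maxstale: Optional[int] = None) -> bool:
    """Decide whether a cached copy may be used without contacting the server.

    age                -- seconds since the copy left the origin server
    freshness_lifetime -- seconds the HTTP headers allow the copy to stay fresh
    maxage, maxstale   -- assumed VoiceXML 2.0 attribute values, in seconds
    """
    # maxage caps the acceptable age, however fresh the headers say the copy is.
    if maxage is not None and age > maxage:
        return False
    # A copy that has not expired is usable.
    if age < freshness_lifetime:
        return True
    # An expired copy is usable only if maxstale tolerates its staleness.
    staleness = age - freshness_lifetime
    return maxstale is not None and staleness <= maxstale
```

Under these assumptions, `maxage=0` forces a refresh of any copy older than zero seconds, while a large `maxstale` lets an expired copy keep being used.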

HTTP is a request/response protocol. A header is sent with each request and received with each response. For example, an HTTP request of:

GET /index.html HTTP/1.1
Host: www.voxpilot.com

Listing 1: An example HTTP request

might give a response of:

HTTP/1.1 200 OK
Date: Thu, 08 Aug 2002 09:02:39 GMT
Server: Apache/1.3.26 (Unix)
Cache-Control: max-age=86400
Expires: Fri, 09 Aug 2002 09:02:39 GMT
Last-Modified: Thu, 01 Aug 2002 14:52:43 GMT
ETag: "b7129-5913-3d494b3b"
Content-Length: 22803
Content-Type: text/html

Listing 2: An example HTTP response

followed by the actual content (HTML in this case, but it can be anything, including binary octet streams). This can easily be verified by using a telnet session, e.g. typing

telnet www.voxpilot.com 80

followed by the HTTP request (listing 1) and a blank line (i.e. pressing Enter twice after the Host line) should trigger a response similar to listing 2.
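
For readers who prefer a script to a telnet session, the same exchange can be sketched in Python. The helper names `build_request` and `parse_response` are hypothetical, and the sketch covers only building the request bytes and splitting a raw response into its parts; the bytes would still need to be written to a TCP connection on port 80, as telnet does.

```python
from typing import Dict, Tuple


def build_request(host: str, path: str = "/") -> bytes:
    """Build a minimal HTTP/1.1 GET request like the one in listing 1.

    Each line ends with CRLF, and an empty line terminates the header.
    """
    return f"GET {path} HTTP/1.1\r\nHost: {host}\r\n\r\n".encode("ascii")


def parse_response(raw: bytes) -> Tuple[str, Dict[str, str], bytes]:
    """Split a raw response into status line, header fields and content."""
    # The first empty line separates the header from the content.
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    status_line = lines[0]
    headers: Dict[str, str] = {}
    for line in lines[1:]:
        # Split on the first colon only: values such as dates contain colons.
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()   # field names are case-insensitive
    return status_line, headers, body
```

Calling `build_request("www.voxpilot.com", "/index.html")` produces exactly the two lines of listing 1 followed by the terminating blank line.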

The header fields Cache-Control, Expires, Last-Modified, and ETag in the HTTP response example above control the caching policy for the requested object (index.html). We explain the meaning of these and similar fields next.
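
As a small preview using only the values in listing 2, the sketch below computes how long the response may be treated as fresh. It relies on the HTTP/1.1 rule that a `max-age` directive takes precedence over `Expires`, and otherwise falls back to the difference between `Expires` and `Date`. The function name `freshness_lifetime` is hypothetical, and the sketch ignores other Cache-Control directives.

```python
from email.utils import parsedate_to_datetime
from typing import Dict, Optional


def freshness_lifetime(headers: Dict[str, str]) -> Optional[int]:
    """Return the number of seconds a response stays fresh, or None if unknown."""
    # Cache-Control: max-age=N takes precedence over Expires in HTTP/1.1.
    for directive in headers.get("Cache-Control", "").split(","):
        name, _, value = directive.strip().partition("=")
        if name.lower() == "max-age" and value.isdigit():
            return int(value)
    # Otherwise fall back to Expires minus Date, both given as HTTP dates.
    if "Expires" in headers and "Date" in headers:
        expires = parsedate_to_datetime(headers["Expires"])
        date = parsedate_to_datetime(headers["Date"])
        return max(0, int((expires - date).total_seconds()))
    return None


listing_2 = {
    "Date": "Thu, 08 Aug 2002 09:02:39 GMT",
    "Cache-Control": "max-age=86400",
    "Expires": "Fri, 09 Aug 2002 09:02:39 GMT",
}
print(freshness_lifetime(listing_2))  # 86400 seconds, i.e. one day
```

In listing 2 the two mechanisms agree: `max-age=86400` and the gap between `Date` and `Expires` both give a lifetime of one day.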

Copyright © 2001-2002 VoiceXML Forum. All rights reserved.
The VoiceXML Forum is a program of the
IEEE Industry Standards and Technology Organization (IEEE-ISTO).