Volume 1, Issue 9 - October 2001
   
 

The Interface between Next-Generation
Application Servers and Media Servers: SIP and VoiceXML

By Eric Burger

The innovation and reach of the Web, combined with the power of real-time voice, open up a wide range of possibilities for enhanced services and applications. In the IP environment of the next-generation network (NGN), it is easier to combine web content with real-time, interactive communications. The result is new types of converged services that go far beyond the PSTN replacement services of voice mail, messaging and IVR to include wireless web media, network gaming and web conferencing.

While softswitches and media gateways form the foundation of the access and transport infrastructure of the NGN, application servers and media servers are the core components of the emerging application and enhanced services infrastructure. Both components have evolved to power these enhanced services. However, the interface from the application server to the media server has not yet been fully defined.

This article proposes Session Initiation Protocol (SIP) and VoiceXML as the best interface between next-generation application servers and media servers. It examines how these open technologies enable flexibility and, because SIP and XML are already entrenched in the web development community, how they will propel the development of more applications and in turn reduce vendor lock-in. Readers should gain an understanding of the interface requirements, the challenges, and the details of the proposed interfaces and their advantages.

From its inception, the development of SIP was a very open, collaborative effort. With its origins in the Internet Engineering Task Force (IETF) and university development, SIP began as a standard for packet-based multimedia conferencing. Competing interface standards such as MGCP and H.248 had different origins. MGCP, originally developed to bridge communications between the PSTN and IP networks, appeared in a variety of flavors depending on the vendor putting it forward. H.248, also developed as an interface between standard telephony and IP, came out of the ITU process.

SIP is an application-layer control (signaling) protocol for creating, modifying and terminating sessions with one or more participants over a network. These sessions go beyond simple conferencing to include content services such as Internet telephone calls and multimedia distribution.
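As a rough illustration of what creating, modifying and terminating a session means, the sketch below models a single session as a small state machine. INVITE and BYE are real SIP methods; the `Session` class, its field names and the media strings are hypothetical and exist only for this example, which is not a working SIP stack.

```python
from dataclasses import dataclass


@dataclass
class Session:
    """Toy model of one SIP session (hypothetical, for illustration only)."""
    state: str = "idle"   # idle -> active -> terminated
    media: str = ""       # stand-in for a real session description

    def handle(self, method: str, media: str = "") -> None:
        if method == "INVITE" and self.state == "idle":
            # The first INVITE creates the session.
            self.state = "active"
            self.media = media
        elif method == "INVITE" and self.state == "active":
            # An INVITE within an existing session (a re-INVITE) modifies it.
            self.media = media
        elif method == "BYE" and self.state == "active":
            # BYE terminates the session.
            self.state = "terminated"
        else:
            raise ValueError(f"{method} not allowed in state {self.state}")


session = Session()
session.handle("INVITE", media="audio")        # create
session.handle("INVITE", media="audio+video")  # modify
session.handle("BYE")                          # terminate
print(session.state, session.media)
```

A real implementation would also track responses, transactions and the participants in each session; the point here is only the three-step lifecycle.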

VoiceXML is the standard with which voice response applications are developed on the Internet. Although the inventors of VoiceXML were thinking in terms of speech recognition, there is nothing about VoiceXML that prevents it from being used for other applications such as interactive voice response. In fact, when coupled with SIP, VoiceXML has been shown to be very applicable to other modes of input such as touch-tone access.

Combined, SIP and VoiceXML can be used to initiate and terminate sessions of all types: not just signaling and control sessions but also content sessions. These sessions could convey simple presence information such as 'I'm in my car now' (so call me on my car phone) or 'I'm at my desk' (so send documents or other media to me there). The ability to establish these sessions makes a host of innovative services possible and economical, such as voice-enriched e-commerce, web page click-to-dial, instant voice chat with buddy lists, and IP Centrex services.

SIP is a request-response protocol that closely resembles HTTP, the basis of the World Wide Web. Using SIP, telephony becomes another web application and integrates easily into other Internet services.
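The resemblance to HTTP is easy to see in the text of the messages themselves. The following sketch, under stated assumptions, runs one small parser over both a SIP request and an HTTP request. The host names, the `ivr` user and the `/menu.vxml` path are made up for illustration, and the SIP header values are simplified, so the INVITE shown is not a complete, valid request.

```python
def parse_request(raw: str) -> dict:
    """Split a text request into start line, headers and body."""
    head, _, body = raw.partition("\r\n\r\n")
    lines = head.split("\r\n")
    # Both protocols use the start line "METHOD target VERSION".
    method, target, version = lines[0].split(" ", 2)
    headers = {}
    for line in lines[1:]:
        # Split on the first colon only; values such as SIP URIs contain colons.
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return {"method": method, "target": target, "version": version,
            "headers": headers, "body": body}


# Hypothetical, simplified SIP request from an application server.
SIP_INVITE = (
    "INVITE sip:ivr@mediaserver.example.com SIP/2.0\r\n"
    "Via: SIP/2.0/UDP appserver.example.com\r\n"
    "From: <sip:app@appserver.example.com>\r\n"
    "To: <sip:ivr@mediaserver.example.com>\r\n"
    "Call-ID: 12345@appserver.example.com\r\n"
    "CSeq: 1 INVITE\r\n"
    "Content-Length: 0\r\n"
    "\r\n"
)

# Hypothetical HTTP request for comparison.
HTTP_GET = (
    "GET /menu.vxml HTTP/1.1\r\n"
    "Host: appserver.example.com\r\n"
    "\r\n"
)

for raw in (SIP_INVITE, HTTP_GET):
    request = parse_request(raw)
    print(request["method"], request["target"], request["version"])
```

The same few lines of parsing handle both messages, which suggests why developers with web experience can find SIP approachable.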

VoiceXML is based on open web markup languages such as XML and HTML. Support tools such as XML editors, syntax checkers and debuggers exist in abundance for HTTP, XML and HTML, but not for the other, more proprietary telecom protocols and languages. There are many development packages for creating XML-based applications, and they are equally accessible to everyone from the largest corporations to programmers at home. The opposite is true of MGCP and H.248.
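To make the tooling point concrete, here is a minimal sketch in which a general-purpose XML parser from the Python standard library serves as a syntax checker for a VoiceXML document. The menu itself is a hypothetical example, not taken from any deployed service; it uses a `digits` field, which a caller can answer by touch-tone. The function names `check_well_formed` and `field_names` are also invented for this illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical VoiceXML menu that collects digits from the caller.
MENU_VXML = """<vxml version="1.0">
  <form id="main_menu">
    <field name="choice" type="digits">
      <prompt>Press 1 for sales or 2 for support.</prompt>
    </field>
  </form>
</vxml>"""


def check_well_formed(document: str):
    """Return (True, root element) if the XML parses, else (False, message)."""
    try:
        return True, ET.fromstring(document)
    except ET.ParseError as err:
        return False, str(err)


def field_names(root) -> list:
    """List the name attribute of every <field> element in the document."""
    return [field.get("name") for field in root.iter("field")]


ok, root = check_well_formed(MENU_VXML)
print(ok, field_names(root))

# Dropping a closing tag is caught by the generic parser.
ok, message = check_well_formed(MENU_VXML.replace("</form>", ""))
print(ok, message)
```

This checks only that the document is well-formed XML. Validating it against the VoiceXML DTD or schema would need a validating parser, which is likewise available off the shelf.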

SIP and VoiceXML require much less time and far fewer resources to learn than the more proprietary, PSTN-oriented protocols such as MGCP and H.248. They are familiar to a wider base of programmers, and service development time is much shorter with these open Internet standards.

SIP and VoiceXML use a model that is very familiar to the general IP workforce, whereas the MGCP and H.248 model is familiar to a much smaller, more specialized group of telephony programmers. That group is very small compared with the number of webmasters and Java programmers worldwide.

Continued...


Copyright © 2001 VoiceXML Forum. All rights reserved.
The VoiceXML Forum is a program of the
IEEE Industry Standards and Technology Organization (IEEE-ISTO).