2004 VoiceXML Forum Membership Survey
By Jim Ferrans
VoiceXML Forum Technical Council Chair
Introduction
The talented and dedicated people serving on the VoiceXML Forum's technical committees are doing many things, such as setting up open source tool development projects, organizing an independent conformance program, creating a developer certification program, and publishing the VoiceXML Review. More technical work is starting all the time: in addition to the current Conformance, Education, and Tools Committees, we've just chartered a new Accessibility Committee, and are contemplating the formation of one or two new standards-oriented committees later this year.
The Forum's Technical Council is chartered with coordinating this technical work for the Forum. It's our job to listen to our membership and ensure that their needs are addressed. An important way we do this is through regular surveys, the first of which was completed this May. This article reports on what we learned.
The survey covered a wide variety of topics. We asked our member companies:
- How they used VoiceXML,
- What VoiceXML features they liked and disliked,
- What they were doing with SALT,
- How important VoiceXML conformance was to them,
- What features of voice platforms they thought were most crucial,
- What features they wanted to see in VoiceXML "V3", and
- What they thought about the emerging area of multimodality.
The survey took roughly 30 minutes to an hour to fill out online. Thirty-one companies took the survey, nearly ten percent of our membership. We promised we would not release company-specific information outside of the Technical Council, and permitted anonymous surveys. We did, however, encourage companies to provide their names and a contact person, to better help us in our analysis of the results, and to help prevent "gaming" of the survey. Only four companies chose anonymity. Of the 27 others, there were only two Sponsors and three Promoters: a full 22 were at the Supporter level. There was a representative balance of large and small companies, and companies from a wide variety of industries. Many of the companies are not typically associated with VoiceXML. All of this was encouraging, as we wanted a representative cross-section of our members, not a sample drawn only from the most active companies.
VoiceXML Usage
The first questions asked about how the companies were using VoiceXML. Figure 1 summarizes the responses:
Figure 1: How does your company use VoiceXML?
Every responding company used VoiceXML in at least one way (except one which planned on using it in the next year). The average company used it in three ways, and one company even managed to use it in seven.
VoiceXML application development. The most common use was application development: 25 of the 31 companies developed VoiceXML applications, 22 for others and 16 for themselves.
VoiceXML platforms. A full 18 of the 31 companies provided VoiceXML platforms (hardware and/or software) to other companies. This seemed fairly high to us.
VoiceXML training. Eleven of the companies are involved in training. Only one focused exclusively on training. Seventy-three percent of companies doing training also deployed platforms, 27 percent did hosting, 82 percent did application development, and 45 percent developed tools. Training is therefore primarily an adjunct to the main business of our respondents.
VoiceXML tools. Ten of the companies sell VoiceXML tools. It would be interesting to probe deeper into this area and see what kinds of tools are being deployed, whether they are used internally as well as externally, how interested these companies would be in open source tools efforts, and so on. There was no company focused entirely on tools: every tool vendor also had a platform product or a hosting service.
VoiceXML hosting. Seven companies did VoiceXML application hosting for other companies. Only one of the usual VoiceXML hosting powerhouses was in this list.
As an exercise, we compared our lists of companies against Ken Rehor's excellent lists of VoiceXML companies. Astonishingly, there was almost no overlap. Of the fifteen platform providers for which we had names, only two were in Ken's list of platform providers. Of the seven identified tool vendors, just one appeared in Ken's list of tool vendors. And of the six identified hosting providers, again only one appeared in Ken's hosting list. We concluded that:
- VoiceXML is proliferating much faster than we had been aware of.
- Ken needs a full-time research assistant.
The overall impression we got from these responses is that the VoiceXML market is vibrant, but still in its early days. We have not yet seen a consolidation among the scores of platform vendors, and companies are not yet specializing in tools or training. Specialization and consolidation will happen as conformance and interoperability continue increasing.
VoiceXML Applications
We next asked companies what kinds of VoiceXML applications they've deployed, and how many. We only wanted to count commercial-grade applications providing real value to customers right now. These questions were optional because quite a few companies were under NDA or otherwise wanted to maintain confidentiality in this area.
Types: Only ten of the 25 companies that have developed applications chose to tell us about the commercial applications they've deployed using VoiceXML. These ten, about three percent of the Forum's membership, reported a very substantial list:
- A customizable financial application suite for the securities industry;
- A major brokerage firm's voice application suite;
- A railroad tracking system;
- Several web portals;
- A phone game suite;
- A voice mail application;
- A business portal automating support for a carrier's DSL service;
- An automatic wakeup service deployed by a major European carrier;
- Various commercial applications for the television industry, and a PDA manufacturer;
- Health care applications;
- Telecom applications;
- A phone banking system for a huge retail bank with 11 million customers, 2 million of whom use phone banking; and
- A carrier's voice portal with two dozen services such as news, weather, soccer updates, traffic, movie information, and a television guide.
Number: Our survey was not of a size or scope to find out how many voice applications are deployed in total, and what proportion are authored in VoiceXML versus other authoring approaches. We would need a larger sample drawn from the entire industry to do that.
Publicly available application counts don't get close to a definitive answer. They are too colored by individual companies trying to look their best, and by proponents of one authoring approach exaggerating application counts to diminish other authoring approaches. Even the concept of application is fluid: Should only those in service count? Is a voice portal one application or fifty? Should an application serving two million calls a day count the same as one that serves hundreds?
The most reliable sense of VoiceXML's impact can be found in independent market research done by consultancies like InStat/MDR, Frost and Sullivan, Zelos Group, Datamonitor, IDC, and Yankee Group. They are reporting very encouraging findings this year. For instance, the Zelos Group's Dan Miller recently said that "VoiceXML is the standard scripting language for rendering Web pages over the telephone. VoiceXML 2.0-compliant products are already on the market from core technology, platform, development tool and hosted services providers, and there is broad industry adoption. More importantly, purchase decision-makers among the major speech-enabled enterprises, including financial services, travel, telcos, see VoiceXML compliance as a requirement. It gives them bargaining leverage across vendors and solutions providers and carries with it the promise of re-usable code and portability." Art Schoeller of the Yankee Group found this year that "There is a huge momentum behind VoiceXML right now. Based on corporate requests for proposals (RFPs) and actual deployments, that is easy to see."
This momentum is amply seen in our survey. We asked each company to tell us how many deployed applications were in service currently and how many they expected to see in service a year from now. Twenty-four companies responded. We discarded one larger company that seemed to be gaming the system. The remaining companies were mainly small, and their answers correlated quite well with information publicly available on them. They reported a total of 208 deployed VoiceXML applications today, and expected to have 862 VoiceXML applications in service a year from now. This better-than-fourfold growth in the VoiceXML market agrees with what market researchers are finding from wider samples.
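The growth claim is simple arithmetic on the two totals quoted above. The short Python sketch below reproduces it; the variable names are ours and purely illustrative, and the only inputs are the counts reported by the respondents that remained after one response was discarded.

```python
# Totals reported in the survey (after discarding one suspect response).
deployed_now = 208        # VoiceXML applications in service today
expected_next_year = 862  # applications expected in service a year from now

# Ratio of next year's expected deployments to today's.
growth_factor = expected_next_year / deployed_now
print(f"Expected growth: {growth_factor:.2f}x")
```

A ratio above 4 is what the text means by "better than" a quadrupling.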
SALT Usage
We next asked our respondents about their use of SALT for authoring voice and multimodal applications.
The VoiceXML Forum's 333 companies are a major part of the voice industry. Because of this, 47 of the SALT Forum's 79 members (60 percent) are in the VoiceXML Forum, while 14 percent of our members are in the SALT Forum. These dual membership companies tend to be larger, more active, and more serious participants in the industry. Given this high level of cross-membership, the answers to these questions should shed light on how much impact SALT will have.
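The two cross-membership percentages follow from the three counts just given. The sketch below works them out, assuming (as the figures suggest) that both percentages are computed from the same 47 dual-member companies; the variable names are illustrative.

```python
# Membership counts quoted in the text.
voicexml_forum_members = 333
salt_forum_members = 79
dual_members = 47  # companies belonging to both forums

# Share of each forum made up of dual members.
share_of_salt_forum = dual_members / salt_forum_members
share_of_voicexml_forum = dual_members / voicexml_forum_members

print(f"Dual members as a share of the SALT Forum: {share_of_salt_forum:.1%}")
print(f"Dual members as a share of the VoiceXML Forum: {share_of_voicexml_forum:.1%}")
```

The first share is just under 60 percent, which the text rounds up.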
The voice industry is quite pragmatic. Companies are interested in meeting customer needs by deploying commercially valuable applications and services. They see standards as a key means to this end, but generally don't want to waste energy by getting polarized about them. And polarization seems not to be happening. Three factors point to this.
First, SALT Forum companies are very active in the VoiceXML Forum. After the VoiceXML Forum was restructured in late 2003 to allow any member to participate at the board level, the original four founding board members were joined by seven more. Significantly, five of our new board members are from the SALT Forum: HP, Verizon, Vocalocity, VoiceGenie, and West. After our August 2004 board elections, our board's chairperson and vice-chairperson are from SALT Forum companies.
Second, SALT Forum companies are highly committed to VoiceXML. I keep a list of their recent announcements on VoiceXML, and the proportion making serious commercial investments in VoiceXML is surprisingly high, nearly triple the proportion investing in SALT. I found voice-related product and service announcements for 58 of the 79 companies. Of these 58, 46 (79 percent) made very significant commercial bets on VoiceXML. Another 8 (14 percent) made lesser commitments to VoiceXML.
Finally, the results of our survey indicate that SALT-oriented companies are deploying an order of magnitude more VoiceXML applications than SALT applications.
We first asked our sample a series of questions about how companies use SALT, mirroring the questions for VoiceXML. The results are summarized in Figure 2.
Figure 2: How does your company use SALT?
Not surprisingly, this reflects the same proportions we see for VoiceXML.
Types: We asked about types and quantities of deployed SALT applications, as we did for VoiceXML. We could not ascertain what types of applications SALT will be used for, since none of our respondents had yet deployed a SALT application. But we expect SALT to be used in nearly the same way as VoiceXML.
Number: Five respondents answered that they were working on SALT applications. The total number of our sample's deployed SALT applications in May 2004 was 0. The total number of deployed SALT applications they expect to have by May 2005 is 42. This is contrasted with VoiceXML deployments in Figure 3.
Figure 3: VoiceXML and SALT deployments (all respondents).
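The relative shares implied by these counts can be worked out directly. The sketch below assumes the expected VoiceXML total (862) and the expected SALT total (42) describe the same set of respondents; the names are illustrative rather than taken from the survey.

```python
# Deployments expected a year from now, as reported by respondents.
expected_voicexml = 862
expected_salt = 42

# Each markup language's share of all expected deployments.
total_expected = expected_voicexml + expected_salt
salt_share = expected_salt / total_expected
voicexml_share = expected_voicexml / total_expected

print(f"SALT share of expected deployments: {salt_share:.1%}")
print(f"VoiceXML share of expected deployments: {voicexml_share:.1%}")
```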
Relative use of VoiceXML and SALT. Can we say from our data that 100 percent of the markup-based voice applications are VoiceXML this year, or that next year only 4.6 percent will be SALT? No: our sample was self-selecting and drawn from only VoiceXML Forum companies. But there is an interesting thought experiment we can do.
Our respondents included an above average proportion of SALT Forum members: six of 27 identified companies (22.2 percent), relative to the 14.1 percent ratio for the full VoiceXML Forum. What if we looked at just these six plus the four other companies reporting that they were deploying SALT applications? The proportion of SALT applications these SALT-oriented companies are deploying surely should represent an upper bound for the industry as a whole. Figure 4 shows the results for the ten companies that fit the SALT-oriented profile.
Figure 4: VoiceXML and SALT deployments (SALT-oriented respondents).
The data shows that, remarkably, even the companies most interested in SALT will deploy fully 91.1 percent of their applications in VoiceXML next year, and only 8.9 percent in SALT. Clearly there is no great fragmentation happening in the voice industry, and the signs are that VoiceXML will continue to dominate it.
Strengths and Weaknesses of VoiceXML 2.0
Our next series of questions tried to tease out what features our membership would like to see in VoiceXML "V3". These were used to prepare a short position paper we are forwarding to the W3C.
Strengths of VoiceXML 2.0. When asked an open-ended question about VoiceXML 2.0's strengths, our respondents had these comments. (We present only those comments mentioned by two or more companies.)
- VoiceXML 2.0 is an open widely accepted W3C standard; standardization means low costs, strong core technology, platform independence, no vendor lock-in, broad developer community. [17 companies]
- It is simple, easy to use, natural, easy to develop complex applications. [15]
- It uses the web paradigm: internet infrastructure, separation of logic and presentation, ease of deployment. [10]
- High portability. [5]
- It results in rapid implementation, can be used for rapid prototyping. [4]
- Powerful. [4]
- Allows switching between ASR and TTS systems. [3]
- Short and concise dialog flow, FIA. [2]
- Flexible. [2]
- Supports ECMAScript. [2]
Weaknesses of VoiceXML 2.0. When asked an open-ended question about VoiceXML 2.0's weaknesses, our respondents had various comments. (We include comments made by only one company, as they may prompt change requests for VoiceXML "V3".)
Mentions of features for "V3" we later explicitly asked about (discussed in the next section):
- Want more control over ASR settings. [3 companies]
- FIA too complex, non-intuitive in some cases, too restrictive, sometimes want to define my own. [3]
- No support for event-driven programming (e.g., asynchronous interrupts). [2]
- Want to see more call control features. [2]
- Want better CCXML integration. [1]
- Need to be modularized for reusability. [1]
- Not extensible (e.g., for video output, multimodal). [1]
Comments on VoiceXML 2.0 per se:
- Sub dialogs should be more flexible, e.g., allow running without new execution context. [2]
- Lack of support for multimodal interaction. [1]
- The W3C's VoiceXML 2.0 specification is not clear enough: many details are not filled in. [1]
- Too much programming. [1]
- Want to see error.badfetch subtypes to aid in problem determination. [1]
- Want to have expr as well as value in . [1]
- No dynamic vocabulary (want voice and text enrollment). [1]
- Want more flexibility in accessing recognition results. [1]
- Want better prompt control. [1]
Comments already addressed in VoiceXML 2.1:
- Want a tag. [1]
- Want to record during recognition for logging and tuning. [1]
- Want a expr attribute for dynamic grammar generation. [1]
Comments regarding SSML, SRGS, SISR specifications:
- The Semantic Interpretation specification (SISR) is too complex. [1]
- Using SRGS for DTMF grammars leads to somewhat lengthy documents. [1]
Comments about the speech and VoiceXML industry:
- Lack of VoiceXML portability due to vendor limitations, vendor-specific extensions. [4]
- Tools are immature, we need an IDE for VoiceXML. [3]
- Grammar standards (SRGS/SISR) are not yet well adopted by vendors. [1]
- The server-side VoiceXML generation tools from [...] and [...] generate too many round trips and result in inefficiencies. [1]
- Want to be able to do open transcription. [1]
Weaknesses were mentioned only half as much as strengths, and did not cluster around any one area in particular. Those areas that got multiple mentions are ones already identified as areas to consider for "V3", or are comments on the industry, not the standard.
Features Desired in VoiceXML "V3"
Features desired in VoiceXML "V3". We also asked a guided series of questions on possible specific VoiceXML "V3" features. The results are shown in Figure 5:
Figure 5: What features would you most like to see in VoiceXML "V3".
Crucial features. Our respondents backed five potential features/capabilities for "V3" very strongly:
- A high level of compatibility is important. 24 of 31 companies are highly interested in compatibility, either by having rigorously equivalent syntax and semantics, full backwards compatibility, or automated translation between 2.0 and "V3". Two other companies wanted "look and feel" compatibility, while four did not consider compatibility important (see Figure 6). This was an overwhelmingly unified response.
- The ability to communicate between a VoiceXML session and external entities is important. 21 companies would like to permit VoiceXML "V3" sessions to communicate with external entities outside of the HTTP request/response model.
- Support for call control within VoiceXML remains important. CCXML is viewed as an important standard; however, 20 respondents indicated that some level of call control capability within a VoiceXML session continues to be important.
- Additional control over low-level media is desirable. 17 respondents want to see more control over low-level media resources in "V3".
- Modularization. This is viewed as a key "V3" requirement by 16 respondents.
Important features. While failing to address any of the preceding five items would lead to acceptance issues for "V3", we also identified two other features that should be seriously considered.
- Speaker verification. This is viewed as a key "V3" requirement by 9 of 31 respondents.
- Additional control over the FIA is desirable. This is viewed as a key "V3" requirement by 8 respondents.
Figure 6: How much backwards compatibility with VoiceXML 2.0 should VoiceXML "V3" have?
The features identified as crucial are mainly those already identified by the W3C. One key takeaway is that "V3" should be as compatible as possible with VoiceXML 2.0 if it is to be relevant to the industry.
Selecting a VoiceXML Platform
Next we asked respondents what their top three factors were in selecting a VoiceXML platform. The responses are shown in Figure 7.
Figure 7: What factors are most important when selecting a VoiceXML 2.0 platform.
The answers seem reasonable. To do its job, a voice platform must be reliable, use an effective speech recognizer, and be affordable. Once these basic needs are met, it has to adhere to standards. Below these needs come lesser ones.
We next divided the thirty companies responding to this question into platform vendors (n=17) versus non-platform vendors (n=13). On most factors the two groups were in close agreement, but two factors showed interesting discrepancies:
- Platform vendors tended to overrate capacity's importance relative to non-platform vendors.
- Platform vendors vastly underrated the importance of debugging support relative to non-platform vendors.
This suggests that platform vendors should revisit their application debugging capabilities to ensure that they are satisfactory.
Conformance
When asked if they authored applications for multiple voice platforms, 16 respondents said yes, 11 no. The other seven did not answer, many because they don't author applications. For those who developed applications for multiple platforms (n=16), we asked how many of their applications needed to have separate versions maintained for each platform. Five maintained all applications for separate platforms, ten did not need to maintain separate applications, and one was in between. We were not able to ascertain whether or not conformance was the issue vs. other factors such as dependence on vendor extensions, ASR tuning properties, etc. This indicates that platform conformance, at least in the past, has been a serious issue for some respondents.
When they were asked explicitly if interoperability was a key issue, 11 companies said interoperability is "very important", seven said it was "important", and six said it was "somewhat important". Four felt it was not important, and three didn't answer.
We then asked about specific conformance areas, and got the results shown in Figure 8.
Figure 8: Which areas of conformance impact you and how severely?
The main factor impacting conformance was platform-dependent features. Application developers either explicitly take advantage of them, or are forced into using them (e.g., ASR tuning properties). The W3C is aware of these issues, and has standardized some of the more common areas of difference in VoiceXML 2.1 (e.g., the expr attribute on |