|
An Introduction to SCXML
By Jim Barnett, Aspect Communications
Abstract: SCXML is a flexible state machine language that combines concepts from CCXML and Harel State Tables. It enhances the basic concept of state machines with such powerful concepts as conditions on transitions and nested and parallel states. As a result, it provides a compact and intelligible representation of complex systems. SCXML is being developed in conjunction with VoiceXML 3, but it will be useful in a wide variety of applications involving the control and synchronization of asynchronous resources.
Introduction
SCXML (State Chart Extensible Markup Language) is a flexible state machine language that is specifically designed for the construction of voice and multimodal interfaces. SCXML is being developed by the W3C's Voice Browser Working Group as part of a larger effort to make VoiceXML more modular. The motivation behind this effort is the observation that VoiceXML 2.0 applications are often difficult to maintain or reuse because they don't separate flow of control from presentation logic. Thus a single form will often contain both prompts and grammars to interact with the caller and <submit> or <goto> tags to move to the next form. Anyone who wants to reuse the presentation logic (the interaction with the caller) has to weed out all the <goto>s (the flow logic). Similarly, anyone wishing to modify the control flow logic has to wade through the presentation logic to find it. In response, the Voice Browser Working Group is focusing on an architecture for VoiceXML 3.0 that cleanly separates data, presentation logic, and flow control. This work is far from complete, but it may be useful to think of it as a refactoring of the existing <form> element so that the presentation logic (the user interaction) is kept separate from the flow control (<goto> or <submit>) and the global application data. This factoring results in a programming model that is similar to the model/view/controller paradigm that has proved valuable in web applications.
Although this new architecture may seem like a radical change, it is important to note that it is already possible to program in this model by using CCXML in conjunction with VoiceXML. Although CCXML is often described as a call control language, its call control functionality is embedded in a general-purpose state machine language that can be used for a wide variety of purposes, specifically including the representation of the flow control logic of complex VoiceXML programs. In developing SCXML our goal has been to refactor CCXML to separate out the call control primitives from the state machine framework and to augment the latter with powerful concepts from Harel State Tables, a state formalism that is the foundation of the state machine notation in UML. As part of UML, Harel State Tables have been widely used for years to model reactive systems, namely those that must respond to asynchronous inputs. Since human-computer interfaces, including both voice interfaces and multimodal interfaces, are textbook examples of reactive systems, we think that the Harel concepts will prove a valuable extension to CCXML.
The first working draft of the SCXML specification is available from the W3C. The next section of this paper offers a quick introduction to its main constructs. The example presented, a voice interface to an email reader, is chosen solely to illustrate the state machine notation, and you should not assume that the ASR and TTS functionality that it contains bears any resemblance to VoiceXML 3.0 markup.
Overview of SCXML
As in all state machine notations, the basic concepts in SCXML are states and transitions. Example 1 shows a simple representation of a speech recognition system that has three states: Listening, Recognizing, and AnalyzeResult. The system moves from Listening to Recognizing when it gets the SpeechDetected event and thence to AnalyzeResult on the RecoDone event. For the sake of illustration, we include the UML diagram as well as the SCXML markup that corresponds to it. Note that SCXML is fully asynchronous so events can occur at any time and the system will simply ignore them if it has no transitions defined for them in its current state. In the example here, suppose that the underlying platform generated a SpeechDetected event while the system was in the Recognizing state. The SCXML interpreter would not consider this to be an error, but the event would effectively be dropped since the Recognizing state responds only to the RecoDone event. (In a practical implementation we might want to modify the Recognizing state to raise an alarm in this case, since two consecutive SpeechDetected events would indicate some sort of problem.)
Example 1
<?xml version="1.0" encoding="us-ascii"?>
<scxml version="1.0" xmlns="http://www.w3.org/2005/SCXML">
  <state id="Listening">
    <transition event="SpeechDetected">
      <target next="Recognizing"/>
    </transition>
  </state>
  <state id="Recognizing">
    <transition event="RecoDone">
      <target next="AnalyzeResult"/>
    </transition>
  </state>
  <state id="AnalyzeResult"/>
</scxml>
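The event handling described for Example 1 can also be sketched outside SCXML. The following Python fragment is only an illustration, not part of SCXML or of any interpreter: the table layout and the helper names `step` and `run` are assumptions made for this sketch. It shows the one point worth remembering, namely that an event with no matching transition in the current state is dropped without error.

```python
# Transition table for Example 1: state -> {event: next state}.
# The dictionary layout is an assumption of this sketch, not SCXML syntax.
TRANSITIONS = {
    "Listening": {"SpeechDetected": "Recognizing"},
    "Recognizing": {"RecoDone": "AnalyzeResult"},
    "AnalyzeResult": {},
}


def step(state, event):
    """Return the next state, or the same state if the event is not handled."""
    return TRANSITIONS[state].get(event, state)


def run(state, events):
    """Feed a sequence of events to the machine and return the final state."""
    for event in events:
        state = step(state, event)
    return state


# The second SpeechDetected arrives while the machine is in Recognizing
# and is silently dropped; RecoDone then moves it to AnalyzeResult.
final_state = run("Listening", ["SpeechDetected", "SpeechDetected", "RecoDone"])
```

A practical implementation that wanted to raise an alarm on the duplicate SpeechDetected would add an explicit entry for that event to the Recognizing row rather than rely on the default.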
SCXML also allows conditions to be placed on transitions. In Example 2, we see a simple email reader, which responds differently to the ReadDone event based on the value of the variable email.next. If email.next is not null, the reader loops back to the ReadEmail state to read another email, while it goes to the Done state if the variable is null. Note that Done has a 'final' attribute set to true, which means that it is a final state. The significance of final states will become clear in Example 4. (In the UML notation, final states are indicated by small circles with a filled center, as in this example.)
Example 2
<?xml version="1.0" encoding="us-ascii"?>
<scxml version="1.0" xmlns="http://www.w3.org/2005/SCXML">
  <state id="ReadEmail">
    <transition event="ReadDone" cond="email.next!=0">
      <target next="ReadEmail"/>
    </transition>
    <transition event="ReadDone" cond="email.next==0">
      <target next="Done"/>
    </transition>
  </state>
  <state id="Done" final="true"/>
</scxml>
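As a rough illustration of how the two conditions in Example 2 divide the ReadDone event between them, here is a small Python sketch. The tuple layout, the `data` dictionary standing in for the session variables, and the function name `next_state` are all assumptions of the sketch.

```python
# Each transition is (event, guard, target); the guard inspects the session data.
# This mirrors the two ReadDone transitions of Example 2.
READ_EMAIL_TRANSITIONS = [
    ("ReadDone", lambda data: data["email.next"] != 0, "ReadEmail"),
    ("ReadDone", lambda data: data["email.next"] == 0, "Done"),
]


def next_state(transitions, event, data, current):
    """Return the target of the first transition whose event and guard match."""
    for name, guard, target in transitions:
        if name == event and guard(data):
            return target
    return current  # no transition matched, so the event is ignored


more_mail = next_state(READ_EMAIL_TRANSITIONS, "ReadDone", {"email.next": 7}, "ReadEmail")
no_more_mail = next_state(READ_EMAIL_TRANSITIONS, "ReadDone", {"email.next": 0}, "ReadEmail")
```

With a non-zero `email.next` the reader loops back to ReadEmail; with zero it moves to Done.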
In Example 3 we temporarily omit the Done state and elaborate the reading logic by adding <onentry> and <onexit> operations to the ReadEmail state. These are actions that are executed whenever the state is entered or left. In this example, we start reading a new email whenever we enter the state and update our counter variables whenever we leave it. The details of reading the email would be application-specific, but in this case we use the <send> tag, similar to the one in CCXML, to send an event/command/message to a TTS system. The <send> tag gives SCXML a flexible way of communicating with external resources and is the primary means of integrating SCXML into a larger system. As in CCXML, we assume that the implementation also provides a means for external entities to deliver events to the SCXML session. The ReadDone event would be such an external event, generated by the external TTS system when the play was complete.
In the <onexit> handlers, we use the <assign> tag, again similar to CCXML, to update the value of the variables email.current and email.next. Throughout these examples, we wave our hands at the question of how these variables, which are internal to the SCXML session, are kept in sync with the state of the underlying email system. The details of the integration with the underlying email system would be highly implementation-dependent, so we omit them throughout these examples for the sake of simplicity.
Example 3
<?xml version="1.0" encoding="us-ascii"?>
<scxml version="1.0" xmlns="http://www.w3.org/2005/SCXML">
  <state id="ReadEmail">
    <onentry>
      <send target="TTSSystem" type="TTS" event="queue" namelist="email.current"/>
    </onentry>
    <onexit>
      <assign name="email.current" expr="email.next"/>
      <assign name="email.next" expr="email.next + 1"/>
    </onexit>
    <transition event="ReadDone" cond="email.next!=0">
      <target next="ReadEmail"/>
    </transition>
  </state>
</scxml>
One of the most powerful features of Harel State Tables that SCXML borrows is the notion of nested states. Nesting facilitates the modelling of complex tasks by allowing a parent state to be decomposed into substates. In Example 4, we have embedded the ReadEmail and Done states in a surrounding ProcessEmail state, which also contains a Preprocess state and an Initial pseudo-state. In the UML notation, the child states are drawn inside the parent state. In SCXML, the <state> tags for the children are immediate children of the parent tag. The nesting is fully recursive in both cases, so that child states may have their own children nested inside of them, though we do not show examples of this in this article.
The semantics of nested states requires that whenever the system is in the ProcessEmail state, it is in one, and only one, of its substates (i.e., Preprocess, ReadEmail or Done). Initial is a pseudo-state because it is not really a state and the system is never 'in' it. Instead, Initial indicates the substate that the system should transition to if a transition specifies the parent state ProcessEmail as its target. (Transitions may also go directly to substates.)
Much of the power of nested states comes from their interaction with transitions and <onentry> and <onexit> handlers. Suppose the system transitions to the ProcessEmail state as shown in Example 4. Given the value of the <initial> tag, the system will also simultaneously move to the Preprocess state. The complex transition is atomic, in the sense that there is no time during which the system is in ProcessEmail but not Preprocess. However, the <onentry> handlers for ProcessEmail will be executed before those for Preprocess - 'from the outside in', so to speak. Now suppose that the platform generates the Ready event. The system will transition to ReadEmail and execute its <onentry> handlers. If Preprocess had any <onexit> handlers, they would execute before ReadEmail's <onentry> handlers. However, no handlers defined at the ProcessEmail level fire during this transition because the system has not left the parent ProcessEmail state. Now suppose that while the system is in the ReadEmail state, the platform generates the AbortRead event. ReadEmail does not have a transition defined for this event, but the parent ProcessEmail does. This parent transition is triggered, sending the system to the WaitForCommand state. This transition causes the system to exit both ReadEmail and ProcessEmail, and their <onexit> handlers are invoked in that order ('from the inside out'). Thus the execution order of the <onentry> and <onexit> handlers matches the nesting structure of the states and offers us a guarantee that certain operations will be carried out no matter which transition or transitions the system takes to enter and leave the states in question.
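The entry and exit ordering just described can be made concrete with a short Python sketch. It is not the algorithm from the SCXML specification, only one plausible reading of the prose above, and it ignores parallel states. The `PARENT` and `INITIAL` tables and the function names are assumptions of the sketch; the state names come from Example 4.

```python
# Parent of each state in Example 4 (None marks a top-level state).
PARENT = {
    "ProcessEmail": None,
    "WaitForCommand": None,
    "Preprocess": "ProcessEmail",
    "ReadEmail": "ProcessEmail",
    "Done": "ProcessEmail",
}
# Default substate selected by each parent's <initial> element.
INITIAL = {"ProcessEmail": "Preprocess"}


def proper_ancestors(state):
    """Return the proper ancestors of a state, innermost first."""
    chain = []
    state = PARENT[state]
    while state is not None:
        chain.append(state)
        state = PARENT[state]
    return chain


def plan(active_leaf, source, target):
    """Return (exited, entered) for a transition defined on `source`.

    `active_leaf` must be `source` or one of its descendants.
    """
    # Scope: the nearest proper ancestor shared by source and target.
    target_ancestors = proper_ancestors(target)
    scope = next((s for s in proper_ancestors(source) if s in target_ancestors), None)

    # Exit from the active leaf upward ('from the inside out').
    exited = []
    state = active_leaf
    while state != scope:
        exited.append(state)
        state = PARENT[state]

    # Enter from just below the scope down to the target ('from the outside in').
    entered = []
    state = target
    while state != scope:
        entered.append(state)
        state = PARENT[state]
    entered.reverse()
    # Then follow <initial> down to a leaf state.
    while entered[-1] in INITIAL:
        entered.append(INITIAL[entered[-1]])
    return exited, entered


abort = plan("ReadEmail", "ProcessEmail", "WaitForCommand")
ready = plan("Preprocess", "Preprocess", "ReadEmail")
```

Here `abort` lists ReadEmail and then ProcessEmail as exited, matching the AbortRead case in the text, while `ready` exits only Preprocess and enters only ReadEmail, so no ProcessEmail handler runs.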
The selection of transitions also follows the nesting structure of states in that the most tightly nested transition wins. In other words, if ReadEmail had a transition defined for the AbortRead event, it would have been selected instead of the one at the ProcessEmail level. The logic behind this choice becomes clear when we realize that the child state represents a refinement of the parent state and therefore 'knows more' about the situation. It would also be possible for multiple transitions within a single state to match an event. For example, ProcessEmail might define two transitions for AbortRead with different cond attributes, both of which might evaluate to true in some circumstances. In this case (a 'tie' between transitions defined at the same level), SCXML will select the first transition in document order.
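The selection rule (innermost state first, then document order within a state) is easy to express as a loop. The Python below is a minimal sketch under stated assumptions: the tables, the tuple layout and the name `select_transition` are invented for illustration, and the guards are plain functions over a dictionary of session data.

```python
# Parent links for the two states involved (None marks a top-level state).
PARENT = {"ProcessEmail": None, "ReadEmail": "ProcessEmail"}

# Transitions per state, in document order: (event, guard, target).
TRANSITIONS = {
    "ReadEmail": [
        ("ReadDone", lambda data: data["email.next"] != 0, "ReadEmail"),
        ("ReadDone", lambda data: data["email.next"] == 0, "Done"),
    ],
    "ProcessEmail": [
        ("AbortRead", lambda data: True, "WaitForCommand"),
        ("ProcessEmail.done", lambda data: True, "WaitForCommand"),
    ],
}


def select_transition(active_leaf, event, data):
    """Return (defining state, target) of the winning transition, or None."""
    state = active_leaf
    while state is not None:  # innermost state first, then its ancestors
        for name, guard, target in TRANSITIONS.get(state, []):
            if name == event and guard(data):
                return state, target  # first match in document order wins
        state = PARENT[state]
    return None  # no state handles the event, so it is dropped


inherited = select_transition("ReadEmail", "AbortRead", {"email.next": 3})
```

In this sketch `inherited` names ProcessEmail as the defining state, because ReadEmail has no AbortRead transition of its own. Adding one to the ReadEmail list would make the child's transition win instead.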
Finally, Example 4 shows the significance of final states. Since Done is a final state and an immediate child of ProcessEmail, we know that ProcessEmail has finished when the system reaches Done. In SCXML, this causes a ProcessEmail.done event to be raised, which can be used to trigger transitions like any other event. In this case, ProcessEmail transitions to WaitForCommand on the ProcessEmail.done event. (In the UML diagram, the ProcessEmail.done event is implicit and the same transition is shown as a line from ProcessEmail to WaitForCommand without any indication of the event.) Thus the system will move from ProcessEmail to WaitForCommand under two conditions, the first being the occurrence of the AbortRead event when the system is anywhere in ProcessEmail, and the second being the system's arrival at the Done state via normal processing inside ProcessEmail.
Example 4
<?xml version="1.0" encoding="us-ascii"?>
<scxml version="1.0" xmlns="http://www.w3.org/2005/SCXML">
  <state id="ProcessEmail">
    <onentry>
      <var name="email.current"/>
      <var name="email.next"/>
      <var name="mail" expr="initMailStruct()"/>
      <send target="emailSystem" type="email" event="fetch" namelist="mail"/>
    </onentry>
    <onexit>
      <send target="emailSystem" type="email" event="CloseMailBox" namelist="mail"/>
    </onexit>
    <initial>
      <transition>
        <target next="Preprocess"/>
      </transition>
    </initial>
    <transition event="AbortRead">
      <target next="WaitForCommand"/>
    </transition>
    <transition event="ProcessEmail.done">
      <target next="WaitForCommand"/>
    </transition>
    <state id="Preprocess">
      <onentry>
        <assign name="email.current" expr="first(mail)"/>
        <assign name="email.next" expr="second(mail)"/>
      </onentry>
      <transition event="Ready">
        <target next="ReadEmail"/>
      </transition>
    </state> <!-- Preprocess -->
    <state id="ReadEmail">
      <onentry>
        <send target="emailSystem" type="email" event="queue" namelist="email.current"/>
      </onentry>
      <onexit>
        <assign name="email.current" expr="email.next"/>
        <assign name="email.next" expr="email.next + 1"/>
      </onexit>
      <transition event="ReadDone" cond="email.next!=0">
        <target next="ReadEmail"/>
      </transition>
      <transition event="ReadDone" cond="email.next==0">
        <target next="Done"/>
      </transition>
    </state> <!-- ReadEmail -->
    <state id="Done" final="true"/>
  </state> <!-- ProcessEmail -->
  <state id="WaitForCommand"/>
</scxml>
Example 5 shows the use of parallel states in SCXML. Here we have added a set of VCR control states to our email reader. These states run in parallel to the email reader, meaning that at any given time the system is simultaneously in ProcessEmail (and one of its substates) and in one of the VCR control states (VCRControl, Volume or Speed). Parallel states thus represent a kind of fork and join logic, allowing control to be split into concurrent threads. In this case, the parallelism is useful because the VCR states behave the same way no matter where we are in the email reader.
In this example the UML diagrams deviate somewhat from the SCXML markup because the structure is more explicit in the latter. In the SCXML there is a single top-level state Main, with a <parallel> child. The semantics of the <parallel> tag require that entering Main entail simultaneously entering each of the <parallel> tag's children. In this case, we have created a child state ControlState containing the VCRControl, Volume, and Speed states. (In the UML diagram, the Main and ControlState states are implicit.) The logic of ControlState and its children is straightforward. The system waits in VCRControl for either the IncreaseVolume or IncreaseSpeed command, in which case it transitions to the Volume or Speed state respectively, uses the <send> command to trigger the appropriate platform action, and then transitions back to VCRControl when the platform generates the Platform.Done event. To flesh out the example, we would want to allow for the speed and volume to be decreased as well. We could do this either by adding DecreaseSpeed and DecreaseVolume events, or by having ChangeSpeed and ChangeVolume events with a parameter for the amount of the change (positive for increase, negative for decrease). The <onentry> actions could then pass the value of the parameter to the platform rather than using hardcoded defaults.
Finally, note that in Example 5 the ProcessEmail state is not included in-line but is instead loaded from a separate file, ProcessEmail.scxml. This inclusion mechanism allows for the reuse of markup and can also be used to break up complex state machines into more manageable chunks. (The UML diagram contains a graphical equivalent to SCXML's inclusion by reference in that only the top-level ProcessEmail state is shown, even though all its substates are implicitly present.)
Example 5
<?xml version="1.0" encoding="us-ascii"?>
<scxml version="1.0" xmlns="http://www.w3.org/2005/SCXML">
  <state id="Main">
    <parallel id="Par">
      <state id="ProcessEmail" src="ProcessEmail.scxml"/>
      <state id="ControlState">
        <initial>
          <transition>
            <target next="VCRControl"/>
          </transition>
        </initial>
        <state id="VCRControl">
          <transition event="IncreaseVolume">
            <target next="Volume"/>
          </transition>
          <transition event="IncreaseSpeed">
            <target next="Speed"/>
          </transition>
        </state>
        <state id="Volume">
          <onentry>
            <send target="platform" event="IncreaseVolume" namelist="incr=5"/>
          </onentry>
          <transition event="Platform.Done">
            <target next="VCRControl"/>
          </transition>
        </state>
        <state id="Speed">
          <onentry>
            <send target="platform" event="IncreaseSpeed" namelist="incr=10"/>
          </onentry>
          <transition event="Platform.Done">
            <target next="VCRControl"/>
          </transition>
        </state>
      </state> <!-- ControlState -->
    </parallel>
  </state> <!-- Main -->
</scxml>
Example 6 completes the picture by adding the ASR states in parallel with ControlState and ProcessEmail, both of which are included from external files this time. We have wrapped a parent ASRState around Listening, Recognizing and AnalyzeResult, and added transitions from the last of these back to Listening. These transitions are conditioned upon the value of the Result variable and use the <send> tag with a target of 'scxml' to raise events that are internal to the state machine and thus may trigger transitions in other parallel states. (Note again that this is similar to the <send> tag in CCXML.)
The full example shows the power of the interaction between parallel and nested states. There are three separate threads of control: one reading email, another listening for user input and a third handling VCR control events. In the current example, the handling of control events could be directly incorporated into the speech states, but in a multimodal system, the control events could be generated by GUI input as well as speech, so it makes sense to keep them separate. The three parallel sets of states are independent of each other in that the speech recognizer doesn't care what state the email reader is in and vice versa, but they communicate by raising events. If the speech recognition system detects a command to raise the volume, it generates the appropriate event, which is caught by the VCRControl state, which then issues the appropriate platform command, while the email reader continues uninterrupted. On the other hand, if the speech recognition states detect an abort command, they generate an AbortRead event, which is caught by the ProcessEmail state no matter what substate it is in. Since the VCR control states don't care about the AbortRead command, they ignore it. We have thus succeeded in factoring a complex user interface into three compact state machines which interact in a flexible but strictly defined manner. The result is a simple representation of a complex system.
Example 6
<?xml version="1.0" encoding="us-ascii"?>
<scxml version="1.0" xmlns="http://www.w3.org/2005/SCXML">
  <state id="Main">
    <parallel id="Par">
      <state id="ProcessEmail" src="ProcessEmail.scxml"/>
      <state id="ControlState" src="ControlState.scxml"/>
      <state id="ASRState">
        <initial>
          <transition>
            <target next="Listening"/>
          </transition>
        </initial>
        <state id="Listening">
          <transition event="SpeechDetected">
            <target next="Recognizing"/>
          </transition>
        </state>
        <state id="Recognizing">
          <transition event="RecoDone">
            <target next="AnalyzeResult"/>
          </transition>
        </state>
        <state id="AnalyzeResult">
          <onentry>
            <var name="Result" expr="ProcessASRResult()"/>
          </onentry>
          <transition cond="Result=='Louder'">
            <target next="Listening"/>
            <send target="scxml" event="IncreaseVolume"/>
          </transition>
          <transition cond="Result=='Stop'">
            <target next="Listening"/>
            <send target="scxml" event="AbortRead"/>
          </transition>
        </state> <!-- AnalyzeResult -->
      </state> <!-- ASRState -->
    </parallel>
  </state> <!-- Main -->
</scxml>
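The way the three parallel regions cooperate through events can be sketched in a few lines of Python. This is a deliberately flattened model, not a faithful interpreter. Each region is reduced to a flat table, the hypothetical events `RecoDone.Louder` and `RecoDone.Stop` fold the AnalyzeResult test on Result into the event name, and the `Aborted` state is a hypothetical stand-in for leaving ProcessEmail. The names `REGIONS` and `dispatch` are likewise assumptions of the sketch.

```python
from collections import deque

# One table per parallel region: state -> {event: (target, raised event or None)}.
REGIONS = {
    "ProcessEmail": {
        "ReadEmail": {"AbortRead": ("Aborted", None)},
    },
    "ControlState": {
        "VCRControl": {"IncreaseVolume": ("Volume", None),
                       "IncreaseSpeed": ("Speed", None)},
        "Volume": {"Platform.Done": ("VCRControl", None)},
        "Speed": {"Platform.Done": ("VCRControl", None)},
    },
    "ASRState": {
        "Listening": {"SpeechDetected": ("Recognizing", None)},
        "Recognizing": {"RecoDone.Louder": ("Listening", "IncreaseVolume"),
                        "RecoDone.Stop": ("Listening", "AbortRead")},
    },
}


def dispatch(config, external_events):
    """Deliver each event to every region; raised events join the queue."""
    config = dict(config)  # leave the caller's configuration untouched
    queue = deque(external_events)
    while queue:
        event = queue.popleft()
        for region, table in REGIONS.items():
            current = config[region]
            # A region with no transition for the event simply ignores it.
            target, raised = table.get(current, {}).get(event, (current, None))
            config[region] = target
            if raised is not None:
                queue.append(raised)
    return config


start = {"ProcessEmail": "ReadEmail", "ControlState": "VCRControl", "ASRState": "Listening"}
louder = dispatch(start, ["SpeechDetected", "RecoDone.Louder"])
stopped = dispatch(start, ["SpeechDetected", "RecoDone.Stop"])
```

After the "louder" sequence only ControlState has moved (to Volume) while the email region is still in ReadEmail. After the "stop" sequence the email region has moved to the stand-in Aborted state and ControlState has ignored the AbortRead event.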
To conclude the example, it may be useful to compare the SCXML state model with CCXML's. The construct in CCXML that corresponds most closely to SCXML's <state> is <eventprocessor> since it holds the transitions and executable content to handle events. CCXML's transitions, however, do not move to a different <eventprocessor>, so they correspond to a special-case transition in SCXML, namely a self-transition, which is one with an empty <target>. Such transitions cause the system to remain in the same state, without executing <onentry> or <onexit> elements, and thus amount to event handlers. CCXML's <goto>, which switches to a separate document, does cause the system to switch to a new <eventprocessor> and is thus most similar to an SCXML <transition> in the general case. Finally, it is worth pointing out that CCXML's 'statevar' construct, which is used to condition transitions, is really just another piece of data in the SCXML model, and one that has no particular connection to the <state> construct. Despite these significant differences in syntax, CCXML markup can be converted to SCXML automatically, and an XSLT script for this purpose is included in the SCXML specification.
Conclusion
The full email reader example shows how SCXML can provide a compact and perspicuous representation of a complex interactive system. It is worth highlighting how naturally nested states capture task decomposition, while parallel states easily handle interactions that cross modalities. The <onentry> and <onexit> tags make it easy to ensure that setup and cleanup happen properly, while the transition selection logic enables us to place default transitions in parent states that can be overridden by their children. As a result, SCXML can be used for a variety of purposes. It can be used:
- as a representation of the application-level flow control in VoiceXML (i.e. using the state machine logic to replace the <goto> between forms).
- as a cross-modality synchronization mechanism in a multimodal interface (the email reader example could be extended to cover this if we added a set of GUI input states in parallel to the ASR states).
- as a dialog control mechanism in a language with low-level SALT-style primitives (the ASR states in the email reader are an example of this).
- as a call control language (i.e., as part of CCXML narrowly defined).
- as a higher-level process control language (the ProcessEmail states are an example of this).
The interesting thing about this list is that SCXML can be used for both high-level and low-level tasks and can provide either tight or loose synchronization. It is for this reason that we have kept the definition of SCXML fairly general, without reference to specific tasks. Much of the power of state languages lies in the fact that they are a clean mathematical abstraction that is capable of representing a wide variety of concrete systems. We therefore hope that SCXML will prove useful in other areas beyond its specific application to CCXML and VoiceXML 3.0.
|