3.8. INTERACTIVE SYSTEMS AND SPOKEN LANGUAGE UNDERSTANDING (SLU)

"Spoken Language Understanding" is a subdomain of NLP that only recently has acquired a particular importance, both from an academic and an industrial point of view. It is based on the association of two technologies: speech recognition and natural language understanding.

SpeechActs, for example, is a prototype testbed for building spoken NL applications, developed at the Sun Microsystems Laboratories, Chelmsford, MA; see the paper by Paul Martin and colleagues in Special Section ... (1996). In SpeechActs, a continuous speech recognizer (i.e., a recognizer that accepts normally spoken speech with no artificial pauses between words) is coupled with a "traditional" NLP system, Swiftus, with a discourse manager, a text-to-speech manager, building tools, and third-party components relative to the particular application at hand (see Figure 22); the figure is a simplified version of the one on page 35 of Special Section ... (1996). SpeechActs is independent of any particular speech technology, in that it can support several types of speech recognizers. Typical applications that can be developed with the SpeechActs tools are telephone-based services in which a business traveller, after entering his name and a password, dialogs with the system in natural language to ask for a series of services. He can ask the system, e.g., to read his e-mail making use of the text-to-speech facilities of SpeechActs (System: "Message 2 is from Mr. Brown"; User: "Let me hear it"), to consult his calendar (User: "What do I have the Friday after New Year's?"; System: "On Friday, January 3rd, you have no appointments"), to hear weather forecasts, to convert currency amounts, etc. To support a more natural dialog, SpeechActs also performs an, albeit elementary, "discourse analysis" activity (see subsection 3.6). A system like this is, as already stated, a prototype; at the moment, there does not seem to exist an interactive SLU system really working in a field application, with the exception of the Philips train timetable system developed for Swiss Rail.
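
As a rough illustration of this architecture, the following Python sketch shows how the main components just listed (speech recognizer, Swiftus-like parser, application module, text-to-speech manager) could be chained together. All function names and sample data are invented for illustration and do not come from SpeechActs itself.

def recognize(audio):
    """Stand-in for the continuous speech recognizer: audio -> word list."""
    return ["read", "message", "two"]

def parse(words):
    """Stand-in for the Swiftus-style parser: word list -> feature-value pairs."""
    return {"ACTION": "read-mail", "MESSAGE": 2}

def run_application(features):
    """Stand-in for the mail-reading application module."""
    return "Message 2 is from Mr. Brown."

def speak(text):
    """Stand-in for the text-to-speech manager."""
    print(text)

def handle_utterance(audio):
    words = recognize(audio)            # speech recognizer output
    features = parse(words)             # parser output (feature-value pairs)
    reply = run_application(features)   # application builds a textual reply
    speak(reply)                        # reply is read back to the caller

handle_utterance(audio=b"...digitized telephone audio...")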

We will now briefly examine some of the blocks of Figure 22, trying to emphasize some of the particularities that the presence of speech and of an interactive environment introduce with respect to the "standard" NLP paradigm (see subsection 3.1).

The flow of information is as follows. The speech recognizer analyzes the digitized audio data received from, e.g., the telephone via an audio server; when it considers that the user has completed an utterance, it sends a list of recognized words to the Swiftus module. The speech recognizer can only identify the words that appear in its lexicon, a specialized database containing all the words, with all their forms, that are needed for a given application. The speech recognition grammar is an artificial grammar used to reduce the word choice by imposing constraints on the permissible combinations of words. This grammar is used to reduce "perplexity," a popular measure in the speech recognition domain, loosely defined as a measure of the size of the set of words from which the next word can be chosen. Perplexity depends on the domain of discourse: for example, it falls by a factor of ten when we pass from general English to a very specific and relatively delimited domain such as, e.g., radiology. Swiftus parses the word list received from the speech recognition module using a sort of semantic grammar (see subsection 3.5.2.1). The semantic analysis performed by Swiftus is thus, as the builders of the system say, "medium-grained," somewhere between coarse keyword matching and full, in-depth semantic analysis. The grammar is, of course, relative to a particular application of SpeechActs and must be changed, like the relevant lexicon, when another application is activated -- an advantage of this approach is that the grammar can be tuned exactly to a specific application domain, making it possible to build Swiftus grammars that are particularly robust.
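
To make the notion of perplexity more concrete, here is a toy Python sketch that computes the perplexity of a word sequence under a bigram model: the exponential of the average negative log-probability per word, which can be read as the effective number of words the recognizer must choose between at each step. The probability values below are invented purely for illustration.

import math

def perplexity(sentence, bigram_prob):
    """Perplexity of a word sequence under a bigram model: the exponential
    of the average negative log-probability per word."""
    log_prob_sum = 0.0
    for prev, word in zip(sentence[:-1], sentence[1:]):
        log_prob_sum += math.log(bigram_prob[(prev, word)])
    n = len(sentence) - 1
    return math.exp(-log_prob_sum / n)

sentence = ["what", "do", "i", "have"]

# Invented probabilities for a tightly constrained "calendar" grammar ...
constrained = {("what", "do"): 0.5, ("do", "i"): 0.5, ("i", "have"): 0.5}
# ... and for an unconstrained model where many continuations are possible.
unconstrained = {("what", "do"): 0.01, ("do", "i"): 0.01, ("i", "have"): 0.01}

print(perplexity(sentence, constrained))    # about 2   -> small effective word choice
print(perplexity(sentence, unconstrained))  # about 100 -> large effective word choice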


FIGURE 22 SpeechActs, an example of Spoken Language Understanding.

The output of the parser is in the form of "feature-value pairs": for example, in the "business traveller" application mentioned before, the management of the calendar implies the production of pairs like "USERID=jsmith," "DATE=3 January 1997," and "ACTION=appointment-lookup." However, the application module can work only if all the elements of the user's utterance, translated into Swiftus' pairs, are fully specified. If this is not the case, the application module can rely on the discourse module to obtain the missing information. Discourse management is performed making use of a stack onto which appropriate "discourse structures" are pushed, i.e., data structures specialized for dealing with particular situations, including functions capable of reacting to particular user inputs. For example, in the context of a dialog about conversion rates, the user can ask the system, "What's the rate for the franc?" Given that the application does not have enough information to reply directly to this query, the corresponding Swiftus pair activates the discourse manager, which pushes an appropriate discourse structure onto the stack. This structure in turn asks the user for the type of franc (Belgian, French, or Swiss) that must be considered; the answer "French" pushes a new discourse structure onto the stack that resolves "French" into "France." The application is now in possession of all the necessary information, and the disambiguation structures are popped.
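
The stack-based mechanism can be sketched in Python as follows for the "franc" example. The class and dictionary layouts are hypothetical: they only mimic the behaviour described above and are not the actual SpeechActs data structures.

class ClarifyCurrencyStructure:
    """Hypothetical discourse structure pushed when the currency is ambiguous."""

    def __init__(self, pending_query):
        self.pending_query = pending_query    # incomplete feature-value pairs

    def prompt(self):
        return "Which franc: Belgian, French, or Swiss?"

    def handle(self, user_answer):
        # resolve, e.g., "French" into "France", completing the query
        countries = {"belgian": "Belgium", "french": "France", "swiss": "Switzerland"}
        self.pending_query["COUNTRY"] = countries[user_answer.lower()]
        return self.pending_query


discourse_stack = []

# User: "What's the rate for the franc?" -> the parser output is incomplete.
query = {"ACTION": "rate-lookup", "CURRENCY": "franc"}   # COUNTRY is missing
discourse_stack.append(ClarifyCurrencyStructure(query))

print(discourse_stack[-1].prompt())              # system asks for the franc type
completed = discourse_stack[-1].handle("French")
discourse_stack.pop()                            # disambiguation structure popped

print(completed)
# {'ACTION': 'rate-lookup', 'CURRENCY': 'franc', 'COUNTRY': 'France'}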

An interesting feature of the system concerns the organization of the lexica and of the grammars. There is only one permanent lexicon, which is used by both Swiftus and the speech recognition module. Given that the speech module requires an explicit list of all the possible word forms needed by an application, the lexicon loader automatically produces all the forms like "shows," "showed," and "shown" from the permanent lexical entry for "show" before a concrete application actually starts, i.e., morphological analysis is performed a priori and not on the fly. Analogously, a "Unified Grammar" is used for both recognition and parsing. The rules of the Unified Grammar consist, fundamentally, of a "pattern" (a series of words) to be matched against the user's entry, and an "augmentation" component composed of "tests" that compare the features of particular pattern elements to one another or to constant elements, and of "actions" (e.g., lookup in the calendar) to be taken if the pattern is matched. From this grammar, the Unified Grammar compiler produces both a grammar for a specific speech recognizer and a grammar for Swiftus. The Swiftus grammar is a simple rearrangement of the rules of the Unified Grammar; the recognizer grammar requires a set of rewriting operations in order to reduce the original rules, which are relatively general in their expression, to sequences of terminal forms that, as already stated, are used only to constrain the word sequences produced by the speech recognizer so as to reduce the "perplexity."
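
The two ideas, a priori expansion of word forms and rules made of a pattern plus tests and actions, can be sketched in Python as follows. The tiny morphology table and the rule format are invented for illustration; they are not the formats used by the actual lexicon loader or Unified Grammar compiler.

# (1) a priori morphological expansion from a permanent lexicon entry
PERMANENT_LEXICON = {"show": ["shows", "showed", "shown", "showing"]}

def load_recognizer_lexicon(lexicon):
    """Produce the explicit word-form list the speech recognizer needs."""
    words = set()
    for base, forms in lexicon.items():
        words.add(base)
        words.update(forms)
    return sorted(words)

# (2) a rule: word pattern + feature tests ("augmentation") + action
calendar_rule = {
    "pattern": ["what", "do", "i", "have", "<DATE>"],
    "tests":   [lambda feats: "DATE" in feats],
    "action":  lambda feats: "ACTION=appointment-lookup DATE=" + feats["DATE"],
}

def apply_rule(rule, words, feats):
    """Fire the rule's action if its word pattern and feature tests match."""
    pattern_words = [w for w in rule["pattern"] if not w.startswith("<")]
    if all(w in words for w in pattern_words) and all(test(feats) for test in rule["tests"]):
        return rule["action"](feats)
    return None

print(load_recognizer_lexicon(PERMANENT_LEXICON))
# ['show', 'showed', 'showing', 'shown', 'shows']
print(apply_rule(calendar_rule, ["what", "do", "i", "have", "friday"], {"DATE": "3 January 1997"}))
# ACTION=appointment-lookup DATE=3 January 1997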

