3. CHALLENGES AND SOLUTIONS

3.1. INTRODUCTION

The task of natural language processing (NLP) is to accept inputs in a human, natural language (NL) and to transform them into some sort of formal statements that are "meaningful" for a computer. The computer will therefore be able to react correctly to the given input. Sometimes the reaction will take the form of an NL "answer," i.e., the computer will use the formal representation produced by the analysis of the input to generate, in turn, statements in natural language.
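The last step, going from a formal representation back to natural language, can be sketched in its crudest form as template filling. This is a minimal sketch under stated assumptions. The representation (a predicate name plus named arguments), the templates, and the names `Fact`, `TEMPLATES`, `generate` and "Smith" are all hypothetical. Template filling is far simpler than real generation procedures.

```python
from dataclasses import dataclass


@dataclass
class Fact:
    """Hypothetical formal representation: a predicate with named arguments."""
    predicate: str
    args: dict[str, str]


# Hypothetical sentence templates, one per predicate.
TEMPLATES = {
    "manager_of": "The manager of the {dept} department is {person}.",
    "salary_of": "The salary of {person} is {amount}.",
}


def generate(fact: Fact) -> str:
    """Render a formal fact as an English sentence by filling a template."""
    template = TEMPLATES.get(fact.predicate)
    if template is None:
        raise KeyError(f"no template for predicate {fact.predicate!r}")
    return template.format(**fact.args)


print(generate(Fact("manager_of", {"dept": "R&D", "person": "Smith"})))
# The manager of the R&D department is Smith.
```

Fixed templates cannot handle agreement, aggregation, or the choice of pronouns. These are the problems that motivate genuine generation procedures.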

Following this approach, NLP is characterized, for us, by the presence of some form of "understanding" of the "meaning" of a given statement, however primitive and idiosyncratic that form may be. As a consequence, we exclude from the NLP domain some trivial, purely passive forms of processing NL inputs. Examples include:

- simply recording spoken input on magnetic media with a voice recorder;
- handling inputs made of single words, e.g., all sorts of commands typed at a keyboard or spoken through a voice recognition system.

We consider, in fact, that a real problem of "meaning" begins only when several words combine within a written string or an utterance.

Note also that the sentences above make no assumption about the nature of the input, whether a written or typed string of words or utterances spoken into a microphone. Likewise, no assumption has been made about the eventual function of the formal "meaning" extracted from the input. It can be used directly to feed a knowledge base, be transformed into an SQL query for a relational database, or be entered into a transfer structure to produce output in a language different from that of the input, etc. In all these cases, the sequence of operations executed on the input is virtually the same:

- segmentation into words and recognition of their grammatical category (morphological analysis);
- organization of the words into a structure reflecting their functional relationships (syntactic analysis) and their semantic relationships (semantic analysis);
- resolution of ambiguities and grouping into wider structures (discourse analysis);
- use of the final, formal representation to perform a particular task (pragmatic analysis), etc.
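To make this sequence concrete, the following minimal Python sketch implements the stages as functions. Each function enriches a shared analysis state, and the functions are applied in order to the question "Who is the manager of the R&D department?". Everything here is an assumption for illustration: the toy lexicon, the single question pattern recognized by `semantic`, and the choice of an SQL query as the pragmatic task (with a hypothetical `department` table and `name` column). It is not the architecture of any particular system.

```python
import re
from functools import reduce

# Hypothetical toy lexicon: word form -> grammatical category.
LEXICON = {"who": "PRON", "is": "VERB", "the": "DET", "of": "PREP",
           "manager": "NOUN", "department": "NOUN"}


def morphological(state):
    """Segment the input into words and tag each with a category."""
    words = re.findall(r"[A-Za-z&]+", state["text"])
    # Unknown capitalized words are guessed to be proper names.
    state["tagged"] = [
        (w, LEXICON.get(w.lower(), "PROPN" if w[0].isupper() else "UNK"))
        for w in words
    ]
    return state


def syntactic(state):
    """Group determiner ... noun sequences into noun phrases (NP)."""
    constituents, np = [], None
    for word, cat in state["tagged"]:
        if cat == "DET":
            np = [word]
        elif np is not None:
            np.append(word)
            if cat == "NOUN":
                constituents.append(("NP", np))
                np = None
        else:
            constituents.append((cat, word))
    state["constituents"] = constituents
    return state


def semantic(state):
    """Map the pattern 'who is the X of the Y Z' to a relation frame."""
    c = state["constituents"]
    state["meaning"] = None
    cats = [cat for cat, _ in c]
    if cats == ["PRON", "VERB", "NP", "PREP", "NP"] and c[3][1] == "of":
        target_np, entity_np = c[2][1], c[4][1]
        state["meaning"] = {
            "attribute": target_np[-1],         # head noun, e.g. "manager"
            "entity_type": entity_np[-1],       # head noun, e.g. "department"
            "name": " ".join(entity_np[1:-1]),  # modifiers, e.g. "R&D"
        }
    return state


def discourse(state):
    """Record the meaning in the dialogue history (for later anaphora)."""
    state["history"].append(state["meaning"])
    return state


def pragmatic(state):
    """Use the formal representation for a task: here, an SQL query."""
    m = state["meaning"]
    state["sql"] = None if m is None else (
        f"SELECT {m['attribute']} FROM {m['entity_type']} WHERE name = ?",
        (m["name"],),
    )
    return state


PIPELINE = [morphological, syntactic, semantic, discourse, pragmatic]


def analyze(text, history=None):
    """Run every stage in order on a fresh state."""
    state = {"text": text, "history": [] if history is None else history}
    return reduce(lambda s, stage: stage(s), PIPELINE, state)


print(analyze("Who is the manager of the R&D department?")["sql"])
# ('SELECT manager FROM department WHERE name = ?', ('R&D',))
```

The resulting `(sql, params)` pair could be handed to a DB-API driver, e.g. Python's `sqlite3` `Connection.execute(sql, params)`. The user-supplied value is passed as a parameter rather than spliced into the string. The table and column names are restricted to nouns listed in the toy lexicon. Swapping `pragmatic` for a stage that asserts facts into a knowledge base would leave the earlier stages untouched, which is the point made above.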

In reality, this "classical" paradigm was developed mainly for the offline analysis of written texts of some length. It must be adapted in a number of cases: for example, for Spoken Language Understanding (SLU), for interactive systems, and especially for dialog systems with spoken input.

Interactive systems in general are characterized by two features:

- sentences that are shorter than those normally found in written texts ("Who is the manager of the R&D department?");
- frequent incompleteness phenomena, mainly pronominal anaphora ("What is his salary?") and ellipsis ("His phone number?").

Spoken dialog systems present a number of additional deviations from the standard patterns:

- acoustic ambiguities can affect the recognition of the input words;
- in spoken dialogs we do not deal with "sentences" in the ordinary sense of the word, but with fragmentary utterances lacking the punctuation marks that normally delimit sentences and clauses;
- there is considerable noise due to hesitations, repetitions, false starts, spurious words, etc.

As a result, in this sort of system the classical sequence of steps is shortened and the analysis simplified. Specific procedures must also, of course, be introduced to deal with the spoken input.
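Three of the phenomena listed above can be illustrated with deliberately naive Python sketches:

- **Disfluency removal:** drop hesitations and immediate repetitions.
- **Pronominal anaphora:** replace a possessive pronoun with the entity currently in focus.
- **Ellipsis:** fill the slots missing from a fragment with those of the previous query frame.

The filler list, the frame layout (`ask`/`of` slots), the focus entity "Smith", and all function names are hypothetical. Real SLU systems rely on the recognizer's own noise labels and on much richer discourse models.

```python
# Assumed hesitation markers; a real recognizer would label fillers itself.
FILLERS = {"uh", "um", "er", "erm"}
POSSESSIVES = {"his", "her", "their"}


def clean_utterance(words):
    """Drop filler words and immediate word repetitions ("is is")."""
    cleaned = []
    for w in words:
        if w.lower() in FILLERS:
            continue
        if cleaned and cleaned[-1].lower() == w.lower():
            continue  # crude: also removes legitimate repeats ("that that")
        cleaned.append(w)
    return cleaned


def resolve_possessives(words, focus):
    """Naive anaphora resolution: replace possessive pronouns with the
    entity currently in focus (e.g. the person named in the last answer)."""
    if focus is None:
        return list(words)
    return [f"{focus}'s" if w.lower() in POSSESSIVES else w for w in words]


def complete_ellipsis(fragment, previous):
    """Ellipsis: slots missing from the fragment are inherited from the
    previous query frame; slots present in the fragment override them."""
    return {**previous, **fragment}


words = clean_utterance("um what is is his uh salary".split())
print(resolve_possessives(words, "Smith"))
# ['what', 'is', "Smith's", 'salary']
print(complete_ellipsis({"ask": "phone number"}, {"ask": "salary", "of": "Smith"}))
# {'ask': 'phone number', 'of': 'Smith'}
```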

In the following, our analysis of the technical issues will be based mainly on the standard approach, which still represents the main paradigm for NLP: its goals, its problems, and the solutions found. In subsection 3.8 and Section 4, however, we will supply some information on Spoken Language Understanding systems, together with an example. Note that SLU could become particularly important in an expert systems framework, where expectations for effective NL interfaces are very high.

3.2. THE STANDARD PARADIGM FOR NLP

This paradigm can be decomposed into the following sequence of steps:

  • Morphological analysis
  • Syntactic analysis
  • Semantic analysis
  • Discourse and pragmatic analysis

We will now examine each of these steps. In addition, a specific subsection (3.7) will briefly describe generation procedures.

Note that this division is more a didactic device than a description of how concrete systems are actually organized. In fact, several systems combine at least two of these phases into a single step. Examples are syntactic and semantic analysis, when the analysis is mainly semantically driven (as in the conceptual analysers of Schankian inspiration), or the semantic and discourse/pragmatic phases. Some authors prefer to split the morphological phase into a morphological phase proper and a lexicon-lookup phase, etc.

A well-known description of the "classical" paradigm can be found in Allen (1987).
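A semantically driven analysis that merges the syntactic and semantic phases can be suggested with a toy in the spirit of conceptual analysers. A verb triggers a conceptual frame, here the conceptual-dependency primitive ATRANS (transfer of possession). The frame's expectations then decide which slot each following word fills, with no separate parse tree. This sketch is a hypothetical illustration, not a reconstruction of any actual Schankian analyser; the verb table, slot names and word-order heuristics are assumptions.

```python
# Hypothetical verb entries: each verb triggers a conceptual frame.
VERB_FRAMES = {"gave": "ATRANS", "sold": "ATRANS"}
SKIP = {"a", "an", "the"}  # determiners fill no slot in this toy


def conceptual_analysis(sentence):
    """Build a conceptual frame directly from the words, guided by the
    expectations set up by the verb (syntax and semantics in one pass)."""
    words = [w for w in sentence.rstrip(".").split() if w.lower() not in SKIP]
    frame, pending_actor, expect = None, None, None
    for w in words:
        if frame is None:
            if w.lower() in VERB_FRAMES:
                frame = {"primitive": VERB_FRAMES[w.lower()], "actor": pending_actor}
                expect = "object"  # the verb expects an object next
            else:
                pending_actor = w  # word before the verb: expected actor
        elif w.lower() == "to":
            expect = "to"  # "to" signals the recipient slot
        elif expect is not None:
            frame[expect] = w
            expect = None
    return frame


print(conceptual_analysis("John gave a book to Mary."))
# {'primitive': 'ATRANS', 'actor': 'John', 'object': 'book', 'to': 'Mary'}
```

The word-order heuristic would misread "John gave Mary a book." A more realistic analyser would use semantic expectations instead, for example that the recipient of ATRANS is normally animate, to assign the slots.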


Copyright (c) 1996-1999 EarthWeb, Inc. All rights reserved. Reproduction in whole or in part in any form or medium without express written permission of EarthWeb is prohibited.