
In general, even though SLU technology has progressed notably in recent years, considerable work is still necessary to obtain systems that, unlike the current ones, are portable and not limited to specific application domains. For example, two problems still waiting for effective, industrial solutions are (1) the concrete utilization of "prosodic" information, and (2) the handling of "disfluencies" (phenomena typical of spontaneous speech, such as interjections, repeated words, repairs, and false starts). Prosody refers to acoustic phenomena such as stress, intonation, and rhythm (phrase breaks), all of which convey important information about syntactic structure, discourse, and word recognition; prosody can also signal emotion and attitude through the user's intonation (e.g., sarcasm, anger).
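To illustrate what even the simplest disfluency handling involves, the following is a minimal sketch; the filler-word list is purely illustrative, and only interjections and immediate word repetitions are treated (repairs and false starts require far deeper analysis):

```python
# Minimal disfluency filter: drops common fillers and immediate word
# repetitions. Real systems must also handle repairs and false starts,
# which this sketch does not attempt.
FILLERS = {"uh", "um", "er", "well"}  # illustrative list, not exhaustive

def clean_disfluencies(tokens):
    cleaned = []
    for tok in tokens:
        if tok.lower() in FILLERS:
            continue                      # interjection: drop it
        if cleaned and tok.lower() == cleaned[-1].lower():
            continue                      # repeated word: drop the repeat
        cleaned.append(tok)
    return cleaned

print(clean_disfluencies("I uh I want want to to book a flight".split()))
# → ['I', 'want', 'to', 'book', 'a', 'flight']
```

Note that even this toy filter makes mistakes: a legitimately repeated word ("very very good") would also be collapsed, which hints at why the problem resists simple solutions.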

Turning now to traditional written input, we note immediately that, in spite of more than three decades of research effort, we still cannot name a commercially usable parser that is both domain-independent and capable of dealing with unrestricted texts. The main technical problems concern (1) the correct segmentation of the input text into syntactically parsable units (particularly evident when sentences contain "text adjuncts" delimited by dashes, brackets, or commas); (2) the resolution of syntactic ambiguities (see subsection 3.4.4); and (3) the cases where the input falls outside the lexical and/or syntactic coverage of the system (see the problem of "unknown words" in subsection 3.3).
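The first of these problems, segmentation, is harder than it looks. A naive splitter on sentence-final punctuation must at minimum recognize abbreviations; the abbreviation list below is purely illustrative, and the sketch still ignores the parenthesized and dash-delimited adjuncts mentioned above:

```python
import re

# Naive sentence splitter: break after . ! ? unless the period appears to
# end a known abbreviation. This only illustrates the abbreviation problem;
# text adjuncts in dashes or brackets would still confuse it.
ABBREVIATIONS = {"dr", "mr", "mrs", "e.g", "i.e", "etc"}  # illustrative

def split_sentences(text):
    sentences, start = [], 0
    for m in re.finditer(r"[.!?]", text):
        before = text[start:m.start()].split()
        last = before[-1].lower() if before else ""
        if m.group() == "." and last in ABBREVIATIONS:
            continue  # period ends an abbreviation, not a sentence
        sentences.append(text[start:m.end()].strip())
        start = m.end()
    if text[start:].strip():
        sentences.append(text[start:].strip())
    return sentences

print(split_sentences("Dr. Smith arrived. He left."))
# → ['Dr. Smith arrived.', 'He left.']
```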

Faced with this situation, and needing "robust" tools to meet the increasing demand for NLP applications, the NLP community is making more and more use of "shallow parsing" techniques; we have already encountered a first form of shallow parsing in subsection 3.5.1, consisting of the use of an underspecified form of representation for the output of syntactic analyzers. This form of parsing is normally used when the application requires the analysis of large text corpora: in such cases, because of the problems outlined in the previous paragraph, the use of traditional "deep" syntactic parsers can be both arduous and cumbersome. In general, shallow parsing is characterized by the fact that its output is not the usual phrase-structure tree but a (much) less detailed form of syntactic analysis in which, normally, only some phrasal constituents are recognized. These can consist, e.g., of noun phrases (without, however, any indication of their internal structure or of their function in the sentence) or of the main verb accompanied by its direct arguments. Inspired by the success of stochastic methods such as HMMs in the field of speech understanding, the builders of modern shallow parsers often make use of probabilistic methods that are tested and refined, e.g., on reference corpora composed of sets of manually bracketed sentences.
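A minimal sketch of the kind of noun-phrase chunking a shallow parser performs might look as follows; the simplified tag set and the single chunking rule are illustrative assumptions, and the input is assumed to be already part-of-speech tagged:

```python
# Shallow NP chunking over (word, tag) pairs: an NP chunk is taken to be a
# maximal run of determiners (DT), adjectives (JJ), and nouns (NN) that
# contains at least one noun. No internal structure or grammatical function
# is assigned -- that is precisely what makes the analysis "shallow".
NP_TAGS = {"DT", "JJ", "NN"}

def np_chunks(tagged):
    chunks, current = [], []
    for word, tag in tagged:
        if tag in NP_TAGS:
            current.append((word, tag))
        else:
            if any(t == "NN" for _, t in current):
                chunks.append(" ".join(w for w, _ in current))
            current = []
    if any(t == "NN" for _, t in current):
        chunks.append(" ".join(w for w, _ in current))
    return chunks

sentence = [("the", "DT"), ("old", "JJ"), ("parser", "NN"),
            ("rejected", "VB"), ("the", "DT"), ("long", "JJ"),
            ("sentence", "NN")]
print(np_chunks(sentence))
# → ['the old parser', 'the long sentence']
```

A probabilistic chunker of the kind mentioned above would replace the hand-written rule with transition probabilities estimated from a manually bracketed corpus, but the flat, tree-less output would be the same in spirit.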

Turning now to the application side of NLP techniques, we will limit ourselves to mentioning two classes of applications that are becoming increasingly popular. A domain particularly important in the context of the present "full text revolution" (and also important for the development of efficient ESs/KBSs, for the same reasons we have already examined when speaking about the extraction of information from NL texts; see subsection 3.5.2.2) is that of "summarization" (automatic abstracting). In the past, summarization was performed mainly by selecting "relevant" sentences in the input text through the use of statistical cues or of keywords. Today, under the influence of the MUC results, there is a tendency to couple purely statistical methods with more analytical techniques, like those used in the past in systems such as FRUMP and SCISOR (see Section 2.3).

A second, "new" NLP domain that is gaining in importance every day, and that is at least partly related to information extraction and summarization, is that of the so-called "controlled languages." These languages have been developed in an industrial framework to counter the tendency of technical writers to make free use of special vocabularies (jargon), idiosyncratic styles, and unusual grammatical constructions. A controlled language (CL) is thus a subset of a particular NL on which restrictions have been imposed concerning the lexicon (only authorized words can be used), the grammar/syntax (e.g., the number of words that can be inserted in an NP group is limited, and the active voice is preferred to the passive), the semantics (by normalizing the expression of the agent, the action, the goal, the instrument, etc.), and the style (e.g., sentence length can be limited to no more than 20 words).
The best-known example of a CL is AECMA Simplified English, which has been adopted by the entire aerospace industry for writing aircraft maintenance documentation. From an NLP point of view, the interest of CLs evidently resides in the conspicuous reduction of the difficulties of syntactic/semantic analysis, which makes computationally feasible particularly complex applications such as machine translation and summarization.
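Restrictions of the kind listed above lend themselves naturally to automatic checking. The following is a toy sketch of a CL checker; the word limit, the tiny "authorized" lexicon, and the crude passive-voice heuristic are illustrative assumptions, not actual AECMA rules:

```python
# Toy controlled-language checker: flags sentences that exceed a word
# limit, use unauthorized words, or look passive (a crude "be + -ed"
# heuristic). Real CL checkers rely on full lexicons and grammars.
AUTHORIZED = {"close", "the", "valve", "before", "removing", "pump",
              "is", "closed", "by", "operator", "a"}  # illustrative lexicon
MAX_WORDS = 20

def check_sentence(sentence):
    words = sentence.lower().rstrip(".").split()
    problems = []
    if len(words) > MAX_WORDS:
        problems.append("too long")
    for w in words:
        if w not in AUTHORIZED:
            problems.append(f"unauthorized word: {w}")
    for i, w in enumerate(words[:-1]):
        if w in {"is", "are", "was", "were"} and words[i + 1].endswith("ed"):
            problems.append("passive voice")
    return problems

print(check_sentence("Close the valve before removing the pump."))  # → []
print(check_sentence("The valve is closed by the operator."))
# → ['passive voice']
```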
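Similarly, the older sentence-selection approach to summarization mentioned earlier, scoring sentences by the frequency of their content words and keeping the top scorers, can be sketched as follows (the stopword list and the scoring function are illustrative assumptions):

```python
from collections import Counter

# Extractive summarization by sentence selection: score each sentence by
# the corpus frequency of its content words and keep the top-scoring n.
STOPWORDS = {"the", "a", "is", "of", "and", "to", "in", "it"}  # illustrative

def content_words(sentence):
    return [w for w in sentence.lower().rstrip(".").split()
            if w not in STOPWORDS]

def summarize(sentences, n=1):
    freq = Counter(w for s in sentences for w in content_words(s))
    def score(s):
        return sum(freq[w] for w in content_words(s))
    return sorted(sentences, key=score, reverse=True)[:n]

docs = ["The parser failed on long sentences.",
        "Shallow parsing avoids full trees.",
        "Shallow parsing handles long sentences well."]
print(summarize(docs))
# → ['Shallow parsing handles long sentences well.']
```

The last sentence wins because its words ("shallow", "parsing", "long", "sentences") recur across the text, which is exactly the statistical cue the early systems exploited; the analytical techniques of FRUMP and SCISOR were introduced precisely because such frequency counts say nothing about what a sentence actually asserts.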

