3.7. LANGUAGE GENERATION

The Natural Language Generation (NLG) domain concerns the construction of computer programs able to produce a high-quality NL text from a computer-internal representation of given information. Motivations for working in this domain range from strongly theoretical ones, which mainly concern research in linguistics and psycholinguistics, to very practical ones that relate to the concrete production of understandable output for all sorts of computer programs. In this last context, we can mention the following practical applications of NLG techniques: display of database content, expert system explanation, speech generation, generator systems for producing standardized multiparagraph texts like business letters or monthly reports, multimedia presentation, automated production of summaries, etc.

In spite of these very exciting practical and theoretical opportunities, NLG has received, until recently, substantially less attention from computational linguists and computer scientists than the "classical" problems concerning NL analysis. For example, most expert systems are still equipped with very simple generation modules in the "canned text" or "template" style (see below). This situation is now changing, and the international NLG community is growing rapidly, even if generation is still perceived as a fairly marginal domain in the global NLP enterprise.

It is normally assumed that every NLG system can be decomposed into two main components:

  • A "planning component," which selects and organizes the information to be included in the output, and produces accordingly, in some internal format, an expression representing the content of the proposed statement(s)
  • A "realization component," which converts sentence-sized fragments pertaining to the previous expression into grammatically correct sentences

In real systems, this type of decomposition is sometimes very difficult to carry out. For example, in the "canned text systems," the least sophisticated type of NLG system, "planning" and "realization" practically coincide, given that these systems simply print out predefined strings of words without any change. Canned text systems are commonly used, e.g., for producing the error messages in all sorts of software components; they can also be used to produce warnings, letters, etc. Slightly more sophisticated techniques in the same style are those used, e.g., in the ELIZA system mentioned earlier in subsection 2.2. In ELIZA, in fact, an at least embryonic form of planning component exists, given that the user's input is reused for constructing parts of the output. The systems at the next level of sophistication, the "template systems" -- where the basic technique consists in filling predetermined patterns with the results of a given computer run -- are used whenever a general type of message must be reproduced several times with some slight modifications. When the texts to be generated are fairly regular in structure, like some types of business reports, it is possible to use the template technique for the generation of paragraphs including multiple sentences. Given the predefined format of the texts to be output, the planning component of a template system is a very rudimentary one.
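The three levels described above can be contrasted in a few lines. This is only a sketch: the error codes, the single ELIZA-style rule, and the report wording are invented for illustration and are not taken from ELIZA or from any real report generator.

```python
import re
from string import Template

# Canned text: predefined strings printed without any change.
CANNED = {
    "E001": "File not found.",
    "E002": "Permission denied.",
}


def canned_text(code):
    return CANNED[code]


def eliza_reply(user_input):
    """ELIZA-style reply: part of the user's input is reused in the output."""
    match = re.match(r"i am (.+)", user_input.strip(), re.IGNORECASE)
    if match:
        return f"Why do you say you are {match.group(1).rstrip('.!?')}?"
    return "Please go on."


# Template: a predetermined pattern filled with the results of a computation.
REPORT = Template(
    "In $month, sales totalled $total units, "
    "$direction $change% from the previous month."
)


def monthly_report(month, total, previous):
    change = round(abs(total - previous) / previous * 100)
    direction = "up" if total >= previous else "down"
    return REPORT.substitute(
        month=month, total=total, direction=direction, change=change
    )


print(canned_text("E001"))
print(eliza_reply("I am tired."))
print(monthly_report("March", 120, 100))
```

In all three cases the wording is fixed in advance; only the template version computes anything before filling its slots, which is why its planning component remains rudimentary.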

A first, concrete differentiation between "planning" and "realization" can be found in the most advanced realizations of the template technique, like McKeown's TEXT (1985), where it was possible to dynamically combine instances of four standard paragraph models, called "schemas," to create multisentence texts. The technique that is still the most widely used when the (mono- or multisentence) text to be generated must be a "flexible" and "well-written" one is the "phrase-based" approach, which can be seen as a generalization of the template approach. In the phrase-based technique, a phrasal pattern is associated with each main "type" that characterizes the internal representation to be converted into NL. For example, phrasal patterns can be associated with conceptual elements like "predicate" and "slot fillers" in a semantic network (or conceptual graph, NKRL, etc.) type of representation, or with syntactic categories in a syntactic tree. Starting from the phrasal pattern that matches the top level of the internal representation -- e.g., <SUBJ VERB OBJ> in the case of a syntactic tree -- each part of the pattern is expanded in turn into more specific phrasal patterns, like <DET ADJ NOUN MODIF>, that match particular subparts of the internal representation. This splitting process continues until all the phrasal patterns have been replaced by one or more words. For example, for a semantic network or frame representation having BUY as the top predicate, the first phrasal pattern selected, associated with BUY, can ask to search for the AGENT, the TENSE, the OBJECT, and the FROM-POSS(essor) cases; a pattern associated with AGENT will then ask to find the NL equivalent, e.g., "Bill," of the symbolic filler BILL; the phrasal pattern associated with TENSE will use the filler of this slot, e.g., PRESENT, to generate "buys" and, therefore, "Bill buys," etc.

Used first for the generation of single sentences, the phrase-based systems were adapted in the late 1980s to the generation of multisentence texts. In this last case, they are often called "text planners," where each "plan" is a sort of phrasal pattern that specifies the structure of a portion of the whole discourse, and which is successively split into more specific plans until the level of the single clause is attained.
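The BUY example can be sketched as a small recursive expansion: the pattern attached to the top predicate names the case slots to look for, and each slot is expanded in turn until only words remain. The frame layout, the PRED key, the lexicon entries, and the fillers BOOK and MARY are assumptions added for this illustration; only BILL and PRESENT come from the example in the text.

```python
# Hypothetical lexicon mapping symbolic fillers to NL equivalents.
LEXICON = {"BILL": "Bill", "MARY": "Mary", "BOOK": "a book"}

# Hypothetical table of inflected verb forms, keyed by (predicate, tense).
VERB_FORMS = {("BUY", "PRESENT"): "buys", ("BUY", "PAST"): "bought"}

# Phrasal pattern for each top-level predicate: the case slots to expand.
PATTERNS = {"BUY": ["AGENT", "TENSE", "OBJECT", "FROM-POSS"]}


def expand(symbol, frame):
    """Expand a pattern or slot name into a list of words."""
    if symbol in PATTERNS:
        # A phrasal pattern: expand each of its parts in turn.
        words = []
        for part in PATTERNS[symbol]:
            words.extend(expand(part, frame))
        return words
    if symbol not in frame:
        # An empty case slot contributes no words.
        return []
    filler = frame[symbol]
    if symbol == "TENSE":
        # The tense filler selects the inflected form of the predicate.
        return [VERB_FORMS[(frame["PRED"], filler)]]
    if symbol == "FROM-POSS":
        return ["from", LEXICON[filler]]
    # AGENT, OBJECT: look up the NL equivalent of the symbolic filler.
    return [LEXICON[filler]]


def generate(frame):
    return " ".join(expand(frame["PRED"], frame)) + "."


frame = {
    "PRED": "BUY",
    "AGENT": "BILL",
    "TENSE": "PRESENT",
    "OBJECT": "BOOK",
    "FROM-POSS": "MARY",
}
print(generate(frame))
```

A fuller system would let slot patterns expand into further patterns (e.g., a noun phrase into determiner, adjective, and noun) rather than looking up a ready-made phrase, but the control flow would stay the same.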

From the point of view of concrete realizations, the advantages of text generation over text analysis are linked with the fact that, for the former, it is normally possible to select a priori the minimal level of complexity of the NL output that can guarantee a correct understanding of the internal information. This, obviously, is not possible when analyzing a text, whose level of complexity is, at least in principle, totally unknown at the start. However, when the option of attaining a high level of linguistic refinement of the output is chosen, the NLG problem becomes at least as difficult as the corresponding analysis problem. Among these difficulties, we can mention: (1) those linked with lexical selection, as soon as the internal representation approaches a realistic level of complexity and the lexicon of the generation module is large enough to permit alternative locutions -- an (apparently simple) problem of this type, which involves both the "planning" and "realization" components, concerns the use of pronouns and possessive adjectives, see, e.g., "John's car" vs. "his car"; (2) in general, the difficulties linked with the selection of the appropriate granularity for the presentation of the internal information -- this information can, in fact, be packaged into different units depending on several contingencies like the structure of the original information, the purpose of the output text, the expected audience, the stylistic biases, etc.; (3) the mismatch between some rudimentary forms of knowledge representation used in the original information -- e.g., the tabular model of relational databases -- and the (sometimes really advanced) linguistic features required in the output; and (4) the difficulties in designing realistic text planners, capable of generating texts with several paragraphs.
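To make the "John's car" vs. "his car" choice in point (1) concrete, here is a deliberately naive sketch. It assumes the generator keeps a list of entities already mentioned and uses the possessive adjective only when the owner is the sole previously mentioned entity of that gender. The rule, the gender table, and the function name are illustrative assumptions; real systems need far richer discourse models.

```python
POSSESSIVE = {"male": "his", "female": "her"}

# Hypothetical gender table for the entities in the discourse.
GENDER = {"John": "male", "Bill": "male", "Mary": "female"}


def possessive_np(owner, noun, mentioned):
    """Choose between "<owner>'s <noun>" and a possessive adjective.

    mentioned: names of the entities already introduced in the text.
    """
    same_gender = [e for e in mentioned if GENDER[e] == GENDER[owner]]
    if same_gender == [owner]:
        # The possessive is unambiguous: only the owner can be its antecedent.
        return f"{POSSESSIVE[GENDER[owner]]} {noun}"
    # Owner not yet mentioned, or another candidate antecedent exists.
    return f"{owner}'s {noun}"


print(possessive_np("John", "car", []))
print(possessive_np("John", "car", ["John", "Mary"]))
print(possessive_np("John", "car", ["John", "Bill"]))
```

Even this toy rule touches both components: what has already been mentioned is a planning-level fact, while the choice of surface form is a realization decision.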

If we wanted to condense into a few words the current "state of the art" in the NLG field, we could say that it is now possible to produce commercially effective, general-purpose NL generators for single sentences. By contrast, only partial and limited solutions exist at the moment for the generation of well-formed, multisentence texts and paragraphs.


Copyright (c) 1996-1999 EarthWeb, Inc.. All rights reserved. Reproduction in whole or in part in any form or medium without express written permission of EarthWeb is prohibited. Please read our privacy policy for details.