3.5.1. Use of Semantic Procedures to Supplement the Syntactic Analysis

Several NLP tasks of practical relevance (e.g., message routing, information retrieval, categorization of textual documents) can be executed, at least to some extent, using techniques such as statistical analysis, keyword matching, or pattern matching, which do not require any reference to the "meaning" of the text they process. This is not true for many other applications, even those that do not attempt to produce any sort of deep semantic analysis of content. It is evident, e.g., that NL interfaces to DBs must be able to handle at least some limited forms of ellipsis. This cannot be done without inspecting, in some way, the semantic class to which the elliptical fragment pertains and its congruence with the semantic class of the most significant elements of the previous, complete sentences (see subsection 3.4.4.2).
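As a rough illustration of the congruence check just described, the sketch below resolves an elliptical fragment against the previous query by substituting it for the element that belongs to the same semantic class. The lexicon, the token-list representation of queries, and the function name `resolve_ellipsis` are all hypothetical; real NL interfaces to DBs use far richer semantic models.

```python
# Hypothetical semantic lexicon: word -> semantic class (illustration only).
SEMANTIC_CLASS = {
    "london": "CITY",
    "paris": "CITY",
    "flights": "TRANSPORT",
    "trains": "TRANSPORT",
    "monday": "DAY",
    "tuesday": "DAY",
}

def resolve_ellipsis(previous, fragment):
    """Rebuild a complete query from an elliptical fragment.

    The fragment replaces the first element of the previous query whose
    semantic class is congruent with its own; if the fragment's class is
    unknown or no element is congruent, return None.
    """
    frag_class = SEMANTIC_CLASS.get(fragment.lower())
    if frag_class is None:
        return None
    for i, word in enumerate(previous):
        if SEMANTIC_CLASS.get(word.lower()) == frag_class:
            return previous[:i] + [fragment] + previous[i + 1:]
    return None

previous_query = ["list", "flights", "to", "London", "on", "Monday"]
print(resolve_ellipsis(previous_query, "Paris"))  # the follow-up "And Paris?"
```

If several elements share the fragment's class, this toy version simply replaces the first one; a real system would need a principled way to choose among them.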

An important class of applications of semantic methods therefore concerns supplementing traditional morphosyntactic analysis in order to obtain more reliable and complete results. It is not possible here to describe all the ways semantic techniques can be applied in a syntactic context (these applications are very often quite limited and ad hoc). We give only one example, a (relatively general) strategy for using semantic rules to resolve PP-attachment ambiguities (see subsection 3.4.4.1). This application was developed within COBALT, a project partially financed by the European Community. COBALT actually has very ambitious objectives with respect to representing the "meaning" of its basic linguistic material, Reuters news stories in the financial domain, which it encodes in the format of a high-level representation language, NKRL (see subsection 3.5.3.3). However, COBALT's basic PP-attachment strategy can be described quite independently of its main semantic procedures; moreover, this example allows us to introduce a concept of great current interest in NLP practice, that of "underspecified representation."

An underspecified syntactic representation is one in which, in the presence of potential ambiguities, no attempt is made to represent all the possible solutions to these ambiguities. This avoids any danger of combinatorial explosion, which is particularly likely when a sentence contains more than 25 words. The typical example of underspecified representation is the so-called "quasi-logical form" (QLF), a formal representation that reduces higher-order logical formulas to a conjunction of first-order terms, each formed by a predicate and its arguments. This is made possible by the introduction of "event predicates" (see below) or "event variables": the latter act as indexes that identify, for each predicate, the event to which it refers. To give a first, very simple example, the sentence "Mary writes the paper quickly," which could be represented traditionally as "quickly(writes(Mary, paper))," is now coded as "writes(w) & agent(w, m) & object(w, a) & paper(a) & quickly(w)," where w is the event variable and m and a are constants. This expression can be paraphrased as: there exists an event w such that w is a writing event, the agent of this event is m (Mary), the object of this event is a, a is a paper, and the event proceeds quickly. This type of representation offers many logical and computational advantages. In particular, it makes it possible to build very robust syntactic parsers that always produce a single QLF analysis of an input sentence, thanks to a technique that consists of (1) avoiding any choice in the presence of syntactic ambiguities such as prepositional phrase attachment, and (2) accepting as the final syntactic analysis a set of nonoverlapping QLF parse fragments that span the input sentence. The global coherence must then be reconstructed later, at the semantic level.
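The flattening of the "Mary writes the paper quickly" example can be sketched as follows. This is a hypothetical illustration, not COBALT's code: nested readings are Python tuples, the mapping from words to constants is supplied by hand, and lowercase arguments are assumed (crudely) to be common nouns that also receive a one-place class predicate such as paper(a).

```python
def flatten(expr, event_var, constants):
    """Flatten a nested reading such as quickly(writes(Mary, paper)) into a
    list of first-order terms that share one event variable."""
    head, *args = expr
    if len(args) == 1 and isinstance(args[0], tuple):
        # One-place modifier of an embedded event: predicate it of the event variable.
        return flatten(args[0], event_var, constants) + [(head, event_var)]
    agent, obj = args  # this sketch only handles two-place verbs
    terms = [(head, event_var),
             ("agent", event_var, constants[agent]),
             ("object", event_var, constants[obj])]
    for arg in (agent, obj):
        if arg.islower():  # assumption: lowercase arguments are common nouns
            terms.append((arg, constants[arg]))
    return terms

def render(terms):
    """Write terms in the conjunctive notation used in the text."""
    return " & ".join(f"{t[0]}({', '.join(t[1:])})" for t in terms)

terms = flatten(("quickly", ("writes", "Mary", "paper")), "w", {"Mary": "m", "paper": "a"})
print(render(terms))
# writes(w) & agent(w, m) & object(w, a) & paper(a) & quickly(w)
```

The nested, higher-order application of "quickly" disappears: the adverb becomes just one more conjunct about the event variable w.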

For example, Figure 10 reproduces the QLF produced by COBALT's parser for the fragment "Sharp Corporation said it has shifted production of low value personal computers from Japan to companies in Taiwan and Korea." At this level, no effort is made to resolve the PP-attachment ambiguities linked with the prepositions "of," "from," "to," and "in." In Figure 10, _e1 and _e2 are "event predicates"; their arguments (arg0, arg1, etc.) are in turn two-place predicates whose first argument refers to the event they belong to and whose second argument refers to the participant itself. All the prepositions are translated into three-place predicates: the first argument is an index for the prepositional relation itself, and the other two are the indices of the entities between which the relation holds. PP-attachment ambiguity is expressed in QLF by leaving the second argument of the predicate unspecified (the "_" placeholder in Figure 10).

say (_e1), arg0 (_e1, _sh), arg1(_e1, _e2),
sharp_corp (_sh),
shift (_e2), arg0 (_e2, _it), arg1 (_e2, _p), it (_it), production (_p),
of (_of, _, _nn), nn (_nn, _v, _c),
low (_l, _v), value (_v), personal (_p, _c), computer (_c),
from (_fr, _, _j), japan (_j),
to (_to, _, _co), company (_co),
in (_in, _, _and1), and (_and1, _t, _k), taiwan (_t), korea (_k)

FIGURE 10 QLF representation of the "Sharp Corporation" fragment.
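The following sketch reads the terms of Figure 10 and lists the prepositional predicates whose second (attachment) argument has been left as "_", i.e., the PP attachments still waiting to be resolved. The term syntax recognized by the regular expression is an assumption based only on the layout of the figure.

```python
import re

# The terms of Figure 10, copied verbatim.
QLF = """
say (_e1), arg0 (_e1, _sh), arg1(_e1, _e2),
sharp_corp (_sh),
shift (_e2), arg0 (_e2, _it), arg1 (_e2, _p), it (_it), production (_p),
of (_of, _, _nn), nn (_nn, _v, _c),
low (_l, _v), value (_v), personal (_p, _c), computer (_c),
from (_fr, _, _j), japan (_j),
to (_to, _, _co), company (_co),
in (_in, _, _and1), and (_and1, _t, _k), taiwan (_t), korea (_k)
"""

# A term is a predicate name followed by a parenthesized, comma-separated argument list.
TERM = re.compile(r"(\w+)\s*\(([^()]*)\)")

def parse_qlf(text):
    """Return the QLF as a list of (predicate, [arguments]) pairs."""
    return [(name, [arg.strip() for arg in args.split(",")])
            for name, args in TERM.findall(text)]

def unattached_pps(terms):
    """Three-place predicates whose second (attachment) argument is unspecified."""
    return [(name, args[2]) for name, args in terms
            if len(args) == 3 and args[1] == "_"]

print(unattached_pps(parse_qlf(QLF)))
# [('of', '_nn'), ('from', '_j'), ('to', '_co'), ('in', '_and1')]
```

Note that and(_and1, _t, _k) is also three-place but is not reported, since its second argument is specified.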

The COBALT parser then converts the QLF representation into a more "standard" tree representation; in our case, given the presence of the unspecified arguments, we obtain the set of syntactic subtrees of Figure 11, i.e., a "forest" instead of a single complete tree. Besides the tree for the main clause, the forest contains one PP subtree for each of the four prepositions in the original fragment. The first ambiguity, which concerns "of," can be resolved by means of the "postmodifiers rule" mentioned in subsection 3.4.4.1; applying it reattaches the PP subtree introduced by "of" (see Figure 11) to the tree corresponding to the beginning of the sentence. The remaining ambiguities cannot be resolved with tools based only on syntactic factors; they require the semantic disambiguation module of COBALT.
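A minimal sketch of the reattachment of the "of" subtree is given below. It assumes the forest of Figure 11 is encoded as nested Python tuples and approximates the postmodifiers rule (whose exact formulation is in subsection 3.4.4.1 and is not reproduced here) as "attach an 'of'-PP to the rightmost NP of the main tree." The function names are hypothetical.

```python
# Figure 11 as nested tuples: (label, child, ...); leaves are strings.
FOREST = [
    ("s", ("np", "sharp_corp"),
          ("vp", ("vtrans", "said"),
                 ("s", ("np", "it"),
                       ("vp", ("vtrans", "shifted"), ("np", "production_1"))))),
    ("pp", ("prep", "of"), ("np", "low_value_added_pc_1")),
    ("pp", ("prep", "from"), ("np", "japan_")),
    ("pp", ("prep", "to"), ("np", "company_1")),
    ("pp", ("prep", "in"), ("np", ("conj", "and"), ("taiwan_", "korea_"))),
]

def attach_to_rightmost_np(tree, pp):
    """Return a copy of tree with pp appended to its rightmost NP, or None if it has no NP."""
    label, *children = tree
    for i in range(len(children) - 1, -1, -1):  # search right-to-left
        child = children[i]
        if isinstance(child, tuple):
            new_child = attach_to_rightmost_np(child, pp)
            if new_child is not None:
                return (label, *children[:i], new_child, *children[i + 1:])
    if label == "np":
        return (label, *children, pp)  # the PP becomes a postmodifier of this NP
    return None

def apply_of_rule(forest):
    """Reattach every "of"-PP to the main tree; leave the other PPs in the forest."""
    main, *pps = forest
    remaining = []
    for pp in pps:
        if pp[1] == ("prep", "of"):
            attached = attach_to_rightmost_np(main, pp)
            if attached is not None:
                main = attached
                continue
        remaining.append(pp)
    return [main, *remaining]
```

After `apply_of_rule(FOREST)` the forest shrinks to the main tree plus the "from," "to," and "in" subtrees, which are exactly the residual ambiguities handed on to the semantic module.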

We will only say here that the "attachment algorithm" of this module relies on a strategy based on two main principles:

  • A "generate and test" procedure, which constructs all the possible semantic interpretations ("readings") of the textual fragment examined; more specifically, a reading is a set of output structures, expressed in COBALT's conceptual language (NKRL), that together correspond to one and the same semantic interpretation of the fragment.
  • For each potentially ambiguous preposition, a set of "attachment rules" of the form "syntactic condition (antecedent) -- operations on the final NKRL representation (consequent)," which are used to reduce the number of readings to a minimum. In each rule, the consequent specifies the mandatory conditions under which the term(s) corresponding to the head noun of an ambiguous PP (prepositional phrase) can fill a particular position in one or more of the conceptual (NKRL) structures produced by COBALT's general semantic analyzer.
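The generate-and-test idea can be sketched as below: every combination of attachment sites for the ambiguous PPs is generated as a candidate reading, and only the candidates that pass a test are kept. The candidate sites and the toy test (which requires "from" and "to" to attach to the verb and "in" to a noun) are assumptions chosen for illustration; in COBALT the test is carried out by the attachment rules and operates on NKRL structures rather than on bare attachment choices.

```python
from itertools import product

# Hypothetical candidate attachment sites for each ambiguous PP of the
# "Sharp Corporation" fragment: the verb or any noun to the left of the PP.
CANDIDATES = {
    "from Japan": ["shifted", "production"],
    "to companies": ["shifted", "production", "Japan"],
    "in Taiwan and Korea": ["shifted", "production", "Japan", "companies"],
}

VERBS = {"shifted"}

def toy_test(reading):
    """Stand-in for the attachment rules (an assumption, not COBALT's rules)."""
    return (reading["from Japan"] in VERBS
            and reading["to companies"] in VERBS
            and reading["in Taiwan and Korea"] not in VERBS)

def generate_and_test(candidates, is_acceptable):
    """Generate every combination of attachments and keep the acceptable ones."""
    pps = list(candidates)
    readings = []
    for choice in product(*(candidates[pp] for pp in pps)):  # generate
        reading = dict(zip(pps, choice))
        if is_acceptable(reading):                            # test
            readings.append(reading)
    return readings
```

Here 2 × 3 × 4 = 24 candidate readings are generated and the toy test keeps 3 of them, differing only in where "in Taiwan and Korea" attaches. Because the number of candidates grows multiplicatively with each ambiguous PP, the pruning done by the rules is essential.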

Table 3 reproduces a simplified version of some of the "attachment rules" associated with three of the prepositions of the "Sharp Corporation" fragment: "from" ("from Japan"), "to" ("to companies"), and "in" ("in Taiwan and Korea"). The antecedents refer to the results of the syntactic analysis, i.e., to the forest produced by the parser. SUBJ(ect), OBJ(ect), SOURCE, and DEST(ination) are conceptual roles proper to the NKRL language (see subsection 3.5.3.3). The SOURCE role refers to the animate entity (or group of entities) responsible for the particular behavior of the SUBJ(ect) mentioned in an NKRL output structure; the DEST(ination) role (sometimes called "benefactive" in other knowledge representation systems) refers to the "addressee" of the activity of the SUBJ(ect). Note the lack of precision of rule c), which is due to the highly ambiguous character of the preposition "in."
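The antecedent/consequent format of the attachment rules might be encoded as in the following sketch. The actual rules of Table 3 are not reproduced: the single rule below (a "to"-PP attached to a verb fills the DEST role) and the dict standing in for an NKRL structure are hypothetical, chosen only to show how a rule either extends a structure or fails to fire, in which case the corresponding reading would be discarded.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

@dataclass(frozen=True)
class AttachmentRule:
    prep: str                          # preposition the rule is associated with
    antecedent: Callable[[str], bool]  # syntactic condition on the attachment site
    role: str                          # consequent: conceptual role the PP head may fill

# Hypothetical rule set (not Table 3): a "to"-PP attached to a verb fills DEST.
RULES: List[AttachmentRule] = [
    AttachmentRule("to", lambda site_category: site_category == "verb", "DEST"),
]

def apply_rules(rules: List[AttachmentRule], prep: str, site_category: str,
                head_noun: str, structure: Dict[str, str]) -> Optional[Dict[str, str]]:
    """Return a copy of structure with head_noun in the role of the first matching
    rule whose role is still free; None if no rule can fire."""
    for rule in rules:
        if rule.prep == prep and rule.antecedent(site_category) and rule.role not in structure:
            return {**structure, rule.role: head_noun}
    return None

shift = {"SUBJ": "it", "OBJ": "production_1"}  # stand-in for an NKRL structure
print(apply_rules(RULES, "to", "verb", "company_1", shift))
# {'SUBJ': 'it', 'OBJ': 'production_1', 'DEST': 'company_1'}
```

Returning None rather than raising an error lets a generate-and-test loop simply drop any reading for which no rule licenses the proposed attachment.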

(s
 (np sharp_corp)
 (vp
  (vtrans said)
  (s
   (np it)
   (vp
    (vtrans shifted)
    (np production_1)))))

(pp
 (prep of)
 (np low_value_added_pc_1))

(pp
 (prep from)
 (np japan_))

(pp
 (prep to)
 (np company_1))

(pp
 (prep in)
 (np
  (conj and)
  (taiwan_ korea_)))

FIGURE 11 A "forest" corresponding to the QLF analysis of the "Sharp Corporation" fragment.

Copyright (c) 1996-1999 EarthWeb, Inc. All rights reserved.