3.4. SYNTACTIC ANALYSIS

Even if the words of the input are now endowed with a "tag" (part-of-speech category), they still represent no more than a simple, flat string of words. To begin understanding the "meaning" carried by the statement, it is first necessary to make its "structure" explicit, i.e., to determine, e.g., the subject and the object of each verb, which modifying words modify which other words, which words are of primary importance, etc. Assigning such a structure to an input statement is called "syntactic analysis" or "parsing"; a computer procedure that executes this sort of task is called a "parser." A parser is then a sort of transducer that takes as input a flat string of words (already linked with their part-of-speech tags) and produces as output a structure (e.g., a "syntactic tree") in which the mutual relationships of the words in the statement are indicated.

3.4.1. Formal Grammars

In the usual approach to parsing, each parser is based, implicitly or explicitly, on a "grammar," i.e., on a set of rules that, at least in principle, describe or predict all and only the possible sentences in a given NL. This implies that, given the ideal, perfect grammar for a language, we could classify all the statements presented as input to the corresponding parser into the legal ("grammatical") ones, for which the parse is successful and a complete structure can be determined, and the illegal, ungrammatical ones, for which the parser is unable to find a complete structure. Examples of rules ("productions") that can be found in a grammar for English are the classical "S > NP VP," stating that "a sentence consists of a noun phrase followed by a verb phrase," or "NP > ADJ+N," stating that "a noun phrase may contain a sequence of adjectives followed by a noun."

We can remark here, however, that a new approach to parsing is beginning to appear in the research literature, where it is sometimes called "data-oriented processing" (DOP) parsing. According to the DOP approach, parsing (and NLP in general) must not be based on a nonredundant set of formal rules, but on a probabilistic process that parses new inputs by analogy, making use of a corpus of representations of past language experiences. These are represented as strings of words associated with their syntactic structures, their contexts, and their meanings.
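Before moving on, a minimal sketch may make the rule-based view of parsing concrete. The following Python fragment is our own illustration, not part of the original text: the toy grammar, the tagged input sentence, and the function names are assumptions made only for this example. Terminals are the part-of-speech tags attached by the previous stage, and a naive recursive descent over productions of the "S > NP VP" kind turns the flat string of tagged words into a nested syntactic tree.

# A minimal sketch of a grammar-driven parser (illustrative only).
# Terminals are part-of-speech tags; the input is the flat string of
# (word, tag) pairs produced by the tagging step.

GRAMMAR = {                      # toy context-free productions
    "S":  [["NP", "VP"]],        # S  -> NP VP
    "NP": [["DET", "ADJ", "N"],  # NP -> DET ADJ N
           ["DET", "N"],         # NP -> DET N
           ["N"]],               # NP -> N
    "VP": [["V", "NP"],          # VP -> V NP
           ["V"]],               # VP -> V
}

def parse(symbol, tagged, pos=0):
    """Try to derive `symbol` from tagged[pos:]; return (tree, next_pos) or None."""
    if symbol not in GRAMMAR:                      # terminal: must match the tag
        if pos < len(tagged) and tagged[pos][1] == symbol:
            return (symbol, tagged[pos][0]), pos + 1
        return None
    for production in GRAMMAR[symbol]:             # nonterminal: try each production
        children, cursor, ok = [], pos, True
        for child in production:
            result = parse(child, tagged, cursor)
            if result is None:
                ok = False
                break
            subtree, cursor = result
            children.append(subtree)
        if ok:
            return (symbol, children), cursor
    return None

tagged_input = [("the", "DET"), ("old", "ADJ"), ("dog", "N"),
                ("chased", "V"), ("the", "DET"), ("cat", "N")]
result = parse("S", tagged_input)
if result and result[1] == len(tagged_input):
    print(result[0])                               # nested tree rooted in S
else:
    print("ungrammatical")

A real parser would need a far larger grammar and backtracking across sister constituents (this naive version commits to the first production that succeeds locally); the sketch only shows how productions drive the construction of the tree from the tagged input.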
In spite of the considerations of the last paragraph, "formal grammars" are still a very active domain of investigation in linguistics; a first example of a formal grammar -- in this case, a "phrase-structure grammar" -- can be defined as in Table 2. Phrase-structure grammars have been classified by Chomsky into four different types (four levels) that depend on the form of the strings [alpha] and [beta], i.e., on which elements of VN and VT can appear in [alpha] and [beta]. Levels go from level 0, where no constraints are imposed on the form of [alpha] and [beta], to level 3, where productions are severely constrained and, as a consequence, the descriptive power of the corresponding grammars is very poor.

The two productions for English shown in the previous paragraph are part of a possible "context-free grammar"; this type of grammar pertains to level 2. In context-free grammars, productions are of the form "A > [alpha]," where A is a single symbol belonging to VN, and [alpha] is a string of terminals and/or nonterminals; no constraint is, therefore, imposed on the form of the right side of the production. The name "context-free" derives from the fact that no sequence of surrounding symbols, equivalent to the presence of a required "context," is associated with A on the left side of the production; the simple appearance of A is sufficient to trigger the production. "Context-sensitive grammars" pertain to level 1.

The phrase-structure model of formal grammars is only partly well suited to the description of English. For example, a "distributive" structure like that introduced by the adverb "respectively," as in "Mary, Jane, and Elaine are the wives of Tom, Arthur, and Charles, respectively," cannot be dealt with properly (cannot be generated) by level 3 and level 2 grammars; on the other hand, context-sensitive grammars (level 1) do not represent, in practice, a better solution, given that they give rise, among other things, to descriptions of English that are unnecessarily awkward and complex. This sort of reasoning led Chomsky to propose, in 1957 and 1965, a new formal model called "transformational grammars." This model consists, in short, of a "base component" and a "transformational component." The base component consists of a context-free grammar, see before, that produces a set of "deep structure" trees. The transformational component is a set of tree-rewriting rules that, when applied to a particular deep structure tree, gives rise to one or more possible "surface structure" trees. The sequences of terminal nodes (words) of the surface structure trees are the sentences of the language; their common "meaning" is represented by the deep structure tree. The theory postulates, then, that the application of transformational rules to deep structures must preserve meaning, as in the case of the simple alternation of active and passive constructions.

Other formal models of grammars that have a more "semantic" flavor than those discussed previously are, e.g., "systemic grammars," developed by Michael Halliday and colleagues at the University of London in the early 1960s, and "case grammars," proposed by Fillmore in 1968 and particularly popular in an Artificial Intelligence context -- Schankian conceptual dependency representations, for instance, can be considered as variants of case grammar representations.
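As a small illustration of the transformational idea described above, the following Python fragment (again our own sketch, not part of the original text) represents trees as nested tuples and applies a single, highly simplified active-to-passive rewriting rule to a deep-structure tree; the tree shapes, node labels, and the rule itself are assumptions made only for this example.

# A sketch of a transformational rule as a tree-rewriting operation
# (illustrative assumptions: trees are nested tuples (label, child1, child2, ...),
# and the rule below is a drastically simplified active-to-passive rewrite).

deep = ("S",
        ("NP", "Mary"),
        ("VP", ("V", "praised"), ("NP", "John")))

def passivize(tree):
    """Rewrite an active deep structure S(NP-agent, VP(V, NP-patient)) into a
    passive surface structure; return the tree unchanged if the rule does not apply."""
    if tree[0] == "S" and len(tree) == 3:
        agent, vp = tree[1], tree[2]
        if vp[0] == "VP" and len(vp) == 3 and vp[1][0] == "V" and vp[2][0] == "NP":
            verb, patient = vp[1], vp[2]
            return ("S",
                    patient,
                    ("VP", ("AUX", "was"), ("V", verb[1]),
                     ("PP", ("P", "by"), agent)))
    return tree

def words(tree):
    """Read the terminal nodes (the sentence) off a tree, left to right."""
    if isinstance(tree, str):
        return [tree]
    leaves = []
    for child in tree[1:]:
        leaves.extend(words(child))
    return leaves

print(" ".join(words(deep)))              # Mary praised John
print(" ".join(words(passivize(deep))))   # John was praised by Mary

Reading the terminal nodes off the original and the rewritten tree yields the active and the passive sentence, respectively; both share the same deep structure and hence, under the theory, the same meaning.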