3.4.3. Technical Issues

Independently of the particular model of formal grammar used, there are some general technical issues that concern the optimal way of making use of this grammar, i.e., of matching the input strings of words against the patterns of the grammar. For example, some typical choices that the designer of a parser must face are (1) top-down versus bottom-up processing, (2) backtracking versus parallel parsing (or the use of a deterministic parser, see below), and (3) unidirectional versus island-driving processing.

In (1), top-down processing means looking first for the rules that can produce a certain type of top-level structure (e.g., a certain type of sentence), then looking for the rules that can give rise to the second-level constituents of these top-level structures, and proceeding in this way until a complete structure for the sentence has been built up. If this is not possible, the top-down parser normally backtracks, generating another type of top-level structure. In the bottom-up strategy, parsing starts from the words of the input sentence, applying first the rules that can combine these words into larger constituents, and then combining these constituents in turn until all the words of the sentence have been fitted into a general, top-level sentence structure of the grammar.

Independently of the strategy chosen, a common problem is that the elements of natural language do not always have unique meanings; see the example already mentioned of "rose," which can be a noun, a verb, or an adjective. This can introduce an alternative between the instantiation of a noun phrase or a verb phrase, an ambiguity in the structure of the noun phrase, etc. Such ambiguities compel the parser to choose among multiple options when proceeding through a sentence. The strategies mentioned in (2) concern dealing with all the alternatives at the same time (parallel processing), or dealing with one alternative at a time and, in case of failure, "backtracking" to the choice point to try another option. Both of these methods require a considerable amount of bookkeeping to deal with the multiple-choices problem: in the parallel case, to keep track of all the possibilities being tried; in the backtracking case, to keep track of all the possibilities not yet tried.
Note that, in the case of backtracking, the number of alternatives that must effectively be considered can be considerably reduced by making use of "heuristic knowledge" about their likelihood in a given situation. Finally, the design option mentioned in (3) concerns the choice between proceeding systematically in one direction with the parsing, normally from left to right, and starting anywhere and looking first at the neighboring words in order to build up structured constituents of increasing size (island driving). The second approach has often been used in speech understanding systems as a means of dealing with input uncertainty.
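The top-down, backtracking strategy described above can be sketched in a few lines of Python. This is a minimal illustration under our own assumptions, not an implementation from the text: the toy grammar, lexicon, and function names are invented for the example. Each generator yield is one alternative; resuming the iteration after a failure is exactly the return to the choice point that the text calls backtracking.

```python
# Minimal top-down parser with backtracking (illustrative sketch; the
# grammar, lexicon, and names are our own, not from the text).
GRAMMAR = {
    "S":  [["NP", "VP"]],
    "NP": [["det", "noun"]],
    "VP": [["verb", "NP"], ["verb"]],   # two alternatives: a choice point
}
LEXICON = {"the": "det", "boy": "noun", "apple": "noun", "eats": "verb"}

def parse(symbol, words, pos):
    """Try to derive `symbol` from words[pos:]; yield every end position."""
    if symbol in GRAMMAR:                         # non-terminal: expand rules
        for production in GRAMMAR[symbol]:        # choice point
            yield from expand(production, words, pos)
    elif pos < len(words) and LEXICON.get(words[pos]) == symbol:
        yield pos + 1                             # terminal matched one word

def expand(production, words, pos):
    """Derive each symbol of `production` in sequence, backtracking on failure."""
    if not production:
        yield pos
        return
    first, rest = production[0], production[1:]
    for mid in parse(first, words, pos):          # every way to parse `first`...
        yield from expand(rest, words, mid)       # ...continued by the rest

def accepts(sentence):
    words = sentence.split()
    return any(end == len(words) for end in parse("S", words, 0))

print(accepts("the boy eats the apple"))   # True
print(accepts("the boy the apple eats"))   # False
```

A parallel parser would instead collect all the positions yielded at each choice point into a set and advance them together; the generator version keeps only one alternative alive at a time, which is the bookkeeping trade-off discussed above.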

A way to escape the necessity of choosing, see (2), between parallel parsing and backtracking is offered by the so-called "deterministic parsers," popularized by the work of Mitchell Marcus at Bell Laboratories in New Jersey and widely popular in the 1980s. The basic principles of a deterministic parser are (1) to build up syntactic structures only when they are presumably correct; (2) never to build up syntactic structures that are not used in the complete parse; and (3) never to destroy any structure that has already been built. Concretely, Marcus' parser is structured around a three-cell buffer that stores three consecutive "constituents" of the syntactic structure being built up; the constituents can range from words to complete clauses. In each phase of the procedure, only these three constituents are taken into account. Parsing proceeds by matching, making use of a rule-based grammar, the constituents found in the buffer against the requirements of the syntactic structure already set up and stored in a "parse stack"; the parse stack thus reflects the current state of the parse.

Let us consider, for example, the sentence "the boy eats the apple," and let us suppose that the first NP ("the boy") has already been parsed. We are now in a situation where (1) the parse stack contains only, at the top level, a S(entence) syntactic structure having this NP as its SUBJ(ect); and (2) the three cells of the buffer contain, respectively, the words "eats," "the," and "apple." The state of the stack, together with the grammar rules, requires the attachment of the main verb to the S structure; the lexical entry for "eats" is therefore removed from the buffer and inserted in S as the main verb. At the same time, the grammar requires the creation of a second, empty NP (which will be used to parse "the apple"), which is pushed onto the stack. This second NP is then constructed by making use of the words "the" and "apple" from the buffer. After this sequence of operations, the buffer is empty and the parse stack contains two structures: the S structure corresponding to "the boy eats" and, on top of it, the NP structure for "the apple." This last structure is now popped and pushed back onto the buffer, leaving S again as the top-level structure of the parse stack. The state of the parse stack, and the grammar rules, now require the final structure to be built by taking the NP for "the apple" from the buffer and inserting it as the OBJ(ect) in the S structure.
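The sequence of buffer and stack operations just described can be traced step by step in code. This is only an illustrative simplification under our own assumptions: the dictionary representation of nodes and the rule comments are ours, not Marcus' actual rule formalism, and we start from the state the text assumes (the NP "the boy" already parsed).

```python
from collections import deque

# Starting state assumed in the text: an S node with "the boy" as SUBJ
# already on the parse stack, remaining words in the three-cell buffer.
stack = [{"cat": "S", "SUBJ": ("NP", "the boy")}]
buffer = deque(["eats", "the", "apple"])

# Rule: S needs a main verb -> remove "eats" from the buffer, insert in S.
stack[-1]["MAIN-VERB"] = buffer.popleft()

# Rule: create a second, empty NP, push it onto the stack, and build it
# from the words "the" and "apple" remaining in the buffer.
stack.append({"cat": "NP", "words": [buffer.popleft(), buffer.popleft()]})

# The buffer is now empty and the stack holds two structures.  The
# completed NP is popped and pushed back onto the buffer...
buffer.appendleft(stack.pop())

# ...so the final rule can take it from the buffer and attach it to the
# S structure as the OBJect.
stack[-1]["OBJ"] = buffer.popleft()

print(stack[-1])
```

Note that no structure is ever destroyed or retracted along the way, in keeping with principle (3) of the deterministic approach.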

Deterministic parsers, in which backtracking and parallel processing are eliminated, are evidently very efficient computationally; from a linguistic point of view, however, they cannot provide a syntactically correct parse when the parse depends on looking ahead more than three constituents. Such a situation is encountered, e.g., in parsing the so-called "garden-path" sentences, where the first words of the sentence lead us toward a given interpretation that later turns out to be a dead end, obliging us to retrace our steps and find a new one. A well-known example of a garden-path sentence is "The horse raced past the barn fell." Marcus' deterministic theory has since evolved into the so-called "D-theory" (Description theory). According to this theory, a deterministic parser that looks ahead three elements in the sentence is used not to set up a complete tree structure as the result of the parse, but rather to construct a simplified "description" of the syntactic structure of the sentence. In this way, e.g., phrases introduced by a preposition like "from" or "by" (prepositional phrases, PPs) are not attached at a precise position in the parse tree, but simply described as "dominated" by a verb phrase node (see also the next subsection).


FIGURE 9A A simple RTN.

