Brought to you by EarthWeb
IT Library Logo

Click Here!
Click Here!

Search the site:
 
EXPERT SEARCH -----
Programming Languages
Databases
Security
Web Services
Network Services
Middleware
Components
Operating Systems
User Interfaces
Groupware & Collaboration
Content Management
Productivity Applications
Hardware
Fun & Games

EarthWeb Direct EarthWeb Direct Fatbrain Auctions Support Source Answers

EarthWeb sites
Crossnodes
Datamation
Developer.com
DICE
EarthWeb.com
EarthWeb Direct
ERP Hub
Gamelan
GoCertify.com
HTMLGoodies
Intranet Journal
IT Knowledge
IT Library
JavaGoodies
JARS
JavaScripts.com
open source IT
RoadCoders
Y2K Info

Previous Table of Contents Next


Marcus' work was inspired by the desire of freeing NLP from the need of executing the laborious backtracking operations characteristic of the ATN parsers, like the parser used by William Wood in his famous LUNAR system. The ATN approach has formed the bulk of the operational syntactic parser all over the world for a long period, until at least the end of the 1980s. Even if they are now supplanted by the more fashionable unification parsers, they deserve at least a short mention here.

ATN (Augmented Transition Networks) evolved from simpler, Recursive Transition Networks (RTN). An RTN, see Figure 9, is a network of nodes, called "states," which are connected by arcs; one node is identified as the "initial state," and one or more nodes as the "final state." Some arcs, called CAT arcs, are labeled with lexical categories, like N (noun), V (verb), DET (determiner), ADJ (adjective), or ADV (adverb).

An RTN works according to the following principles. When presented with an input sentence, and starting at the initial state node of the RTN with the initial word of the sentence, it is possible to traverse an arc if the current word is in the category shown on the arc. If this arc can be followed, the current word is "consumed," and the input is updated to the next word. If the RTN comes to the final state and the entire sentence has been consumed -- possibly after one or more "backtrack" operations (the RTN returns to the last choice point) if there is an incompatibility between the arc the RTN would follow and the category of the word under examination -- then the sentence is accepted as "grammatical" with respect to the grammar that the RTN implicitly represents. Particular types of arcs are the PUSH and POP arcs; see again Figure 9. PUSH arcs, labeled with constituents like NP (noun phrase) and VP (verb phrase), are used to transfer the control to a subnetwork that is in charge of analyzing, e.g., a noun group or a verb group; a POP arc is used then to come back to the main network. JUMP arcs are also allowed; these arcs transfer the control to a new state without consuming any input word.

For example, when analyzing the noun phrase "the boy," see Figure 9B, and starting with the initial state node NP, it will be possible to pass to the node NP1 by following the arc labeled DET, given that the first word is "the." A JUMP arc will allow to direct passage to NP1 in the absence of the article, like in "John" in place of "the boy"; the ADJ arc accounts for the presence of one or more optional adjectives ("the big boy") in the NP group. Control is then transferred to the node NP2 following the arc N and using the noun "boy." At this point, we have reached a POP arc, and this means that "the boy" is a legal noun phrase. As a second example of the utility of a JUMP arc, please consider the main, simple RTN of Figure 9A: this network can account for an imperative sentence (without initial NP), like "Eat quickly!," where the node S1 must be reached directly without consuming any word of the input sentence.


FIGURE 9B The NP subnetwork included in the previous RTN.

This type of tool is affected by two major problems. The first concerns the fact that the grammars implemented by the RTNs are equivalent to context-free grammars, which are notoriously insufficient, as we have already seen, for an accurate analysis of NL. Moreover, a parser that tells simply if a sentence is or is not a grammatical one, without showing the analysis of the sentence, is not really interesting. The two problems have been solved by Wood by "augmenting" the RTN arcs with actions, giving rise to "Augmented Transition Networks," ATNs. Actions are fundamentally of two types, in order to reply to the two sorts of objections arising against RTNs. To go beyond the strict context-free framework, ATN allows one to implement complex "tests" on any arc. Tests are arbitrary pieces of code that act as predicates, and allows one to perform checks that would be unmanageable to implement using only the lexical category mechanism. For example, a CAT arc that accounts for the main verb of a sentence may be "augmented" with a test that checks if the verbs agrees in person and number with the subject of the sentence. Second, in order to document the progression of the parse operations, each ATN is supplied with a set of registers, like DET, ADJS, HEAD, and NUM, for an NP subnetwork; actions, in this case, consist typically of setting a register to a certain value or getting a value from a register. Registers are reset to their original values upon backtracking; moreover, when a PUSH occurs, the current contents of the registers are saved and the called subnetworks get a fresh set of empty registers. When a POP arc is followed, all the registers that have been set in the current network are compiled in a structure formed by the name of the network, followed by a list of the registers with their values: the parsing operations executed in the network are now documented.

Compiling ATN into LISP code is very easy, and the ATN mechanism can be represented in its totality by making use of very few instructions: using in fact the normal function-calling mechanism of LISP to represent state transitions, backtracking is obtained practically for free. This is probably the main reason for the popularity of this tool, for so many years, in the NLP community (e.g., the well-known INTELLECT system, one of the first commercially available NL interface for DBMSs, and which is still in use, was ATN-based). There are, however, objective reasons for the decline in popularity of ATNs. They are related mainly to (1) their inflexibility, which obliges an ATN parser to arrive at an end-state in order to make clear the structure of a sentence, and (2) the threat of combinatorial explosion linked with the number of possible backtracking operations, which grows exponentially with the complexity of the network -- even if it is possible to make use of heuristic rules to determine the order of priority given to the different arcs that proceed from a particular node.


Previous Table of Contents Next

footer nav
Use of this site is subject certain Terms & Conditions.
Copyright (c) 1996-1999 EarthWeb, Inc.. All rights reserved. Reproduction in whole or in part in any form or medium without express written permission of EarthWeb is prohibited. Please read our privacy policy for details.