

2. HISTORY

2.1. THE ORIGINS

We can trace the appearance of NLP to the Second World War and to the extensive use of statistical techniques for (automatically) breaking enemy codes: these techniques relied, in fact, on "linguistic" (lexicographic) tools consisting of tables of letter frequencies, word frequencies, transition frequencies between successive letters, etc. After the war, this aptitude of the machine for manipulating symbols and assisting the lexicographer's work was soon applied to literary studies. Starting in the late 1950s and early 1960s, there was a huge, computer-based production of lexicographic tools -- word indexes (lists of word occurrences) and concordances (indexes where each word is accompanied by a line of context) -- intended to help scholars apprehend the style and thought of authors such as Livy, Thomas Aquinas, Dante, or Shakespeare.
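To make these two tools concrete, the fragment below is a minimal sketch (not drawn from any of the systems mentioned above) of a word index and a simple KWIC-style concordance; the sample text and function names are purely illustrative.

```python
from collections import defaultdict

def word_index(tokens):
    """Map each word to the list of positions where it occurs (a word index)."""
    index = defaultdict(list)
    for pos, word in enumerate(tokens):
        index[word.lower()].append(pos)
    return index

def concordance(tokens, word, window=4):
    """Show each occurrence of `word` with a few tokens of context on either side."""
    lines = []
    for pos in word_index(tokens).get(word.lower(), []):
        left = " ".join(tokens[max(0, pos - window):pos])
        right = " ".join(tokens[pos + 1:pos + 1 + window])
        lines.append(f"{left} [{tokens[pos]}] {right}")
    return lines

sample = "In the beginning was the Word and the Word was with God".split()
for line in concordance(sample, "word"):
    print(line)
```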

The sense that NLP techniques could go beyond the simple counting and rearranging of words did not take long to appear. In 1946, Warren Weaver and A. Donald Booth began working on the idea that computers could be used to tackle the worldwide problem of translation from one natural language into another. Both were familiar with the work on code breaking, and both were convinced that translation could also be tackled as a decoding problem: the only real difficulty, in their view, was that of incorporating into the decoding system a full, automated dictionary of the two languages. In the 1949 "Weaver Memorandum" -- which Weaver distributed to about 200 of his acquaintances and which marks the official beginning of the machine translation (MT) subfield -- he writes: "... when I look at an article written in Russian, I say, 'This is really written in English, but it has been coded in some strange symbols. I will now proceed to decode'." This (evidently erroneous) idea that the source and the target texts say exactly the same thing strongly influenced early work on MT. The systems produced until 1954 followed a very simple paradigm based, essentially, on a word-for-word translation technique supported by a dictionary-lookup program. Words in the target language were arranged in the same order as in the source text and, when a source word happened to have two or more target equivalents, all the alternatives were registered.
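By way of illustration only, the following is a minimal, hypothetical sketch of that early word-for-word scheme: a dictionary lookup that preserves the source word order and registers every alternative whenever a source word has more than one target equivalent. The toy Italian-English dictionary and the function names are assumptions, not a reconstruction of any actual 1950s system.

```python
# Purely hypothetical toy dictionary; ambiguous entries keep all their equivalents.
toy_dictionary = {
    "la": ["the"],
    "casa": ["house", "home"],      # ambiguous: both alternatives are registered
    "è": ["is"],
    "grande": ["big", "large"],     # ambiguous
}

def word_for_word(source_tokens, dictionary):
    """Translate token by token, preserving source order and listing every alternative."""
    output = []
    for word in source_tokens:
        alternatives = dictionary.get(word.lower(), [word])  # unknown words pass through unchanged
        output.append("/".join(alternatives))
    return " ".join(output)

print(word_for_word("la casa è grande".split(), toy_dictionary))
# -> the house/home is big/large
```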

In 1954, a research group from IBM and Georgetown University staged a public demonstration of a system translating from Russian to English; the system had no particular scientific value, but it encouraged the belief that MT was feasible and within easy reach. This system was representative of the "direct translation" approach, a slight improvement on the early word-for-word techniques. The "direct" systems were monolithic pieces of software specifically designed, in every detail, for a particular pair of languages. Linguistic analysis was very elementary: syntactic analysis, for example, was normally reduced to little more than the recognition of word classes (nouns, verbs, adjectives, etc.) in order to alleviate ambiguity problems such as whether "claim" is a verb or a noun. We must, however, mention that the popularity of MT in the 1950s and early 1960s had a favorable influence on the development of other disciplines. For example, the necessity of systematically rearranging the sequence of words produced by the early word-for-word MT systems made it apparent that some sort of "real" syntactic analysis was urgently needed. This period is thus remembered as the golden age of syntax, marked by the publication of important work by Yehoshua Bar-Hillel, Zellig Harris and, above all, Noam Chomsky -- even if, as already noted, this wave of interest in syntactic theory had little impact on the design of MT systems. Chomsky's "Syntactic Structures," where "transformational grammars" were introduced (see subsection 3.4.1), was published in 1957. In parallel, the need for programming languages well suited to the manipulation of linguistic structures led to the creation of COMIT and SNOBOL, the first languages geared more toward symbolic computation than toward "number crunching"; this in turn influenced the creation, in 1960, of both ALGOL and LISP, characterized by the presence of new features such as lists and recursion.

