Previous Table of Contents Next


PART V
Icon

6  The Icon Programming Language

Chapter 6
The Icon Programming Language

by Ralph E. Griswold

Icon is a high-level, imperative programming language with extensive facilities for manipulating strings and structures. It features a powerful expression-evaluation mechanism that greatly simplifies many programming tasks.

This presentation of the Icon programming language begins with a brief description of its historical origins, the languages that preceded it, and what motivated a language with Icon’s capabilities.

The remainder is divided into three parts: a description of the language, examples of programming techniques, and information about getting Icon material.

6.1. Background

The origins of Icon go back to the early days of programming languages—to 1962 with the first SNOBOL language (Farber, Griswold, & Polonsky, 1964).

SNOBOL was developed at Bell Laboratories by a small group with no prior experience in programming language design. Their goal was to provide a tool for manipulating text on computers. That was a relatively novel idea at a time when computation was viewed by most persons as an entirely numerical activity.

SNOBOL deals only with strings of characters and has a high-level pattern-matching facility that was novel at the time. SNOBOL adapted control flow concepts from contemporary assembly languages and from COMIT, a language designed for natural language translation (Yngve, 1958). SNOBOL uses the concepts of success and failure with conditional gotos, rather than boolean values, to control program flow. SNOBOL has no other control structures.

A SNOBOL program consists of a sequence of statements. The parts of a statement are

    [label]     subject   pattern   [replacement]   [goto]

The components in brackets are optional. The label identifies the statements for the purpose of transferring control by gotos. The subject provides the string to which the pattern is applied. If the pattern matches, the replacement, if any, replaces the matched part of the subject, changing it. There are three kinds of gotos: unconditional, success, and failure. In the case of the last two, transfer is made depending on whether the pattern matches.

SNOBOL has four kinds of patterns:

  strings, matching only a specified string
  arbitrary, matching any string
  balanced, matching only strings balanced with respect to parentheses
  fixed-length, matching strings of a specified length

Except for strings, which can be specified literally or by string-valued variables, a pattern is indicated by bounding asterisks and other characters to indicate what kind of pattern it is. Names of variables to which matched components of the subject are assigned can be included. Examples are

*HEAD*, which matches an arbitrary string and assigns it to HEAD
*(EXPR)*, which matches a parenthesis-balanced string and assigns it to EXPR
*COL/10*, which matches a string of 10 characters and assigns it to COL

An example of a SNOBOL statement is

    REMOVE   EXPR   “(“  *(EXPR)* “)”     /S(REMOVE)

which removes all outer parentheses from EXPR. Note that if the pattern matches, a new value is assigned to EXPR and control is transferred back to the beginning of the statement.

SNOBOL quickly became popular at a time when there was no commercial software and no other generally available programming languages for manipulating strings. It found users in diverse disciplines. Many researchers in computing in the humanities adopted SNOBOL. By contrast, it was widely used by systems programmers for file creation and modification.

As a result of interest in SNOBOL and the demand for more features, a series of increasingly more powerful SNOBOL languages was developed, culminating in SNOBOL4 (Griswold, Poage, & Polonsky, 1971).

In SNOBOL4, patterns are data objects that can be constructed during program execution. SNOBOL4 uses a much more sophisticated pattern-matching algorithm than SNOBOL. It also provides high-level data structures including tables with associative lookup.

Here is a SNOBOL4 program that counts the “words” in the input file:

        wordcnt = table()
        lower =  “abcdefghijklmnopqrstuvwxyz”
        upper = “ABCDEFGHIJKLMNOPQRSTUVWXYZ”
        letters = lower upper

        findword = break(letters)   span(letters) . word

read    line = input                                  :f(list)
next    line   findword =                             :f(read)
        wordcnt[word] = wordcnt[word] + 1             :(next)

list    convert(wordcnt, “array”)

loop    i = i + 1
        output = wordcnt[i, 1] “ : “ wordcnt[i, 2]    :s(loop)
end

The program is initialized by assigning a table to wordcnt, a string containing the upper- and lowercase letters to letters, and a pattern that finds and then matches a string of consecutive letters to findword. The variable word in this pattern is assigned the string of letters that are matched. Concatenation for both strings and patterns has no explicit operator; components are just written in succession.

A line of input is read and assigned to line by the statement labeled read; reference to input reads a line automatically. When an end of file is reached, this statement fails and control is transferred to the statement labeled list. If the statement succeeds, findword is applied to line to find another word, assign it to word, and delete the initial part of line. If the pattern match succeeds, control continues to the next statement, where the table wordcnt is subscripted by the word and the count for that word is incremented. If the pattern match fails, control is transferred to the statement labeled read to read another line.

When the input is exhausted, control is transferred to the statement labeled list, where wordcnt is converted from a table into a two-dimensional array in which the first column contains the words that were found and the second column contains the respective counts.


Previous Table of Contents Next