6.2.3.3. String Scanning

String scanning is a high-level facility for string analysis. It suppresses much of the detail otherwise involved in string analysis and uses a navigational metaphor.

In string scanning, there is a scanning subject to which all scanning operations apply. There also is a scanning position, initially 1, that may change during string scanning. Changes in the position occur as the result of successful matching in the subject.

String scanning has the form

    subject ? expr

where the subject string is scanned by expr. The value produced by a scanning expression is the value of expr; it fails if expr does.
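
As a minimal sketch of this rule, the following program scans two literal subjects. The subject strings are made up for illustration, and move() is one of the matching functions described in the next subsection.

    procedure main()
       # The scanning expression produces the value of expr.
       write("string scanning" ? move(6))     # writes "string"

       # If expr fails, the whole scanning expression fails.
       if not ("abc" ? move(10)) then
          write("no match")                   # position 11 is outside "abc"
    end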

Matching Expressions

Matches in the subject occur as the result of matching functions:

tab(i)—Moves the position to i
move(i)—Increments the position by i

If the specified position is within the subject, it is changed accordingly; if it is not, the function fails. In the case of a successful match, the matching function returns the substring of the subject between the former position and new position.

Here is a simple example of the use of matching functions:

    line ? {
         while write(move(1)) do    # write one character
               move(1)              # skip the next one
         }

As indicated by the comments, this scanning expression writes every other character in line. The loop terminates when the end of the string is reached because move(1) fails if the new position would not be in the range of the subject.
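
The difference between tab() and move() can be seen in a short sketch. The subject "abcdef" is an arbitrary illustration, and the keyword &pos, which holds the current scanning position, is used here only to display that position.

    procedure main()
       "abcdef" ? {
          write(tab(3))      # position 1 -> 3; writes "ab"
          write(move(2))     # position 3 -> 5; writes "cd"
          write(&pos)        # writes 5
          write(tab(0))      # 0 is the position after the last character; writes "ef"
          if not move(1) then
             write("end of subject")    # move(1) fails at the end of the subject
          }
    end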

The power of string scanning comes from using string-analysis functions to provide the argument for tab(), thus moving through the subject to portions of interest. In this usage, the second argument of a string-analysis function is omitted and defaults to the subject of scanning, and the analysis starts at the current position in the subject rather than at its beginning.
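
One common way to combine these functions is sketched below: it writes each word of a subject on its own line. The subject string is hypothetical, and the sketch assumes the standard analysis function many() and the keyword &letters (the cset of all letters) in addition to upto().

    procedure main()
       "to be, or not to be" ? {
          while tab(upto(&letters)) do      # skip to the next letter
             write(tab(many(&letters)))     # match and write the run of letters
          }
    end

Neither the subject nor any position is named inside the loop; upto() and many() both examine the subject from the current position onward.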

This is illustrated by the following segment of code, which converts lines of comma-separated fields to fields of fixed width:

    while line := read() do {
       result := ""
       line ? {
          while field := tab(upto(',')) do {
             result ||:= left(field, width)
             move(1)                        # skip the comma
             }
          write(result ||:= left(tab(0), width))
          }
       }

For each line of input, result starts as an empty string. The line is scanned, repeatedly looking for a comma with upto(','), which returns its position. tab(upto(',')) moves to that position and matches the substring prior to it. This substring is put at the left of a blank-filled field of the specified width and concatenated onto the evolving result. Before looking for another comma, move(1) skips the current one. When there are no more commas, the while loop terminates, and the remainder of the subject, from the current position through the end of the string, is matched by tab(0) and appended in another field. Because tab(0) is a matching function, it must be evaluated inside the scanning expression. The final result is written before a new line is read. Notice that lines are assumed to have fields separated by commas, but with no comma after the last field. If every field were followed by a comma, tab(0) would not be needed.

Note that in string scanning, it is not necessary to specify the actual positions at which the characters are found. In addition, the subject is implicit in the processing and does not have to be mentioned.
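
To make the contrast concrete, here is a sketch that splits the same line twice, once with explicit positions and once with string scanning. The sample line and the variable names i and j are illustrative only.

    procedure main()
       line := "ab,cde,f"

       # Without scanning: the subject and the positions are explicit.
       i := 1
       while j := upto(',', line, i) do {
          write(line[i:j])
          i := j + 1
          }
       write(line[i:0])

       # With scanning: both are implicit.
       line ? {
          while write(tab(upto(','))) do
             move(1)
          write(tab(0))
          }
    end

Both halves write ab, cde, and f on separate lines.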

Another example is provided by the following code segment, which converts fixed-width fields to comma-separated fields:

    while line := read() do {
       result := ""
       line ? {
          while field := move(width) do
             result ||:= trim(field) || ","
          }
       write(result ? tab(-1))            # drop the trailing comma
       }

Successive fields are matched by move(width) and assigned to field, which is trimmed of trailing blanks and concatenated, followed by a comma, onto the evolving result. When there are no more fields, the result is itself scanned to produce the substring up to the last character (at position -1 in the subject), which drops the final comma, and that substring is written.
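
The effect of the final tab(-1) is easiest to see on a single line. The sketch below assumes a field width of 4, a made-up 12-character subject, and a single comma as the separator.

    procedure main()
       width := 4                          # assumed field width
       result := ""
       "ab  cde f   " ? {
          while field := move(width) do
             result ||:= trim(field) || ","
          }
       write(result)                       # writes "ab,cde,f," with a trailing comma
       write(result ? tab(-1))             # writes "ab,cde,f"
    end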

The operation =s is shorthand for tab(match(s)) and is useful for matching a specific string. This is illustrated by rewriting the earlier string-analysis example that does not use string scanning:

    while instruction := read() do {
       instruction ? {
          if ="comment " then next           # skip comments
          else  command := tab(upto(' '))
             …                               # process command
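
A self-contained sketch of the =s operation follows. The two subject strings are hypothetical instructions chosen for illustration, and the sketch only writes what it matches rather than processing a command.

    procedure main()
       "comment this line is ignored" ? {
          if ="comment " then
             write("skipped: ", tab(0))            # writes "skipped: this line is ignored"
          }
       "load r1" ? {
          if ="comment " then write("skipped")
          else write("command: ", tab(upto(' ')))  # writes "command: load"
          }
    end

When =s succeeds, the position is left just after the matched string, so the rest of the subject can be matched from there; when it fails, the position is unchanged.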

