

String Positions and Substrings

Positions in strings are between characters. They are numbered starting at 1 before the first character, and there is also a position after the last character, as shown by this example for “mantra”:

      m   a   n   t   r   a
    ↑   ↑   ↑   ↑   ↑   ↑   ↑
    1   2   3   4   5   6   7

Substrings are specified by bounding positions. For example, the substring between positions 2 and 5 in “mantra” is “ant”.

Subscripting can be used to produce substrings. For example, if

    word := "mantra"

then word[2:5] produces the substring of word between positions 2 and 5, which is “ant” in this case.

The expression s[i] is shorthand for s[i:i+1], the ith character of s (the character after position i). For example, word[1] is “m”.

There are also nonpositive position specifications, starting at 0 after the last character and decreasing toward the left:

      m   a   n   t   r   a
    ↑   ↑   ↑   ↑   ↑   ↑   ↑
    1   2   3   4   5   6   7
   -6  -5  -4  -3  -2  -1   0

For example, the positions 5 and -2 in “mantra” are equivalent. Positive and nonpositive position specifications can be intermixed, and the two bounds can be given in either order. Thus, word[2:5] and word[-2:2] are equivalent.
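To make the position arithmetic concrete, here is a minimal Python model of Icon subscripting. It is a sketch, not Icon itself: the helper names normalize, substr, and char are hypothetical, and where Icon would simply fail on an out-of-range position, the sketch raises IndexError instead.

```python
# A Python model of Icon string positions (hypothetical helper names).
# Icon positions are 1-based and lie between characters; nonpositive
# positions count from the right, with 0 meaning "after the last character".

def normalize(s: str, p: int) -> int:
    """Convert an Icon position (positive or nonpositive) to a positive one."""
    n = len(s)
    if p <= 0:
        p = n + 1 + p               # 0 -> n+1, -1 -> n, and so on
    if not 1 <= p <= n + 1:
        raise IndexError("position out of range")  # Icon would fail instead
    return p

def substr(s: str, i: int, j: int) -> str:
    """Model of Icon's s[i:j]; the two bounds may be given in either order."""
    i, j = normalize(s, i), normalize(s, j)
    lo, hi = min(i, j), max(i, j)
    return s[lo - 1:hi - 1]         # Icon position p sits before Python index p - 1

def char(s: str, i: int) -> str:
    """Model of Icon's s[i], that is, s[i:i+1]: the character after position i."""
    return substr(s, i, normalize(s, i) + 1)

word = "mantra"
print(substr(word, 2, 5))    # ant
print(substr(word, -2, 2))   # ant
print(char(word, 1))         # m
```

Normalizing both bounds first and then taking the smaller and larger is what lets positive and nonpositive positions be mixed freely and given in either order.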

A string can be assigned to a subscripting expression to replace the specified substring. For example, if

    word := "thesis"

then

    word[1] := "T"

changes word to “Thesis”.

The replacement need not be the same length as the substring it replaces. For example,

    word[4:0] := ""

replaces the substring “sis” of “Thesis” by the empty string to change the value of word to “The”.
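Substring assignment can be modeled the same way. In this Python sketch, the helper replace_sub is hypothetical: since Python strings are immutable, it returns a new string, which the caller assigns back to word, much as the Icon assignment gives the variable word a new string value.

```python
# A Python model of Icon's substring assignment, word[i:j] := r.

def normalize(s: str, p: int) -> int:
    """Convert an Icon position (positive or nonpositive) to a positive one."""
    if p <= 0:
        p = len(s) + 1 + p
    if not 1 <= p <= len(s) + 1:
        raise IndexError("position out of range")  # Icon would fail instead
    return p

def replace_sub(s: str, i: int, j: int, r: str) -> str:
    """Return s with the substring between positions i and j replaced by r."""
    i, j = normalize(s, i), normalize(s, j)
    lo, hi = min(i, j), max(i, j)
    # Keep everything before position lo and after position hi.
    return s[:lo - 1] + r + s[hi - 1:]

word = "thesis"
word = replace_sub(word, 1, 2, "T")   # like word[1] := "T"
print(word)                           # Thesis
word = replace_sub(word, 4, 0, "")    # like word[4:0] := ""
print(word)                           # The
```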

The operation !s generates the one-character substrings of s, from beginning to end. For example,

    every write(!"Hello")

writes H, e, l, l, o on separate lines.
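The behavior of !s resembles a Python generator. This sketch uses a hypothetical function bang() to produce the one-character substrings one at a time, and a for loop stands in for Icon's every.

```python
from typing import Iterator

def bang(s: str) -> Iterator[str]:
    """Model of Icon's !s: generate the one-character substrings of s in order."""
    for i in range(len(s)):
        yield s[i:i + 1]   # the substring between Icon positions i+1 and i+2

# Analogue of: every write(!"Hello")
for c in bang("Hello"):
    print(c)
```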

String Comparison

As mentioned earlier, Icon uses the 256-character extended ASCII character set. The ASCII codes for characters impose an ordering on the characters.

Strings can be compared on the basis of the codes for their characters. The character c1 is less than c2 if the internal code for c1 is less than the code for c2. For example, the (ASCII) code for “D” is 68, and the code for “Q” is 81, so “D” is less than “Q”. The codes for letters have the same order as ordinary alphabetical order, but the lowercase and uppercase letters have different codes.

The codes for the digits are smaller than the codes for letters, and the uppercase letters have smaller codes than the lowercase letters. Punctuation and other characters have codes scattered through the range; for example, the blank (code 32) is smaller than any digit or letter.
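The ordering of the codes can be checked directly. This short Python sketch uses ord(), which returns a character's Unicode code point; for codes 0 through 127 these coincide with ASCII, so the values match the ones given above.

```python
# Print the codes of a few ASCII characters, in increasing order of code.
for ch in [" ", "0", "9", "A", "D", "Q", "Z", "a", "z"]:
    print(repr(ch), ord(ch))
```

The output shows the digits (48 to 57) below the uppercase letters (65 to 90), which are in turn below the lowercase letters (97 to 122).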

For strings, order is determined by the order of their characters, from left to right. Therefore, in ASCII “DQ” is less than “dQ” and “dQ” is less than “dq”. If one string is an initial substring of another, the shorter string is less than the longer. For example, “DQa” is lexically less than “DQaaa”. The empty string is less than any other string. Two strings are equal if and only if they have the same length and are the same, character by character.
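The ordering rule in the preceding paragraph can be written out as an explicit algorithm. The following Python sketch (lex_compare is a hypothetical name) compares character codes from left to right and, if one string runs out first, treats the shorter one as smaller. Python's built-in string comparison follows the same rule on code points.

```python
def lex_compare(s1: str, s2: str) -> int:
    """Return -1, 0, or 1 as s1 is lexically less than, equal to, or greater than s2."""
    for c1, c2 in zip(s1, s2):            # compare corresponding characters, left to right
        if ord(c1) != ord(c2):
            return -1 if ord(c1) < ord(c2) else 1
    # Every compared character agrees, so a shorter string (an initial
    # substring of the other) is the smaller one.
    if len(s1) != len(s2):
        return -1 if len(s1) < len(s2) else 1
    return 0

print(lex_compare("DQ", "dQ"))       # -1
print(lex_compare("dQ", "dq"))       # -1
print(lex_compare("DQa", "DQaaa"))   # -1
print(lex_compare("", "a"))          # -1
```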

There are six lexical comparison operations for strings. Each succeeds and produces its right operand if the comparison holds, and fails otherwise:

s1 == s2: equal
s1 ~== s2: not equal
s1 << s2: less than
s1 <<= s2: less than or equal
s1 >> s2: greater than
s1 >>= s2: greater than or equal
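Icon comparisons have no Boolean result: they either succeed, producing the right operand, or fail. The following Python sketch models that behavior under one stated assumption: failure is represented by returning None. The names make_comparison and lex_eq through lex_ge are hypothetical; the tests themselves come from Python's operator module, whose string comparisons order by character code.

```python
import operator
from typing import Callable, Optional

Comparison = Callable[[str, str], Optional[str]]

def make_comparison(test: Callable[[str, str], bool]) -> Comparison:
    """Wrap a Boolean test so that it behaves like an Icon comparison."""
    def compare(s1: str, s2: str) -> Optional[str]:
        # Success produces the right operand; None stands in for failure.
        return s2 if test(s1, s2) else None
    return compare

lex_eq = make_comparison(operator.eq)   # s1 == s2
lex_ne = make_comparison(operator.ne)   # s1 ~== s2
lex_lt = make_comparison(operator.lt)   # s1 << s2
lex_le = make_comparison(operator.le)   # s1 <<= s2
lex_gt = make_comparison(operator.gt)   # s1 >> s2
lex_ge = make_comparison(operator.ge)   # s1 >>= s2

print(lex_lt("D", "Q"))   # Q
print(lex_gt("D", "Q"))   # None, modeling failure
```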

String Analysis

Icon has several functions for analyzing strings. They all return positions in strings. The string-analysis functions include:

upto(c, s): the position in s before the first character that is in the cset c
many(c, s): the position in s following the longest initial sequence of characters in c
find(s1, s2): the position in s2 of the first instance of s1
match(s1, s2): the position in s2 after s1, if s1 is an initial substring of s2

These functions fail if there is no match. The functions upto() and find() are generators that produce positions of successive matches.
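The four functions can be modeled in Python using Icon's 1-based positions. In this sketch, generators stand in for upto() and find(), a Python set of characters stands in for a cset, and None stands in for failure of many() and match(); these representations are assumptions of the model, not Icon's own.

```python
from typing import Iterator, Optional, Set

def upto(c: Set[str], s: str) -> Iterator[int]:
    """Generate the position before each character of s that is in c."""
    for i, ch in enumerate(s):
        if ch in c:
            yield i + 1

def many(c: Set[str], s: str) -> Optional[int]:
    """Position after the longest initial run of characters in c; None if there is none."""
    i = 0
    while i < len(s) and s[i] in c:
        i += 1
    return i + 1 if i > 0 else None

def find(s1: str, s2: str) -> Iterator[int]:
    """Generate the positions in s2 where s1 occurs, including overlapping ones."""
    start = s2.find(s1)
    while start != -1:
        yield start + 1
        start = s2.find(s1, start + 1)

def match(s1: str, s2: str) -> Optional[int]:
    """Position in s2 after s1 if s2 begins with s1; None otherwise."""
    return len(s1) + 1 if s2.startswith(s1) else None

print(list(upto({" "}, "go north now")))   # [3, 9]
print(many({"a", "b"}, "abba cadabra"))    # 5
print(list(find("an", "banana")))          # [2, 4]
print(match("com", "comment"))             # 4
```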

An example of using string-analysis functions to process lines of input is

    while instruction := read() do {
       j := upto(" ", instruction) | next       # skip lines with no blank
       command := instruction[1:j]              # the text before the first blank
       if match(command, "comment") then next   # skip comments
       else  …                                  # process command
       }
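The control flow of this loop can be traced with a Python sketch. It reads from a hypothetical list of sample lines instead of standard input, and because the Icon code's processing step is elided, the sketch simply collects the commands it would process. Note that match(command, "comment") succeeds whenever command is an initial substring of "comment", so abbreviations such as "com" are skipped as well.

```python
from typing import List

def process(lines: List[str]) -> List[str]:
    processed = []
    for instruction in lines:
        j = instruction.find(" ")          # 0-based index of the first blank
        if j == -1:
            continue                       # upto() would fail: skip the line
        command = instruction[:j]          # same text as Icon's instruction[1:j]
        if "comment".startswith(command):  # same test as match(command, "comment")
            continue                       # skip comments
        processed.append(command)          # stand-in for "process command"
    return processed

sample = ["move north", "comment ignored", "badline", "com also ignored", "take lamp"]
print(process(sample))   # ['move', 'take']
```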

