Previous | Table of Contents | Next |
The term pattern matching refers to a view of string scanning in which expressions are thought of in terms of the strings they can match instead of how they do it. For example, move(i) can be thought of in two ways: (1) as an expression that adds i to the current position and returns the substring between the previous and new positions, or (2) as a pattern that matches any string that is i characters long.
Although pattern matching is primarily a matter of viewpoint, it provides a powerful conceptual tool for analyzing strings.
Patterns are composed of matching expressions. There are two built-in matching expressions: move(i) and tab(i). More complicated matching expressions can be built by combining the built-in ones and by writing matching procedures.
6.3.2.1. The Matching Protocol
A matching expression is defined in terms of a protocol that specifies what is allowed:
The requirement that the position cannot be decreased is not essential to the concept of pattern matching, but it simplifies some of the scussion that follows. It also is natural for most string analysis in Icon and fits nicely with string-analysis functions, which always produce positions at the current position or to its right. Note that tab(i) and move(i) can be used in ways that violate this requirement. The discussion that follows assumes they are not used to decrease the position in the subject.
There are two fundamentally different aspects to these rules of protocol: what a matching expression produces (the matched portion of the subject) and data backtracking (maintenance of the position).
The idea behind producing the matched portion of the subject is that a matching expression, viewed as a pattern, produces a specific string from among all those that the pattern can match. Thus, while move(i), as a pattern, characterizes all strings of length i, evaluating move(i) at a particular position in a particular subject produces a specific string of length i.
Because matching expressions may advance the position when they match, they have the nice property that the string matched by two matching expressions in succession is the concatenation of the strings they match. For example,
move(i) || move(j)
is a matching expression.
Data backtracking is more subtle. It means that a matching expression is responsible for maintaining the position and, in particular, leaving it as it was if it fails. This assures that alternative patterns are applied at the same place in the subject, and it corresponds to the intuitive notion of matching one pattern or another. For example, in
(move(3) || =.) | move(5)
if move(3) succeeds but =. fails, move(5) matches starting at the same place as move(3) did.
In order for a matching expression to restore the position, it must suspend so that it can regain control if a subsequent matching expression fails. In the preceding example, move(3) suspends and is resumed when =. fails. The suspension has nothing to do with producing alternative matches: Although some matching expressions are generators, move(3) can match in only one way. It suspends only so that it can be resumed to restore the position if a subsequent matching expression fails. Of course, a matching expression that does not change the position need not suspend (but it does need to return the empty string).
6.3.2.2. Composing Matching Expressions
The ability to build up complicated matching expressions from simpler ones is essential to the use of pattern matching. The rules of protocol given in the preceding section determine how matching expressions can be combined to form other matching expressions. The two basic forms of combination are concatenation and alternation. If expr1 and expr2 are matching expressions, then
expr1 || expr2
and
expr1 | expr2
are matching expressions. And, of course, =s is a matching expression because it is just a shorthand for tab(match(s)).
Note that, in general,
expr1 & expr2
is not a matching expression because it produces the string matched by expr2, not the concatenation of the strings matched by expr1 and expr2.
Most operations on matching expressions do not yield matching expressions. Some do not because they do not produce a matched substring, as in
expr1 + expr2
Of course, adding two matched substrings is an unlikely operation. The more serious limitations arise with control structures.
By definition, control structures alter the flow of control. Most control structures have control expressions that determine what expression is evaluated next. Control expressions are bounded, so that if they produce a result, they cannot be resumed. Bounded expressions are fatal to matching because they prevent the backtracking that is required to maintain the position. For example, if expr1 and expr2 are matching expressions,
if expr1 then expr2
is not, in general, a matching expression, because expr1 is bounded and cannot be resumed to restore the position if a subsequent matching expression fails.
Despite other possibilities, matching expressions usually are composed by the concatenation and alternation of other matching expressions. These two operations correspond to match this then match that and match this or match that, respectively.
Previous | Table of Contents | Next |