Previous | Table of Contents | Next |
6.3.9.2. The DCG Version
Now consider the following text:
sentence --> noun_phrase, verb_phrase. noun_phrase --> [evelyn]. noun_phrase --> [chris]. verb_phrase --> [sings]. verb_phrase --> [jogs]. verb_phrase --> [loves], noun_phrase.
This text constitutes a valid Prolog program as well; the --> expressions are DCG rules, and this text as a whole constitutes a DCG. When Prolog is reading a program, it automatically translates DCG rules into Prolog clauses in the style used earlier. For instance, the first three rules are essentially translated into
sentence(List, Rest) :- noun_phrase(List, Rest1), verb_phrase(Rest1, Rest). noun_phrase(List, Rest) :- append([evelyn], Rest, List). noun_phrase(List, Rest) :- append([chris], Rest, List).
Note that two arguments have been added to each identifier in the first rule. We use the DCG rules by calling these translated clauses, typically with the empty list as the last parameter. For instance, sentence([evelyn, loves, chris], []) is still the form of call we will use to parse that list of constants as a sentence.
Note also that the lists in the bodies of the second and third rules have been replaced by calls to append.3 Lists of any length can be used in DCGs; for instance, the rule noun_phrase --> [the, bookstore, owner] would be translated into
3This is slightly inaccurate; most Prologs use the special predicate C, with a different argument order, rather than append because append is a library predicate that can be redefined. But the effect is the same as the standard definition of append.
noun_phrase(List, Rest) :- append([the, bookstore, owner], Rest, List).
6.3.9.3. Arguments in DCGs
The identifiers in DCG rules can have arguments as well. This is useful for returning a parse tree corresponding to the thing parsed. A parse-tree version of the preceding grammar might be
sentence(Meaning) --> noun_phrase(Subject), verb_phrase(Subject, Meaning). noun_phrase(evelyn) --> [evelyn]. noun_phrase(chris) --> [chris]. verb_phrase(Subject, sings(Subject)) --> [sings]. verb_phrase(Subject, jogs(Subject)) --> [jogs]. verb_phrase(Subject, loves(Subject,Object)) --> [loves], noun_phrase(Object).
The extra arguments are added onto the end of the arguments already given so that the first rule is translated as
sentence(Meaning, List, Rest) :- noun_phrase(Subject, List, Rest1), verb_phrase(Subject, Meaning, Rest1, Rest).
6.3.9.4. Calling Prolog Goals
In some situations, it is useful to be able to call a Prolog goal within a DCG clause in order to test some condition. We can do this by enclosing the goal in curly braces ({ }); the goal simply gets copied into the clause translation as-is. For instance, the rule
uppercase(Character) --> [Character], {Character >= 65, Character =< 90}.
parses one character (and returns it as the meaning), but only if it is an uppercase letter. The translation is
uppercase(Character, Input, Rest) --> append([Character], Rest, Input), Character >= 65, Character =< 90.
Cuts are not excluded from the things we can do inside the curly braces. Here are grammar rules that parse a positive integer from a list of character codes and return the integer:
integer(N) --> [Ch], {digit(Ch), !, D is Ch-48}, rest_of_integer(D, N). rest_of_integer(Sofar, N) --> [Ch], {digit(Ch), !, N1 is (Sofar*10)+(Ch-48)}, rest_of_integer(N1, N). rest_of_integer(Sofar, Sofar) --> [].
These rules keep processing digits until there are no more in the input list and compute the integer represented. The cuts are useful for backtracking efficiency. Either with or without the cuts, a string beginning with 2345 foo would first be parsed as the integer 2345 with the string foo left over. Without the cuts, on failure and backtracking, the string would next be parsed as the integer 234 with the string 5 foo left over, and so on. The cuts allow us to take the entire integer in a greedy manner.
Note the use of the construct --> [] in the last example. DCG rules cannot be of the form atom. because that would be interpreted as a simple Prolog fact (without the extra arguments). If a DCG rule corresponds to parsing nothing, it must have a body consisting of at least the empty list.
6.3.9.5. Two-Level Parsing
The last two examples illustrate that DCGs can also apply to the lexical level of parsing. Because strings are just lists of integers, we can put strings into DCG bodies and they are treated just like any other lists. Here is part of a grammar to parse a simplified version of Internet URLs:
url --> transport, ://, machine, /, path. url --> file:/, path. transport --> http. transport --> ftp. machine --> ident, dot_idents. dot_idents --> []. dot_idents --> ., ident, dot_idents. path --> []. path --> ident. path --> ident, /, path.
It is sometimes useful to have one DCG for the lexical level and another for the token level. The top-level loop of a program reads a file, tokenizing each line into a list of tokens with the lexical-level DCG; then it appends the token lists together and parses the whole thing with the token-level grammar.
Prolog is a largely declarative language, which means, among other things, that all information in a normal computation arises from the program or from parameters passed to and from predicates. However, occasionally it is useful to be able change the global state of the computation in order to store persistent data.
The Prolog assert and retract predicates allow us to do this. They are most useful in two situations: to record information that is used by only a small number of predicates and to save data from one query to another. Although we can use assert and retract to perform imperative-style computation, this is not advisable (see section 6.4.2).
Previous | Table of Contents | Next |