Many of the Emacs functions that exist and that you may write involve searching and manipulating the text in a buffer. Such functions are particularly useful in specialized modes, like the programming language modes described in Chapter 9. Many built-in Emacs functions relate to text in strings and buffers; the most interesting ones take advantage of Emacs's regular expression facility, which we introduced in Chapter 3.
We first describe the basic functions relating to buffers and strings that don't use regular expressions. Afterwards, we discuss regular expressions in more depth than was the case in Chapter 3, concentrating on the features that are most useful to Lisp programmers, and we describe the functions that Emacs makes available for dealing with regular expressions.
Table 11-4 shows some basic Emacs functions relating to buffers, text, and strings that are only useful to Lisp programmers and thus aren't bound to keystrokes. We already saw a couple of them in the count-words-buffer example. Notice that some of these are predicates, and their names reflect this.
Table 11-4. Buffer and text functions
Function | Value or action |
---|---|
point | Character position of point. |
mark | Character position of mark. |
point-min | Minimum character position (usually 1). |
point-max | Maximum character position (usually the size of the buffer plus 1). |
bolp | Whether point is at the beginning of the line (t or nil). |
eolp | Whether point is at the end of the line. |
bobp | Whether point is at the beginning of the buffer. |
eobp | Whether point is at the end of the buffer. |
insert | Insert any number of arguments (strings or characters) into the buffer at point. |
number-to-string | Convert a numerical argument to a string. |
string-to-number | Convert a string argument to a number (integer or floating point). |
char-to-string | Convert a character argument to a string. |
substring | Given a string and two integer indices start and end, return the substring that begins at index start and ends just before index end. Indices start at 0. For example, (substring "appropriate" 2 5) returns "pro". |
aref | Array indexing function that can be used to return individual characters from strings; takes a string and an integer index and returns the character as an integer, using the ASCII code (on most machines). For example, (aref "appropriate" 3) returns 114, the ASCII code for r. |
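To see several of the functions in Table 11-4 at work, the following sketch evaluates them in a temporary buffer so that no real file is touched. The function name my-table-demo and the sample text are assumptions made for illustration; they are not part of Emacs.

```elisp
(defun my-table-demo ()
  "Return a list of values produced by basic buffer and string functions."
  (with-temp-buffer
    (insert "hello" "\n" "world")       ; insert accepts any number of arguments
    (let ((at-end (eobp)))              ; point now follows the inserted text
      (goto-char (point-min))
      (list (point)                     ; position of point after moving
            (point-max)                 ; one past the last character
            (bobp)                      ; at beginning of buffer?
            (bolp)                      ; at beginning of line?
            at-end                      ; were we at end of buffer earlier?
            (substring "appropriate" 2 5)
            (aref "appropriate" 3)
            (number-to-string 42)
            (string-to-number "3.5")
            (char-to-string ?r)))))
```

Evaluating (my-table-demo) in the scratch buffer with C-j is a quick way to inspect the results side by side.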
Many functions not included in the previous table deal with buffers and text, including some that you should be familiar with as user commands. Several commonly used Emacs functions use regions, which are areas of text within a buffer. When you are using Emacs, you delineate regions by setting the mark and moving the cursor. However, region-oriented functions (such as kill-region, indent-region, and shell-command-on-region—really, any function with region in its name) are actually more flexible when used within Emacs Lisp code. They typically take two integer arguments that are used as the character positions of the boundaries for the region on which they operate. These arguments default to the values of point and mark when the functions are called interactively.
Obviously, allowing point and mark as interactive defaults is a more general (and thus more desirable) approach than one in which only point and mark can be used to delineate regions. The r option to the interactive function makes it possible. For example, if we wanted to write the function translate-region-into-German, here is how we would start:
(defun translate-region-into-German (start end)
  (interactive "r")
  ...
The r option to interactive fills in the two arguments start and end when the function is called interactively, but if it is called from other Lisp code, both arguments must be supplied. A common way to do this is:
(translate-region-into-German (point) (mark))
But you need not call it in this way. If you wanted to use this function to write another function called translate-buffer-into-German, you would only need to write the following as a "wrapper":
(defun translate-buffer-into-German ()
  (translate-region-into-German (point-min) (point-max)))
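Because translate-region-into-German is only outlined here, the same pattern can be tried with a complete pair of functions. This is a minimal sketch; my-count-chars-in-region and my-count-chars-in-buffer are hypothetical names chosen for this illustration.

```elisp
(defun my-count-chars-in-region (start end)
  "Return the number of characters between START and END.
Called interactively, START and END are the region boundaries."
  (interactive "r")
  (let ((count (- (max start end) (min start end))))
    ;; Only report in the echo area when the user invoked the command.
    (when (called-interactively-p 'interactive)
      (message "Region has %d characters" count))
    count))

(defun my-count-chars-in-buffer ()
  "Return the number of characters in the accessible part of the buffer."
  (interactive)
  ;; The wrapper passes explicit positions instead of point and mark.
  (my-count-chars-in-region (point-min) (point-max)))
```

The region function never consults point or mark itself, which is what lets the buffer wrapper reuse it unchanged.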
In fact, it is best to avoid using point and mark within Lisp code unless doing so is really necessary; use local variables instead. Try not to write Lisp functions as lists of commands a user would invoke; that sort of behavior is better suited to macros (see Chapter 6).
Regular expressions (regexps) provide much more powerful ways of dealing with text. Although most beginning Emacs users tend to avoid commands that use regexps, like replace-regexp and re-search-forward, regular expressions are widely used within Lisp code. Such modes as Dired and the programming language modes would be unthinkable without them. Regular expressions require time and patience to become comfortable with, but doing so is well worth the effort for Lisp programmers, because they are one of the most powerful features of Emacs, and many things are not practical to implement in any other way.
One trick that can be useful when you are experimenting with regular expressions and trying to get the hang of them is to type some text into a scratch buffer that corresponds to what you're trying to match, and then use isearch-forward-regexp (C-M-s) to build up the regular expression. The interactive, immediate feedback of an incremental search can show you the pieces of the regular expression in action in a way that is completely unique to Emacs.
We introduce the various features of regular expressions by way of a few examples of search-and-replace situations; such examples are easy to explain without introducing lots of extraneous details. Afterward, we describe Lisp functions that go beyond simple search-and-replace capabilities with regular expressions. The following are examples of searching and replacing tasks that the normal search/replace commands can't handle or handle poorly:
1. You are developing code in C, and you want to combine the functionality of the functions read and readfile into a new function called get. You want to replace all references to these functions with references to the new one.
2. You are writing a troff document using outline mode, as described in Chapter 7. In outline mode, headers of document sections have lines that start with one or more asterisks. You want to write a function called remove-outline-marks to get rid of these asterisks so that you can run troff on your file.
3. You want to change all occurrences of program in a document, including programs and program's, to module/modules/module's, without changing programming to moduleming or programmer to modulemer.
4. You are working on documentation for some C software that is being rewritten in Java. You want to change all the filenames in the documentation from <filename>.c to <filename>.java, since .java is the extension the javac compiler uses.
5. You just installed a new C++ compiler that prints error messages in German. You want to modify the Emacs compile package so that it can parse the error messages correctly (see the end of Chapter 9).
We will soon show how to use regular expressions to deal with these examples, which we refer to by number. Note that this discussion of regular expressions, although more comprehensive than that in Chapter 3, does not cover every feature; those that it doesn't cover are redundant with other features or relate to concepts that are beyond the scope of this book. It is also important to note that the regular expression syntax described here is for use with Lisp strings only; there is an important difference between the regexp syntax for Lisp strings and the regexp syntax for user commands (like replace-regexp), as we will see.
Regular expressions began as an idea in theoretical computer science, but they have found their way into many nooks and crannies of everyday, practical computing. The syntax used to represent them may vary, but the concepts are much the same everywhere. You probably already know a subset of regular expression notation: the wildcard characters used by the Unix shell or Windows command prompt to match filenames. The Emacs notation is a bit different; it is similar to those used by the language Perl, editors like ed and vi, and Unix software tools like lex and grep. So let's start with the Emacs regular expression operators that resemble Unix shell wildcard characters, which are listed in Table 11-5.
Table 11-5. Basic regular expression operators
Emacs operator | Equivalent | Function |
---|---|---|
. | ? | Matches any character. |
.* | * | Matches any string. |
[abc] | [abc] | Matches a, b, or c. |
[a-z] | [a-z] | Matches any lowercase letter. |
For example, to match all filenames beginning with program in the Unix shell, you would specify program*. In Emacs, you would say program.*. To match all filenames beginning with a through e in the shell, you would use [a-e]* or [abcde]*; in Emacs, it's [a-e].* or [abcde].*. In other words, the dash within the brackets specifies a range of characters.[6] We will provide more on ranges and bracketed character sets shortly.
To match literally a character that is normally used as a regular expression operator, you need to precede it with a double backslash, as in \\* to match an asterisk. Why a double backslash? The reason has to do with the way Emacs Lisp reads and decodes strings. When Emacs reads a string in a Lisp program, it decodes the backslash-escaped characters and thus turns double backslashes into single backslashes. If the string is being used as a regular expression (that is, if it is being passed to a function that expects a regular expression argument), that function uses the single backslash as part of the regular expression syntax. For example, given the following line of Lisp:
(replace-regexp "fred\\*" "bob*")
the Lisp interpreter decodes the string fred\\* as fred\* and passes it to the replace-regexp command. The replace-regexp command understands fred\* to mean fred followed by a (literal) asterisk. Notice, however, that the second argument to replace-regexp is not a regular expression, so there is no need to backslash-escape the asterisk in bob* at all. Also notice that if you were to invoke this as a user command, you would not need to double the backslash; that is, you would type M-x replace-regexp Enter followed by fred\* and bob*. Emacs decodes strings read from the minibuffer differently.
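The decoding step can be observed directly. The sketch below is illustrative only: my-literal-star-regexp and my-backslash-demo are made-up names, while length, string-match, and regexp-quote are standard Emacs Lisp functions.

```elisp
;; The source text between the quotes is 7 characters long, but the reader
;; turns the doubled backslash into one, giving a 6-character string.
(defvar my-literal-star-regexp "fred\\*"
  "Regexp for \"fred\" followed by a literal asterisk (illustrative name).")

(defun my-backslash-demo ()
  "Return a list showing how the doubled backslash is decoded and used."
  (list (length my-literal-star-regexp)                ; backslash counted once
        (string-match my-literal-star-regexp "fred*")  ; literal asterisk present
        (string-match my-literal-star-regexp "fredd")  ; no asterisk, so no match
        (string-match "fred*" "fredd")                 ; unescaped * repeats the d
        ;; regexp-quote builds the escaped form for you.
        (equal (regexp-quote "fred*") my-literal-star-regexp)))
```

When a search string comes from user input or another variable, regexp-quote is usually less error-prone than adding the backslashes by hand.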
The * regular expression operator in Emacs (by itself) actually means something different from the * in the Unix shell: it means "zero or more occurrences of whatever is before the *." Thus, because . matches any character, .* means "zero or more occurrences of any character," that is, any string at all, including the empty string. Anything can precede a *: for example, read* matches "rea" followed by zero or more d's; file[0-9]* matches "file" followed by zero or more digits.
Two operators are closely related to *. The first is +, which matches one or more occurrences of whatever precedes it. Thus, read+ matches "read" and "readdddd" but not "rea," and file[0-9]+ requires that there be at least one digit after "file." The second is ?, which matches zero or one occurrence of whatever precedes it (i.e., makes it optional). html? matches "htm" or "html," and file[0-9]? matches "file" followed by one optional digit.
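The three repetition operators can be compared side by side with a small helper. This is a sketch under stated assumptions: my-whole-match-p is a hypothetical name, and it wraps the pattern in \\` and \\' (anchors for the start and end of a string) so that a result of t means the entire string matched, not just part of it.

```elisp
(defun my-whole-match-p (regexp string)
  "Return t if REGEXP matches all of STRING, nil otherwise."
  ;; \\(?: ... \\) groups REGEXP without saving the match.
  (and (string-match-p (concat "\\`\\(?:" regexp "\\)\\'") string) t))

(my-whole-match-p "read*" "rea")            ; t: zero d's are allowed
(my-whole-match-p "read*" "readdd")         ; t
(my-whole-match-p "read+" "rea")            ; nil: at least one d required
(my-whole-match-p "read+" "readdddd")       ; t
(my-whole-match-p "html?" "htm")            ; t: the l is optional
(my-whole-match-p "file[0-9]?" "file12")    ; nil: only one optional digit
(my-whole-match-p "file[0-9]+" "file12")    ; t
```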
Before we move on to other operators, a few more comments about character sets and ranges are in order. First, you can specify more than one range within a single character set. The set [A-Za-z] can thus be used to specify all alphabetic characters; this is better than the nonportable [A-z]. Combining ranges with lists of characters in sets is also possible; for example, [A-Za-z_] means all alphabetic characters plus underscore, that is, all characters that can begin an identifier in C. If you give ^ as the first character in a set, it acts as a "not" operator; the set matches all characters that aren't the characters after the ^. For example, [^A-Za-z] matches all nonalphabetic characters.
A ^ anywhere other than first in a character set has no special meaning; it's just the caret character. Conversely, - has no special meaning if it is given first in the set; the same is true for ]. Positioning is the only way to include these characters literally: within a character set a backslash is just another member of the set, so double-backslash-escaping them does not work there. Outside character sets, a double backslash preceding a nonspecial character usually means just that character, but watch it! A few letters and punctuation characters become regular expression operators when escaped this way, some of which are covered in the following section. Later, we list the "booby trap" characters that become operators when double-backslash-escaped. The ^ character has a different meaning when used outside of character sets, as we'll see soon.
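A few quick checks illustrate the positional rules for character sets described above. The function name my-char-set-demo and the test strings are assumptions for this sketch; string-match-p returns the index of the first match or nil.

```elisp
(defun my-char-set-demo ()
  "Return match positions that illustrate character set rules."
  (list (string-match-p "[A-Za-z_]" "42_x")   ; first letter or underscore
        (string-match-p "[^A-Za-z]" "abc9")   ; first nonalphabetic character
        (string-match-p "[a^]" "^")           ; ^ not first: a literal caret
        (string-match-p "[-a]" "x-")          ; - first: a literal dash
        (string-match-p "[]a]" "x]")))        ; ] first: a literal bracket
```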
If you want to get *, +, or ? to operate on more than one character, you can use the \\( and \\) operators for grouping. Notice that, in this case (and others to follow), the backslashes are part of the operator. (All of the nonbasic regular expression operators include backslashes so as to avoid making too many characters "special." This is the most profound way in which Emacs regular expressions differ from those used in other environments, like Perl, so it's something to which you'll need to pay careful attention.) As we saw before, these characters need to be double-backslash-escaped so that Emacs decodes them properly. If one of the basic operators immediately follows \\), it works on the entire group inside the \\( and \\). For example, \\(read\\)* matches the empty string, "read," "readread," and so on, and read\\(file\\)? matches "read" or "readfile." Now we can handle Example 1, the first of the examples given at the beginning of this section, with the following Lisp code:
(replace-regexp "read\\(file\\)?" "get")
The alternation operator \\| is a "one or the other" operator; it matches either whatever precedes it or whatever comes after it. \\| treats parenthesized groups differently from the basic operators. Instead of requiring parenthesized groups to work with subexpressions of more than one character, its "power" goes out to the left and right as far as possible, until it reaches the beginning or end of the regexp, a \\(, a \\), or another \\|. Some examples should make this clearer:
read\\|get matches "read" or "get"
readfile\\|read\\|get matches "readfile," "read," or "get"
\\(read\\|get\\)file matches "readfile" or "getfile"
In the first example, the effect of the \\| extends to both ends of the regular expression. In the second, the effect of the first \\| extends to the beginning of the regexp on the left and to the second \\| on the right. In the third, it extends to the backslash-parentheses.
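The grouping and alternation examples can be tried on strings without touching a buffer. In this sketch, my-first-match and my-read-to-get are hypothetical helper names; string-match, match-string, and replace-regexp-in-string are standard functions, and the sample C fragments are invented.

```elisp
(defun my-first-match (regexp string)
  "Return the first substring of STRING that matches REGEXP, or nil."
  (when (string-match regexp string)
    (match-string 0 string)))

(my-first-match "\\(read\\)*" "readreadx")          ; "readread"
(my-first-match "read\\(file\\)?" "readfile()")     ; "readfile"
(my-first-match "read\\|get" "forget")              ; "get"
(my-first-match "\\(read\\|get\\)file" "getfile")   ; "getfile"

(defun my-read-to-get (text)
  "Return TEXT with read and readfile both replaced by get (Example 1)."
  ;; The final t keeps the replacement's case exactly as written.
  (replace-regexp-in-string "read\\(file\\)?" "get" text t))
```

Note that this pattern has no notion of word boundaries, so it would also rewrite the middle of a name such as thread; the context operators introduced next address that.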
Another important category of regular expression operators has to do with specifying the context of a string, that is, the text around it. In Chapter 3 we saw the word-search commands, which are invoked as options within incremental search. These are special cases of context specification; in this case, the context is word-separation characters, for example, spaces or punctuation, on both sides of the string.
The simplest context operators for regular expressions are ^ and $, two more basic operators that are used at the beginning and end of regular expressions respectively. The ^ operator causes the rest of the regular expression to match only if it is at the beginning of a line; $ causes the regular expression preceding it to match only if it is at the end of a line. In Example 2, we need a function that matches occurrences of one or more asterisks at the beginning of a line; this will do it:
(defun remove-outline-marks ()
  "Remove section header marks created in outline-mode."
  (interactive)
  (replace-regexp "^\\*+" ""))
This function finds lines that begin with one or more asterisks (the \\* is a literal asterisk and the + means "one or more"), and it replaces the asterisk(s) with the empty string "", thus deleting them.
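The documentation string of replace-regexp in current Emacs releases notes that the command is meant for interactive use and suggests a loop over re-search-forward and replace-match in Lisp code. As an alternative sketch (not the version given in the text), the same task could be written as follows; the name my-remove-outline-marks is an assumption.

```elisp
(defun my-remove-outline-marks ()
  "Delete the leading asterisks from every line of the current buffer."
  (interactive)
  (save-excursion                       ; restore point when finished
    (goto-char (point-min))
    ;; nil = no search bound; t = return nil instead of signaling an error.
    (while (re-search-forward "^\\*+" nil t)
      (replace-match ""))))
```

Unlike replace-regexp, which starts at point, this version always processes the whole accessible buffer and leaves point where it was.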
Note that ^ and $ can't be used in the middle of regular expressions that are intended to match strings that span more than one line. Instead, you can put \n (for Newline) in your regular expressions to match such strings. Another such character you may want to use is \t for Tab. When ^ and $ are used with regular expression searches on strings instead of buffers, they match beginning- and end-of-string, respectively; the function string-match, described later in this chapter, can be used to do regular expression search on strings.
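The following sketch applies ^, $, \n, and \t to strings. The function name my-anchor-demo and the sample strings are illustrative assumptions; each call returns the index where the match starts, or nil.

```elisp
(defun my-anchor-demo ()
  "Return match positions for patterns using ^, $, \\n, and \\t."
  (list (string-match-p "^\\*+" "** Heading")    ; asterisks at the start
        (string-match-p "^\\*+" "text * star")   ; asterisk not at the start
        (string-match-p "end$" "the end")        ; "end" at the very end
        (string-match-p "end$" "endless")        ; "end" not at the end
        (string-match-p "one\ntwo" "one\ntwo")   ; match across a newline
        (string-match-p "a\tb" "x a\tb")))       ; match across a tab
```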
Here is a real-life example of a complex regular expression that covers the operators we have seen so far: sentence-end, a variable Emacs uses to recognize the ends of sentences for sentence motion commands like forward-sentence (M-e). Its value is:
"[.?!][]\"')}]*\\($\\|\t\\|  \\)[ \t\n]*"
Let's look at this piece by piece. The first character set, [.?!], matches a period, question mark, or exclamation mark (the first two of these are regular expression operators, but they have no special meaning within character sets). The next part, []\"')}]*, consists of a character set containing right bracket, double quote, single quote, right parenthesis, and right curly brace. A * follows the set, meaning that zero or more occurrences of any of the characters in the set match. So far, then, this regexp matches a sentence-ending punctuation mark followed by zero or more closing brackets, quotes, parentheses, or curly braces. Next, there is the group \\($\\|\t\\|  \\), which matches any of the three alternatives $ (end of line), Tab, or two spaces. Finally, [ \t\n]* matches zero or more spaces, tabs, or newlines. Thus the sentence-ending characters can be followed by end-of-line or a combination of spaces (at least two), tabs, and newlines.
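The behavior described above can be checked against a few sample strings. This sketch stores a copy of the regexp in a hypothetical variable, my-sentence-end-regexp, with two spaces in the last alternative as the prose describes; the value of sentence-end in your own Emacs may differ.

```elisp
(defvar my-sentence-end-regexp
  "[.?!][]\"')}]*\\($\\|\t\\|  \\)[ \t\n]*"
  "Copy of the regexp discussed in the text (illustrative variable name).")

(defun my-sentence-end-position (string)
  "Return the index in STRING where a sentence ending starts, or nil."
  (string-match-p my-sentence-end-regexp string))

(my-sentence-end-position "Done.  Next")         ; 4: period plus two spaces
(my-sentence-end-position "Done. Next")          ; nil: only one space
(my-sentence-end-position "Really?\nYes")        ; 6: question mark at end of line
(my-sentence-end-position "He said \"Stop!\"")   ; 13: closing quote is absorbed
```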
There are other context operators besides ^ and $; two of them can be used to make regular expression search act like word search. The operators \\< and \\> match the beginning and end of a word, respectively. With these we can go part of the way toward solving Example 3. The regular expression \\<program\\> matches "program" but not "programmer" or "programming" (it also won't match "microprogram"). So far so good; however, it won't match "program's" or "programs." For this, we need a more complex regular expression:
\\<program\\('s\\|s\\)?\\>
This expression means, "a word beginning with program followed optionally by apostrophe s or just s." This does the trick as far as matching the right words goes.
There is still one piece missing: the ability to replace "program" with "module" while leaving any s or 's untouched. This leads to the final regular expression feature we will cover here: the ability to retrieve portions of the matched string for later use. The preceding regular expression is indeed the correct one to give as the search string for replace-regexp. As for the replace string, the answer is module\\1; in other words, the required Lisp code is:
(replace-regexp "\\<program\\('s\\|s\\)?\\>" "module\\1")
The \\1 means, in effect, "substitute the portion of the matched string that matched the subexpression inside the \\( and \\)." It is one of the few regular-expression-related operators that can be used in replacements. In this case, it means to use 's in the replace string if the match was "program's," s if the match was "programs," or nothing if the match was just "program." The result is the correct substitution of "module" for "program," "modules" for "programs," and "module's" for "program's."
Another example of this feature solves Example 4. To match filenames <filename>.c and replace them with <filename>.java, use the Lisp code:
(replace-regexp "\\([a-zA-Z0-9_]+\\)\\.c" "\\1.java")
Remember that \\. means a literal dot (.). Note also that the filename pattern (which matches a series of one or more alphanumerics or underscores) was surrounded by \\( and \\) in the search string for the sole purpose of retrieving it later with \\1.
Actually, the \\1 operator is only a special case of a more powerful facility (as you may have guessed). In general, if you surround a portion of a regular expression with \\( and \\), the string matching the parenthesized subexpression is saved. When you specify the replace string, you can retrieve the saved substrings with \\n, where n is the number of the parenthesized subexpression from left to right, starting with 1. Parenthesized expressions can be nested; their corresponding \\n numbers are assigned in order of their \\( delimiter from left to right.
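The saved-substring mechanism can be exercised on strings with replace-regexp-in-string, which accepts the same \\n notation in its replacement argument. The three function names below and the sample inputs are assumptions for this sketch; the first two reuse the regexps from Examples 3 and 4, and the third shows how nested groups are numbered by their opening \\(.

```elisp
(defun my-program-to-module (text)
  "Return TEXT with program, programs, and program's renamed to module forms."
  (let ((case-fold-search nil))         ; match lowercase "program" only
    (replace-regexp-in-string "\\<program\\('s\\|s\\)?\\>" "module\\1" text t)))

(defun my-c-to-java (text)
  "Return TEXT with filenames ending in .c given a .java extension instead."
  (replace-regexp-in-string "\\([a-zA-Z0-9_]+\\)\\.c" "\\1.java" text t))

(defun my-nested-groups (date)
  "Return the four saved substrings for a DATE such as \"2024-05-17\"."
  ;; Group 1 wraps groups 2 and 3; group 4 stands alone.
  (when (string-match "\\(\\([0-9]+\\)-\\([0-9]+\\)\\)-\\([0-9]+\\)" date)
    (list (match-string 1 date) (match-string 2 date)
          (match-string 3 date) (match-string 4 date))))
```

For instance, (my-nested-groups "2024-05-17") yields the outer group first, then the two groups nested inside it, then the last group.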
Lisp code that takes full advantage of this feature tends to contain complicated regular expressions. The best example of this in Emacs's own Lisp code is compilation-error-regexp-alist, the list of regular expressions the compile package (discussed in Chapter 9) uses to parse error messages from compilers. Here is an excerpt, adapted from the Emacs source code (it's become much too long to reproduce in its entirety; see below for some hints on how to find the actual file to study in its full glory):
(defvar compilation-error-regexp-alist
  '(
    ;; NOTE! See also grep-regexp-alist, below.
    ;; 4.3BSD grep, cc, lint pass 1:
    ;; /usr/src/foo/foo.c(8): warning: w may be used before set
    ;; or GNU utilities:
    ;; foo.c:8: error message
    ;; or HP-UX 7.0 fc:
    ;; foo.f :16 some horrible error message
    ;; or GNU utilities with column (GNAT 1.82):
    ;; foo.adb:2:1: Unit name does not match file name
    ;; or with column and program name:
    ;; jade:dbcommon.dsl:133:17:E: missing argument for function call
    ;;
    ;; We'll insist that the number be followed by a colon or closing
    ;; paren, because otherwise this matches just about anything
    ;; containing a number with spaces around it.
    ;; We insist on a non-digit in the file name
    ;; so that we don't mistake the file name for a command name
    ;; and take the line number as the file name.
    ("\\([a-zA-Z][-a-zA-Z._0-9]+: ?\\)?\
\\([a-zA-Z]?:?[^:( \t\n]*[^:( \t\n0-9][^:( \t\n]*\\)[:(][ \t]*\\([0-9]+\\)\
\\([) \t]\\|:\\(\\([0-9]+:\\)\\|[0-9]*[^:0-9]\\)\\)" 2 3 6)
    ;; Microsoft C/C++:
    ;; keyboard.c(537) : warning C4005: 'min' : macro redefinition
    ;; d:\tmp\test.c(23) : error C2143: syntax error : missing ';' before 'if'
    ;; This used to be less selective and allow characters other than
    ;; parens around the line number, but that caused confusion for
    ;; GNU-style error messages.
    ;; This used to reject spaces and dashes in file names,
    ;; but they are valid now; so I made it more strict about the error
    ;; message that follows.
    ("\\(\\([a-zA-Z]:\\)?[^:(\t\n]+\\)(\\([0-9]+\\)) \
: \\(error\\|warning\\) C[0-9]+:" 1 3)
    ;; Caml compiler:
    ;; File "foobar.ml", lines 5-8, characters 20-155: blah blah
    ("^File \"\\([^,\" \n\t]+\\)\", lines? \\([0-9]+\\)[-0-9]*, characters? \
\\([0-9]+\\)" 1 2 3)
    ;; Cray C compiler error messages
    ("\\(cc\\| cft\\)-[0-9]+ c\\(c\\|f77\\): ERROR \\([^,\n]+, \\)* File = \
\\([^,\n]+\\), Line = \\([0-9]+\\)" 4 5)
    ;; Perl -w:
    ;; syntax error at automake line 922, near "':'"
    ;; Perl debugging traces
    ;; store::odrecall('File_A', 'x2') called at store.pm line 90
    (".* at \\([^ \n]+\\) line \\([0-9]+\\)[,.\n]" 1 2)
    ;; See http://ant.apache.org/faq.html
    ;; Ant Java: works for jikes
    ("^\\s-*\\[[^]]*\\]\\s-*\\(.+\\):\\([0-9]+\\):\\([0-9]+\\):[0-9]+:[0-9]\
+:" 1 2 3)
    ;; Ant Java: works for javac
    ("^\\s-*\\[[^]]*\\]\\s-*\\(.+\\):\\([0-9]+\\):" 1 2)
    )
This is a list of elements that have at least three parts each: a regular expression and two numbers. The regular expression matches error messages in the format used by a particular compiler or tool. The first number tells Emacs which of the matched subexpressions contains the filename in the error message; the second number designates which of the subexpressions contains the line number. (There can also be additional parts at the end: a third number giving the position of the column number of the error, if any, and any number of format strings used to generate the true filename from the piece found in the error message, if needed. For more details about these, look at the actual file, as described below.)
For example, the element in the list dealing with Perl contains the regular expression:
".* at \\([^ \n]+\\) line \\([0-9]+\\)[,.\n]"
followed by 1 and 2, meaning that the first parenthesized subexpression contains the filename and the second contains the line number. So if you have Perl's warnings turned on—you always do, of course—you might get an error message such as this:
syntax error at monthly_orders.pl line 1822, near "$"
The regular expression ignores everything up to at. Then it finds monthly_orders.pl, the filename, as the match to the first subexpression "[^ \n]+" (one or more nonblank, nonnewline characters), and it finds 1822, the line number, as the match to the second subexpression "[0-9]+" (one or more digits).
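The walk-through above can be reproduced with string-match and match-string. This is only a sketch of the idea, not how the compile package itself is implemented; my-parse-perl-error is a hypothetical name.

```elisp
(defun my-parse-perl-error (message-line)
  "Return (FILENAME . LINE-NUMBER) parsed from MESSAGE-LINE, or nil."
  (when (string-match ".* at \\([^ \n]+\\) line \\([0-9]+\\)[,.\n]"
                      message-line)
    ;; Subexpression 1 holds the filename, subexpression 2 the line number.
    (cons (match-string 1 message-line)
          (string-to-number (match-string 2 message-line)))))

(my-parse-perl-error
 "syntax error at monthly_orders.pl line 1822, near \"$\"")
;; => ("monthly_orders.pl" . 1822)
```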
For the most part, these regular expressions are documented pretty well in their definitions. Understanding them in depth can still be a challenge, and writing them even more so! Suppose we want to tackle Example 5 by adding an element to this list for our new C++ compiler that prints error messages in German. In particular, it prints error messages like this:
Fehler auf Zeile linenum in filename: text of error message
Here is the element we would add to compilation-error-regexp-alist:
("Fehler auf Zeile \\([0-9]+\\) in \\([^: \t]+\\):" 2 1)
In this case, the second parenthesized subexpression matches the filename, and the first matches the line number.
To add this to compilation-error-regexp-alist, we need to put this line in .emacs:
(setq compilation-error-regexp-alist
      (cons '("Fehler auf Zeile \\([0-9]+\\) in \\([^: \t]+\\):" 2 1)
            compilation-error-regexp-alist))
Notice how this example resembles our example (from Chapter 9) of adding support for a new language mode to auto-mode-alist.
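Before installing a new element, it can be reassuring to test it against a sample message. The sketch below treats an element as a simple list of a regexp, a filename subexpression number, and a line-number subexpression number, as described above. The names my-german-error-entry and my-parse-with-entry, and the sample message, are invented for illustration.

```elisp
(defvar my-german-error-entry
  '("Fehler auf Zeile \\([0-9]+\\) in \\([^: \t]+\\):" 2 1)
  "Element from the text: regexp, filename group, line-number group.")

(defun my-parse-with-entry (entry line)
  "Apply ENTRY, a list (REGEXP FILE-GROUP LINE-GROUP), to LINE.
Return (FILENAME . LINE-NUMBER), or nil if LINE does not match."
  (when (string-match (nth 0 entry) line)
    (cons (match-string (nth 1 entry) line)
          (string-to-number (match-string (nth 2 entry) line)))))

(my-parse-with-entry my-german-error-entry
                     "Fehler auf Zeile 42 in haupt.cc: unbekannter Bezeichner")
;; => ("haupt.cc" . 42)
```

Because the subexpression numbers are stored next to the regexp, the same helper works even though this message lists the line number before the filename.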
Table 11-6 concludes our discussion of regular expression operators with a reference list of all the operators covered.
Table 11-6. Regular expression operators
Operator | Function |
---|---|
. | Match any character. |
* | Match 0 or more occurrences of preceding char or group. |
+ | Match 1 or more occurrences of preceding char or group. |
? | Match 0 or 1 occurrences of preceding char or group. |
[...] | Set of characters; see below. |
\\( | Begin a group. |
\\) | End a group. |
\\| | Match the subexpression before or after \\|. |
^ | At beginning of regexp, match beginning of line or string. |
$ | At end of regexp, match end of line or string. |
\n | Match Newline within a regexp. |
\t | Match Tab within a regexp. |
\\< | Match beginning of word. |
\\> | Match end of word. |
The following operators are meaningful within character sets: | |
^ | At beginning of set, treat set as chars not to match. |
- | Specify range of characters. |
The following is also meaningful in regexp replace strings: | |
\\n | Substitute portion of match within the nth \\( and \\), counting from left \\( to right, starting with 1. |
Finally, the following characters are operators (not discussed here) when double-backslash-escaped: b, B, c, C, w, W, s, S, =, _, ', and `. Thus, these are "booby traps" when double-backslash-escaped. Some of these behave similarly to the character class aliases you may have encountered in Perl and Java regular expressions.
As mentioned above, the full compilation-error-regexp-alist has a lot more entries and documentation than fit in this book. The compile.el module in which it is defined also contains functions that use it. One of the best ways to learn how to use Emacs Lisp (as well as discovering things you might not have even realized you can do) is to browse through the implementations of standard modules that are similar to what you're trying to achieve, or that are simply interesting. But how do you find them?
The manual way is to look at the value of the variable load-path. This is the variable Emacs consults when it needs to load a library file itself, so any library you're looking for must be in one of these directories. (This variable is discussed further in the final section of this chapter.) The problem, as you will see if you look at the current value of the variable, is that it contains a large number of directories for you to wade through, which would be pretty tedious each time you're curious about a library. (An easy way to see the variable's value is through Help's "Describe variable" feature, C-h v.)
One of the authors wrote the command listed in Example 11-1 to address this problem and uses it regularly to easily snoop on the source files that make much of Emacs run. If you don't want to type this entire function into your .emacs by hand, you can download it from this book's web site, http://www.oreilly.com/catalog/gnu3.
Example 11-1. find-library-file
(defun find-library-file (library)
  "Takes a single argument LIBRARY, being a library file to search for.
Searches for LIBRARY directly (in case relative to current directory,
or absolute) and then searches directories in load-path in order. It
will test LIBRARY with no added extension, then with .el, and finally
with .elc. If a file is found in the search, it is visited. If none
is found, an error is signaled. Note that order of extension searching
is reversed from that of the load function."
  (interactive "sFind library file: ")
  (let ((path (cons "" load-path)) exact match elc test found)
    (while (and (not match) path)
      (setq test (concat (car path) "/" library)
            match (if (condition-case nil
                          (file-readable-p test)
                        (error nil))
                      test)
            path (cdr path)))
    (setq path (cons "" load-path))
    (or match
        (while (and (not elc) path)
          (setq test (concat (car path) "/" library ".elc")
                elc (if (condition-case nil
                            (file-readable-p test)
                          (error nil))
                        test)
                path (cdr path))))
    (setq path (cons "" load-path))
    (while (and (not match) path)
      (setq test (concat (car path) "/" library ".el")
            match (if (condition-case nil
                          (file-readable-p test)
                        (error nil))
                      test)
            path (cdr path)))
    (setq found (or match elc))
    (if found
        (progn
          (find-file found)
          (and match elc
               (message "(library file %s exists)" elc)
               (sit-for 1))
          (message "Found library file %s" found))
      (error "Library file \"%s\" not found." library))))
Once this command is defined, you can visit any library's implementation by typing M-x find-library-file Enter libraryname Enter. If you use it as often as this author does, you too may find it worth binding to a key sequence. We won't present a detailed discussion of how this function works because it goes a bit deeper than this chapter, but if you're curious about what some of the functions do, you can put your cursor in the function name in a Lisp buffer and use the Help system's "Describe function" (C-h f) feature to get more information about it.
If you find that most of the time when you ask for a library, you end up with a file containing a lot of cryptic numeric codes and no comments, check if the filename ends in .elc. If that is usually what you end up with, it means that only the byte-compiled versions of the libraries (see the discussion at the end of this chapter) have been installed on your system. Ask your system administrator if you can get the source installed; that's an important part of being able to learn and tweak the Emacs Lisp environment.
The functions re-search-forward, re-search-backward, replace-regexp, query-replace-regexp, highlight-regexp, isearch-forward-regexp, and isearch-backward-regexp are all user commands that use regular expressions, and they can all be used within Lisp code (though it is hard to imagine incremental search being used within Lisp code). The section on customizing major modes later in this chapter contains an example function that uses re-search-forward. To find other commands that use regexps you can use the "apropos" help feature (C-h a regexp Enter).
Other such functions aren't available as user commands. Perhaps the most widely used one is looking-at. This function takes a regular expression argument and does the following: it returns t if the text after point matches the regular expression (nil otherwise); if there was a match, it saves the pieces surrounded by \\( and \\) for future use, as seen earlier. The function string-match is similar: it takes two arguments, a regexp and a string. It returns the starting index of the portion of the string that matches the regexp, or nil if there is no match.
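As a small illustration of the two functions just described, the following sketch tries looking-at in a temporary buffer and string-match on a string. The sample text and the regular expressions are made up for the example.

```elisp
;; Sketch: looking-at tests the text after point; string-match tests a string.
;; The sample text and regexps are assumptions made up for illustration.
(with-temp-buffer
  (insert "main.c:42: parse error")
  (goto-char (point-min))
  ;; Non-nil, because the text after point begins with a match.
  (looking-at "\\([a-z.]+\\):\\([0-9]+\\):"))

;; Returns the index (counting from 0) where the match starts: here 7,
;; because the digits come after the seven characters of "main.c:".
(string-match "[0-9]+" "main.c:42: parse error")

;; Returns nil, because the string contains no digits.
(string-match "[0-9]+" "no digits here")
```

Note that looking-at works on buffer positions, which start at 1, while string-match works on string indices, which start at 0.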
The functions match-beginning and match-end can be used to retrieve the saved portions of the matched string. Each takes as an argument the number of the matched expression (as in \\n in replace-regexp replacement strings) and returns the character position in the buffer that marks the beginning (for match-beginning) or end (for match-end) of the matched string. With the argument 0, the character position that marks the beginning/end of the entire string matched by the regular expression is returned.
Two more functions are needed to make the above useful: we need to know how to convert the text in a buffer to a string. No problem: buffer-string returns the entire buffer as a string; buffer-substring takes two integer arguments, marking the beginning and end positions of the substring desired, and returns the substring.
With these functions, we can write a bit of Lisp code that returns a string containing the portion of the buffer that matches the nth parenthesized subexpression:
(buffer-substring (match-beginning n) (match-end n))
In fact, this construct is used so often that Emacs has a built-in function, match-string, that acts as a shorthand; (match-string n) returns the same result as in the previous example.
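To see these pieces side by side, here is a minimal sketch that matches a line of text and then reads back the saved positions and strings. The function name my-match-pieces, the sample text, and the regular expression are all hypothetical, chosen only for illustration.

```elisp
;; Sketch: after a successful looking-at, pull out the saved pieces.
;; `my-match-pieces', the sample text, and the regexp are hypothetical.
(defun my-match-pieces ()
  "Return positions and strings saved by a sample looking-at call."
  (with-temp-buffer
    (insert "main.c:42: parse error")
    (goto-char (point-min))
    (when (looking-at "\\([a-z.]+\\):\\([0-9]+\\):")
      (list
       ;; Buffer positions of the first subexpression (buffers count from 1).
       (match-beginning 1)                                  ; 1
       (match-end 1)                                        ; 7
       ;; The long way and the shorthand give the same string.
       (buffer-substring (match-beginning 2) (match-end 2)) ; "42"
       (match-string 2)                                     ; "42"
       ;; Subexpression 0 is the entire match.
       (match-string 0)))))                                 ; "main.c:42:"

(my-match-pieces)
```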
An example should show how this capability works. Assume you are writing the Lisp code that parses compiler error messages, as in our previous example. Your code goes through each element in compilation-error-regexp-alist, checking if the text in a buffer matches the regular expression. If it matches, your code needs to extract the filename and the line number, visit the file, and go to the line number.
Although the code for going down each element in the list is beyond what we have learned so far, the routine basically looks like this:
for each element in compilation-error-regexp-alist
  (let ((regexp the regexp in the element)
        (file-subexp the number of the filename subexpression)
        (line-subexp the number of the line number subexpression))
    (if (looking-at regexp)
        (let ((filename (match-string file-subexp))
              (linenum (string-to-number (match-string line-subexp))))
          (find-file-other-window filename)
          (goto-line linenum))
      (otherwise, try the next element in the list)))
The second let extracts the filename from the buffer, from the beginning to the end of the match for the file-subexp-th subexpression, and it extracts the line number similarly from the line-subexp-th subexpression (and converts it from a string to a number). Then the code visits the file (in another window, not the same one as the error message buffer) and goes to the line number where the error occurred.
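For readers who want to try the routine, here is one way it could be written out in full. This is only a sketch under simplifying assumptions: my-error-regexp-alist and my-parse-error-at-point are hypothetical names, each list element is assumed to have the simple form (REGEXP FILE-SUBEXP LINE-SUBEXP), and the real compilation-error-regexp-alist uses a richer format. To keep the sketch easy to test, it returns the filename and line number instead of visiting the file.

```elisp
;; Sketch of the loop described above.  `my-error-regexp-alist' and
;; `my-parse-error-at-point' are hypothetical names; the real
;; compilation-error-regexp-alist has a richer element format.
(defvar my-error-regexp-alist
  '(("\\([^ :\n]+\\):\\([0-9]+\\):" 1 2)            ; file:line: message
    ("\"\\([^\"\n]+\\)\", line \\([0-9]+\\)" 1 2))  ; "file", line N
  "Each element is (REGEXP FILE-SUBEXP LINE-SUBEXP).")

(defun my-parse-error-at-point ()
  "Return (FILENAME . LINENUM) if the text after point matches, else nil."
  (let ((result nil))
    (dolist (element my-error-regexp-alist)
      (let ((regexp (nth 0 element))
            (file-subexp (nth 1 element))
            (line-subexp (nth 2 element)))
        ;; Skip the remaining elements once an earlier one has matched.
        (when (and (null result) (looking-at regexp))
          (setq result
                (cons (match-string file-subexp)
                      ;; match-string returns a string; convert it.
                      (string-to-number (match-string line-subexp)))))))
    result))
```

With point at the start of a line such as main.c:42: parse error, the function returns ("main.c" . 42). A caller could then pass the car of the result to find-file-other-window and use the cdr as the line number.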
The code for the calculator mode later in this chapter contains a few other examples of looking-at, match-beginning, and match-end.
Emacs contains hundreds of built-in functions that may be of use to you in writing Lisp code. Yet finding which one to use for a given purpose is not so hard.
The first thing to realize is that you will often need to use functions that are already accessible as keyboard commands. You can use these by finding out what their function names are via the C-h k (for describe-key) command (see Chapter 14). This gives the command's full documentation, as opposed to C-h c (for describe-key-briefly), which gives only the command's name. Be careful: some common keyboard commands take an argument when used as Lisp functions. An example is forward-word; to get the equivalent of typing M-f, use (forward-word 1). (Older versions of Emacs require the argument; recent versions treat it as optional and default to 1.)
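The same lookup can be done from Lisp. The sketch below asks which command a key runs and then calls that command as a function; it assumes the default global key bindings are in effect, and the sample text is made up.

```elisp
;; Sketch: look up the command behind a key, then call it from Lisp.
;; Assumes the default global bindings are in effect.
(key-binding (kbd "M-f"))        ; the symbol forward-word, by default

(with-temp-buffer
  (insert "hello world")
  (goto-char (point-min))
  (forward-word 1)               ; same effect as typing M-f once
  (point))                       ; 6, the position just after "hello"
```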
Another powerful tool for getting the right function for the job is the command-apropos (C-h a) help function. Given a regular expression, this help function searches for all commands that match it and displays their key bindings (if any) and documentation in a *Help* window. This can be a great help if you are trying to find a command that does a certain "basic" thing. For example, if you want to know about commands that operate on words, type C-h a followed by word, and you will see documentation on about a dozen and a half commands having to do with words.
The limitation with command-apropos is that it gives information only on functions that can be used as keyboard commands. Even more powerful is apropos, which is not accessible via any of the help keys (you must type M-x apropos Enter). Given a regular expression, apropos displays all functions, variables, and other symbols that match it. Be warned, though: apropos can take a long time to run and can generate very long lists if you use it with a general enough concept (such as buffer).
You should be able to use the apropos commands on a small number of well-chosen keywords and find the function(s) you need. If a function seems general and basic enough, the chances are excellent that Emacs has it built in.
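If you would rather have the matches as a Lisp list than in a help window, a similar search can be run from code. The sketch below relies on the built-in apropos-internal, which takes a regexp and an optional predicate; the wrapper name my-commands-matching is hypothetical.

```elisp
;; Sketch: collect matching symbols as a Lisp list instead of a help window.
;; `my-commands-matching' is a hypothetical name.
(defun my-commands-matching (regexp)
  "Return the sorted names of all commands whose names match REGEXP."
  (sort (mapcar #'symbol-name
                ;; commandp keeps only symbols usable as keyboard commands.
                (apropos-internal regexp 'commandp))
        #'string<))

(my-commands-matching "^forward-")
```

Leaving out the predicate makes apropos-internal return every matching symbol, which can be a very long list.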
After you find the function you are interested in, you may find that the documentation that apropos prints does not give you enough information about what the function does, what arguments it takes, or how to use it. The best thing to do at this point is to search Emacs's Lisp source code for examples of the function's use. "A Treasure Trove of Examples" earlier in this chapter provides ways of finding out the names of directories Emacs loads libraries from and an easy way of looking at a library once you know its name. To search the contents of the library files you'll need to use grep or some other search facility to find examples, then edit the files found to look at the surrounding context. If you're ambitious you could put together the examples and concepts we've discussed so far to write an extension of the find-library-file command that searches the contents of the library files in each directory on the load path! Although most of Emacs's built-in Lisp code is not profusely documented, the examples of function use that it provides should be helpful, and may even give you ideas for your own functions.
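As a starting point for that kind of search, here is a minimal sketch that lists the source files on the load path whose text matches a regular expression. The name my-grep-load-path is hypothetical, and the sketch assumes the sources are installed as plain .el files; it skips compressed (.el.gz) and byte-compiled (.elc) files.

```elisp
;; Sketch: list the .el files on the load path whose text matches REGEXP.
;; `my-grep-load-path' is a hypothetical name.  Compressed (.el.gz) and
;; byte-compiled (.elc) files are skipped.
(defun my-grep-load-path (regexp)
  "Return the names of .el files under `load-path' that contain REGEXP."
  (let ((hits nil))
    (dolist (dir load-path)
      (when (and (stringp dir) (file-directory-p dir))
        ;; The third argument restricts the listing to names ending in .el.
        (dolist (file (directory-files dir t "\\.el\\'"))
          (when (file-regular-p file)
            (with-temp-buffer
              (insert-file-contents file)
              ;; A non-nil third argument makes a failed search return nil
              ;; instead of signaling an error.
              (when (re-search-forward regexp nil t)
                (push file hits)))))))
    (nreverse hits)))
```

Evaluating (my-grep-load-path "looking-at") returns the matching filenames, which you can then visit to study the surrounding code. On a large installation the search reads every file, so it may take a while.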
By now, you should have a framework of Emacs Lisp that should be sufficient for writing many useful Emacs commands. We have covered examples of various kinds of functions, both Lisp primitives and built-in Emacs functions. You should be able to extrapolate many others from the ones given in this chapter along with help techniques such as those just provided. In other words, you are well on your way to becoming a fluent Emacs Lisp programmer. To test yourself, start with the code for count-words-buffer and try writing the following functions:
Print the number of lines in the buffer.
Print the number of words in a region.
Print the number of the line point is currently on.
[6] Emacs uses ASCII codes (on most machines) to build ranges, but you shouldn't depend on this fact; it is better to stick to dependable things, like all-lowercase or all-uppercase alphabet subsets or [0-9] for digits, and avoid potentially nonportable items, like [A-z] and ranges involving punctuation characters.