If you have been using Emacs for a while and have been taking advantage of some of its more advanced features, chances are that you have thought of something useful that Emacs doesn't do. Although Emacs has hundreds of built-in commands, dozens of packages and modes, and so on, everyone eventually runs into some functionality that Emacs doesn't have. Whatever feature you find missing, you can program it yourself in Emacs Lisp.
Before you dive in, however, note that this chapter is not for everyone. It is intended for people who have already become comfortable using Emacs and who have a fair bit of programming experience, though not necessarily with Lisp per se. If you have no such experience, you may want to skip this chapter; if there is something specific you would like Emacs to do, you might try to find a friendly Emacs Lisp hacker to help you write the necessary code. Or, if you're a little adventurous, you could skim enough to find the file-template example and learn how to install it—it gives you some useful features.
Readers who are building their Lisp skills but don't necessarily want to read the whole chapter might also want to look for the "Treasure Trove of Examples" section in the middle for a useful tool that can help jumpstart their exploration of the Emacs libraries.
Note that we do not cover Lisp in its entirety in this chapter. That would require another large, dense book. Instead, we cover the basics of the language and other features that are often useful in writing Emacs code. If you wish to go beyond this chapter, refer to the GNU Emacs Lisp Reference Manual, distributed with Emacs (choose Help → More Manuals, then Introduction to Emacs Lisp or Emacs Lisp Reference), for details about the specific Lisp features in Emacs. You may also turn to any of the various Lisp textbooks[1] available for a solid grounding in the language itself.
Emacs Lisp is a full-blown Lisp implementation;[2] thus it is more than the usual macro or script language found in many text editors. (One of the authors has written a small expert system entirely in Emacs Lisp.) In fact, you could even think of Emacs itself as a Lisp system with lots of built-in functions, many of which happen to pertain to text manipulation, window management, file I/O, and other features useful to text editing. The source code for Emacs, written in C, implements the Lisp interpreter, Lisp primitives, and only the most basic commands for text editing; a large layer of built-in Lisp code and libraries on top of that implements the rest of Emacs's functionality. A current version of Emacs comes with close to 250,000 lines of Lisp.
This chapter starts with an introduction to the aspects of Lisp that resemble common programming languages like Java and Perl. These features are enough to enable you to write many Emacs commands. Then we deal with how to interface Lisp code with Emacs so that the functions you write can become Emacs commands. We will see various built-in Lisp functions that are useful for writing your own Emacs commands, including those that use regular expressions; we give an explanation of regular expressions that extends the introduction in Chapter 3 and is oriented toward Lisp programming. We then return to the basics of Lisp for a little while, covering the unique features of the language that have to do with lists, and show how this chapter's concepts fit together by presenting a file template system you can install and use in your own programming or writing projects.
Finally we show you how to program a simple major mode, illustrating that this "summit" of Emacs Lisp programming isn't so hard to scale. After that, you will see how easy it is to customize Emacs's built-in major modes without having to change (or even look at) the code that implements them. We finish the chapter by describing how to build your own library of Lisp packages.
You may have heard of Lisp as a language for artificial intelligence (AI). If you aren't into AI, don't worry. Lisp may have an unusual syntax, but many of its basic features are just like those of more conventional languages you may have seen, such as Java or Perl. We emphasize such features in this chapter. After introducing the basic Lisp concepts, we proceed by building up various example functions that you can actually use in Emacs. In order to try out the examples, you should be familiar with Emacs Lisp mode and Lisp interaction mode, which were discussed in Chapter 9.
The basic elements in Lisp you need to be familiar with are functions, variables, and atoms. Functions are the only program units in Lisp; they cover the notions of procedures, subroutines, programs, and even operators in other languages.
Functions are defined as lists of the above entities, usually as lists of calls to other, existing functions. All functions have return values (as with Perl functions and non-void Java methods); a function's return value is simply the value of the last item in the list, usually the value returned by the last function called. A function call within another function is equivalent to a statement in other languages, and we use statement interchangeably with function call in this chapter. Here is the syntax for a function call:
(function-name
argument1
argument2
...)
which is equivalent to this:
method_name(argument1, argument2, ...);
in Java. This syntax is used for all functions, including those equivalent to arithmetic or comparison operators in other languages. For example, in order to add 2 and 4 in Java or Perl, you would use the expression 2 + 4, whereas in Lisp you would use the following:
(+ 2 4)
Similarly, where you would use 4 >= 2 (greater than or equal to), the Lisp equivalent is:
(>= 4 2)
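Because operators are ordinary functions, calls nest just like any other function call. The expressions below are a small sampler you can evaluate one at a time in Lisp interaction mode; the ⇒ comments show the value Emacs returns:

```elisp
;; 2 + 3 * 4 in Java or Perl becomes a nested call in Lisp.
(+ 2 (* 3 4))          ; ⇒ 14
;; Many arithmetic functions accept more than two arguments.
(+ 1 2 3 4)            ; ⇒ 10
;; Comparison functions return t or nil.
(>= 4 2)               ; ⇒ t
(< 4 2)                ; ⇒ nil
```

Since there is no operator precedence to remember, the nesting of the parentheses alone determines the order of evaluation.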
Variables in Lisp are similar to those in any other language, except that they do not have types. A Lisp variable can assume any type of value (values themselves do have types, but variables don't impose restrictions on what they can hold).
Atoms are values of any type, including integers, floating point (real) numbers, characters, strings, Boolean truth values, symbols, and special Emacs types such as buffers, windows, and processes. The syntax for various kinds of atoms is:
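Integers are what you would expect: signed whole numbers. In older 32-bit builds of Emacs they range from -2^27 to 2^27-1; 64-bit builds allow a much larger range, and since Emacs 27 integers can be arbitrarily large.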
Floating point numbers are real numbers that you can represent with decimal points and scientific notation (with lowercase "e" for the power of 10). For example, the number 5489.0 can be written 5489.0, 5.489e3, 548.9e1, and so on; written without a decimal point, 5489 is an integer rather than a floating point number.
Characters are preceded by a question mark, for example, ?a. Esc, Newline, and Tab are abbreviated \e, \n, and \t respectively; other control characters are denoted with the prefix \C-, so that (for example) C-a is denoted as ?\C-a.[3]
Strings are surrounded by double quotes; quote marks and backslashes within strings need to be preceded by a backslash. For example, "Jane said, \"See Dick run.\"" is a legal string. Strings can be split across multiple lines without any special syntax. Everything until the closing quote, including all the line breaks, is part of the string value.
Booleans use t for true and nil for false, though most of the time, if a Boolean value is expected, any non-nil value is assumed to mean true. nil is also used as a null or nonvalue in various situations, as we will see.
Symbols are names of things in Lisp, for example, names of variables or functions. Sometimes it is important to refer to the name of something instead of its value, and this is done by preceding the name with a single quote ('). For example, the define-key function, described in Chapter 10, uses the name of the command (as a symbol) rather than the command itself.
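To see this atom syntax in action, you can evaluate the following expressions in Lisp interaction mode. This is only a sampler: the type predicates integerp, floatp, and symbolp, the function length, and the function not (which returns t for nil and nil for anything else) are not otherwise covered in this section.

```elisp
;; Integers and floating point numbers are different types.
(integerp 5489)                          ; ⇒ t
(floatp 5.489e3)                         ; ⇒ t
(= 5489.0 5.489e3)                       ; ⇒ t, the same value
(= 5.489e3 548.9e1)                      ; ⇒ t
;; A character is simply its character code.
?a                                       ; ⇒ 97
?\n                                      ; ⇒ 10
?\C-a                                    ; ⇒ 1
;; Each escaped quote counts as one character of the string.
(length "Jane said, \"See Dick run.\"")  ; ⇒ 26
;; Only nil is false: even 0 counts as true.
(not 0)                                  ; ⇒ nil
(not nil)                                ; ⇒ t
;; A quoted name yields the symbol itself, not the variable's value.
(symbolp 'auto-save-interval)            ; ⇒ t
```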
A simple example that ties many of these basic Lisp concepts together is the function setq.[4] As you may have figured out from previous chapters, setq is a way of assigning values to variables, as in
(setq auto-save-interval 800)
Notice that setq is a function, unlike in other languages in which special syntax such as = or := is used for assignment. In its simplest form, setq takes two arguments: a variable name and a value. In this example, the variable auto-save-interval (the number of keystrokes between auto-saves) is set to the value 800.
setq can actually be used to assign values to multiple variables, as in
(setq thisvar thisvalue
      thatvar thatvalue
      theothervar theothervalue)
The return value of setq is simply the last value assigned, in this case theothervalue. You can set the values of variables in other ways, as we'll see, but setq is the most widely applicable.
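Here is a short sketch of setq with several pairs; the variable names are hypothetical, chosen only for this example. The pairs are assigned from left to right, so a later value can refer to a variable set earlier in the same call:

```elisp
(setq my-first 1
      my-second (+ my-first 1)   ; my-first is already 1 here
      my-third "three")          ; ⇒ "three", the last value assigned
my-second                        ; ⇒ 2
```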
Now it's time for an example of a simple function definition. Start Emacs without any arguments; this puts you into the *scratch* buffer, an empty buffer in Lisp interaction mode (see Chapter 9), so that you can actually try this and subsequent examples.
Before we get to the example, however, some more comments on Lisp syntax are necessary. First, you will notice that the dash (-) is used as a "break" character to separate words in names of variables, functions, and so on. This practice is simply a widely used Lisp programming convention; thus the dash takes the place of the underscore (_) in languages like C and Ada. A more important issue has to do with all of the parentheses in Lisp code. Lisp is an old language that was designed before anyone gave much thought to language syntax (it was still considered amazing that you could use any language other than the native processor's binary instruction set), so its syntax is not exactly programmer-friendly. Yet Lisp's heavy use of lists, and thus of parentheses, has its advantages, as we'll see toward the end of this chapter.
The main problem a programmer faces is how to keep all the parentheses balanced properly. Compounding this problem is the usual programming convention of putting multiple right parentheses at the end of a line, rather than the more readable technique of placing each right parenthesis directly below its matching left parenthesis. Your best defense against this is the support the Emacs Lisp modes give you, particularly the Tab key for proper indentation and the flash-matching-parenthesis feature.
Now we're ready for our example function. Suppose you are a student or journalist who needs to keep track of the number of words in a paper or story you are writing. Older versions of Emacs had no built-in way of counting the number of words in a buffer (Emacs 24 and later provide M-x count-words), so we'll write a Lisp function that does the job:
1  (defun count-words-buffer ()
2    (let ((count 0))
3      (save-excursion
4        (goto-char (point-min))
5        (while (< (point) (point-max))
6          (forward-word 1)
7          (setq count (1+ count)))
8        (message "buffer contains %d words." count))))
Let's go through this function line by line and see what it does. (Of course, if you are trying this in Emacs, don't type the line numbers in.)
The defun on line 1 defines the function by its name and arguments. Notice that defun is itself a function: one that, when called, defines a new function. (defun returns the name of the function defined, as a symbol.) The function's arguments appear as a list of names inside parentheses; in this case, the function has no arguments. Arguments can be made optional by preceding them with the keyword &optional. If an argument is optional and not supplied when the function is called, its value is assumed to be nil.
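A small sketch of an optional argument, using a hypothetical function named my-greet. When the caller omits greeting, its value is nil, and the or expression (which returns its first non-nil argument) supplies a default:

```elisp
(defun my-greet (name &optional greeting)
  "Return a greeting for NAME, using GREETING if it is non-nil."
  ;; greeting is nil whenever the caller leaves it out.
  (concat (or greeting "Hello") ", " name "!"))

(my-greet "Deb")              ; ⇒ "Hello, Deb!"
(my-greet "Bill" "Goodbye")   ; ⇒ "Goodbye, Bill!"
```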
Line 2 contains a let construct, whose general form is:
(let ((var1 value1)
      (var2 value2)
      ...)
  statement-block)
The first thing let does is define the variables var1, var2, etc., and set them to the initial values value1, value2, etc. Then let executes the statement block, which is a sequence of function calls or values, just like the body of a function.
It is useful to think of let as doing three things:
Defining (or declaring) a list of variables
Setting the variables to initial values, as if with setq
Creating a block in which the variables are known; the let block is known as the scope of the variables
If a let is used to define a variable, its value can be reset later within the let block with setq. Furthermore, a variable defined with let can have the same name as a global variable; all setqs on that variable within the let block act on the local variable, leaving the global variable undisturbed. However, a setq on a variable that is not defined with a let affects the global environment. It is advisable to avoid using global variables as much as possible because their names might conflict with those of existing global variables and therefore your changes might have unexpected and inexplicable side effects later on.
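The following sketch shows a let binding shadowing a global variable of the same name; my-counter is a hypothetical name used only for illustration:

```elisp
(setq my-counter 100)            ; a global variable
(let ((my-counter 0))            ; a local variable with the same name
  (setq my-counter (1+ my-counter))
  my-counter)                    ; ⇒ 1, the local value
my-counter                       ; ⇒ 100, the global value is untouched
```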
So, in our example function, we use let to define the local variable count and initialize it to 0. As we will see, this variable is used as a loop counter.
Lines 3 through 8 are the statements within the let block. The first of these calls the built-in Emacs function save-excursion, which is a way of being polite. The function is going to move the cursor around the buffer, so we don't want to disorient the user by jumping them to a strange place in their file just because they asked for a word count. Calling save-excursion tells Emacs to remember the location of the cursor at the beginning of the function, and go back there after executing any statements in its body. Notice how save-excursion is providing us with a capability similar to let; you can think of it as a way of making the cursor location itself a local variable.
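A quick way to watch save-excursion at work is a temporary buffer. This sketch relies on with-temp-buffer and insert, which this section does not otherwise discuss: with-temp-buffer creates a scratch buffer, makes it current while its body runs, and kills it afterward, and insert adds text at point.

```elisp
(with-temp-buffer
  (insert "Hello, world")          ; point ends up after the text, at 13
  (save-excursion
    (goto-char (point-min)))       ; move to the beginning temporarily
  (point))                         ; ⇒ 13, point was restored
```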
Line 4 calls goto-char. The argument to goto-char is a (nested) function call to the built-in function point-min. As we have mentioned before, point is Emacs's internal name for the position of the cursor, and we'll refer to the cursor as point throughout the remainder of this chapter. point-min returns the value of the first character position in the current buffer, which is almost always 1; then, goto-char is called with the value 1, which has the effect of moving point to the beginning of the buffer.
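The same kind of temporary buffer (again assuming with-temp-buffer and insert, described above only in passing) shows what point-min and point-max return. For a buffer holding 12 characters, positions run from 1 to 13, because point-max is the position just after the last character:

```elisp
(with-temp-buffer
  (insert "Hello, world")                  ; 12 characters
  (goto-char (point-min))                  ; move point to the beginning
  (list (point-min) (point-max) (point)))  ; ⇒ (1 13 1)
```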
The next line sets up a while loop; Java and Perl have a similar construct. The while construct has the general form
(while condition
  statement-block)
Like let and save-excursion, while sets up another statement block. condition is a value (an atom, a variable, or a function returning a value). This value is tested; if it is nil, the condition is considered to be false, and the while loop terminates. If the value is other than nil, the condition is considered to be true, the statement block gets executed, the condition is tested again, and the process repeats.
Of course, it is possible to write an infinite loop. If you write a Lisp function with a while loop and try running it, and your Emacs session hangs, chances are that you have made this all-too-common mistake; just type C-g to abort it.
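As a standalone illustration of while, here is a hypothetical helper, my-sum-to, that adds the integers from 1 to n. The setq that advances i is what eventually makes the condition nil; leaving it out produces exactly the kind of infinite loop just described.

```elisp
(defun my-sum-to (n)
  "Return the sum of the integers from 1 to N, computed with a while loop."
  (let ((i 1)
        (total 0))
    (while (<= i n)                  ; the loop ends when this returns nil
      (setq total (+ total i)
            i (1+ i)))               ; advance the counter toward n
    total))

(my-sum-to 10)                       ; ⇒ 55
```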
In our sample function, the condition is the function <, which is a less-than function with two arguments, analogous to the < operator in Java or Perl. The first argument is another function that returns the current character position of point; the second argument returns the maximum character position in the buffer, that is, the position just after its last character. The function < (like the other relational functions) returns a Boolean value, t or nil.
The loop's statement block consists of two statements. Line 6 moves point forward one word (i.e., as if you had typed M-f). Line 7 increments the loop counter by 1; the function 1+ is shorthand for (+ 1 variable-name). Notice that the third right parenthesis on line 7 matches the left parenthesis preceding while. So, the while loop causes Emacs to go through the current buffer a word at a time while counting the words.
The final statement in the function uses the built-in function message to print a message in the minibuffer saying how many words the buffer contains. The form of the message function will be familiar to C programmers who have used printf. The first argument to message is a format string, which contains text and special formatting instructions of the form %x, where x is one of a few possible letters. For each of these instructions, in the order in which they appear in the format string, message reads the next argument and tries to interpret it according to the letter after the percent sign. Table 11-1 lists meanings for the letters in the format string.
Table 11-1. Message format strings
Format string | Meaning |
---|---|
%s | String or symbol |
%c | Character |
%d | Integer |
%e | Floating point in scientific notation |
%f | Floating point in decimal-point notation |
%g | Floating point in whichever format yields the shortest string |
For example:
(message "\"%s\" is a string, %d is a number, and %c is a character" "hi there" 142 ?q)
causes the message:
"hi there" is a string, 142 is a number, and q is a character
to appear in the minibuffer. This is analogous to the C code:
printf ("\"%s\" is a string, %d is a number, and %c is a character\n", "hi there", 142, 'q');
The floating-point format characters are a bit more complicated. Unless you tell them otherwise, %e and %f print six digits after the decimal point. For example, the following:
(message "This book was printed in %f, also known as %e." 2004 2004)
yields this:
This book was printed in 2004.000000, also known as 2.004000e+03.
But you can control the number of digits after the decimal point by inserting a period and the number of digits desired between the % and the e, f, or g (for g, the number gives significant digits rather than digits after the decimal point). For example, this:
(message "This book was printed in %.3e, also known as %.0f." 2004 2004)
prints in the minibuffer:
This book was printed in 2.004e+03, also known as 2004.
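When you want the formatted text as a value rather than as a message in the minibuffer, the built-in function format accepts the same format strings and returns the resulting string (message itself also returns the string it displays). A sketch; the file name and numbers are made up:

```elisp
;; %% produces a literal percent sign.
(format "%s has %d words (%.1f%% done)" "draft.txt" 1500 37.5)
;; ⇒ "draft.txt has 1500 words (37.5% done)"
(format "%e" 2004)                   ; ⇒ "2.004000e+03"
```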
The count-words-buffer function that we've just finished works, but it still isn't as convenient to use as the Emacs commands you work with daily. If you have typed it in, try it yourself. First you need to get Emacs to evaluate the lines you typed in, thereby actually defining the function. To do this, move your cursor to just after the last closing parenthesis in the function and type C-j (or Linefeed), the "evaluate" key in Lisp interaction mode, to tell Emacs to perform the function definition. You should see the name of the function appear again in the buffer; the return value of the defun function is the symbol that has been defined. (If instead you get an error message, double-check that your function looks exactly like the example and that you haven't typed in the line numbers, and try again.)
Once the function is defined, you can execute it by typing (count-words-buffer) on its own line in your Lisp interaction window, and once again typing C-j after the closing parenthesis.
Now that you can execute the function correctly from a Lisp interaction window, try executing the function with M-x, as with any other Emacs command. Try typing M-x count-words-buffer Enter: you will get the error message [No match]. (You can type C-g to cancel this failed attempt.) You get this error message because you need to "register" a function with Emacs to make it available for interactive use. The function to do this is interactive, which has the form:
(interactive "prompt-string")
This statement should be the first in a function, that is, right after the line containing the defun and the documentation string (which we will cover shortly). Using interactive causes Emacs to register the function as a command and to prompt the user for the arguments declared in the defun statement. The prompt string is optional.
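A minimal sketch of the difference interactive makes. The function names my-say-hello and my-plain are hypothetical; commandp is the built-in predicate that reports whether a function can be invoked with M-x. Because my-say-hello takes no arguments, its interactive statement needs no prompt string.

```elisp
(defun my-say-hello ()
  "Display a greeting in the minibuffer."
  (interactive)                      ; registers the function as a command
  (message "Hello from Emacs Lisp"))

(defun my-plain ()
  "Return 42; callable from Lisp but not with M-x."
  42)

(commandp 'my-say-hello)             ; ⇒ t
(commandp 'my-plain)                 ; ⇒ nil
```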
The prompt string has a special format: for each argument you want to prompt the user for, you provide a section of prompt string. The sections are separated by newlines (\n). The first letter of each section is a code for the type of argument you want. There are many choices; the most commonly used are listed in Table 11-2.
Table 11-2. Argument codes for interactive functions
Code | User is prompted for: |
---|---|
b | Name of an existing buffer |
e | Event (mouse action or function key press) |
f | Name of an existing file |
n | Number (integer) |
s | String |
Most of these have uppercase variations: | |
B | Name of a buffer that may not exist |
F | Name of a file that may not exist |
N | Number, unless command is invoked with a prefix argument, in which case use the prefix argument and skip this prompt |
S | Symbol |
With the b and f options, Emacs signals an error if the buffer or file given does not already exist. Another useful option to interactive is r, which we will see later. There are many other option letters; consult the documentation for function interactive for the details. The rest of each section is the actual prompt that appears in the minibuffer.
The way interactive is used to fill in function arguments is somewhat complicated and best explained through an example. A simple example is in the function goto-percent, which we will see shortly. It contains the statement
(interactive "nPercent: ")
The n in the prompt string tells Emacs to prompt for an integer; the string Percent: appears in the minibuffer.
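Here is the n code in a complete but hypothetical command, my-double. Run with M-x, it prompts with Number to double: and passes the integer the user types as n; called from Lisp, it simply takes its argument. Since message returns the string it displays, the call below evaluates to that string:

```elisp
(defun my-double (n)
  "Prompt for an integer N and display twice its value."
  (interactive "nNumber to double: ")
  (message "%d doubled is %d" n (* 2 n)))

(my-double 21)                       ; ⇒ "21 doubled is 42"
```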
As a slightly more complicated example, let's say we want to write our own version of the replace-string command. Here's how we would do the prompting:
(defun replace-string (from to)
  (interactive "sReplace string: \nsReplace string %s with: ")
  ...)
The prompt string consists of two sections, sReplace string: and sReplace string %s with:, separated by a Newline. The initial s in each means that a string is expected; the %s is a formatting operator (as in the previous message function) that Emacs replaces with the user's response to the first prompt. When applying formatting operators in a prompt, it is as if message has been called with a list of all responses read so far, so the first formatting operator is applied to the first response, and so on.
When this command is invoked, first the prompt Replace string: appears in the minibuffer. Assume the user types fred in response. After the user presses Enter, the prompt Replace string fred with: appears. The user types the replacement string and presses Enter again.
The two strings the user types are used as values of the function arguments from and to (in that order), and the command runs to completion. Thus, interactive supplies values to the function's arguments in the order of the sections of the prompt string.
The use of interactive does not preclude calling the function from other Lisp code; in this case, the calling function needs to supply values for all arguments. For example, if we were interested in calling our version of replace-string from another Lisp function that needs to replace all occurrences of "Bill" with "Deb" in a file, we would use
(replace-string "Bill" "Deb")
The function is not being called interactively in this case, so the interactive statement has no effect; the argument from is set to "Bill", and to is set to "Deb".
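To make the discussion concrete, here is one possible body for such a command, under the hypothetical name my-replace-string so that it does not redefine the built-in replace-string. It relies on two built-in functions this section does not cover: search-forward moves point past the next match and, because its third argument is t, returns nil instead of signaling an error when no match remains; replace-match then substitutes text for what was just matched. Like the built-in command, this sketch works from point to the end of the buffer.

```elisp
(defun my-replace-string (from to)
  "Replace occurrences of FROM with TO from point to the end of the buffer."
  (interactive "sReplace string: \nsReplace string %s with: ")
  (save-excursion                    ; leave point where the user had it
    (while (search-forward from nil t)
      ;; FIXEDCASE and LITERAL are t: insert TO exactly as given.
      (replace-match to t t))))

;; Called from Lisp, so no prompting happens:
(with-temp-buffer
  (insert "Bill met Bill")
  (goto-char (point-min))
  (my-replace-string "Bill" "Deb")
  (buffer-string))                   ; ⇒ "Deb met Deb"
```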
Getting back to our count-words-buffer command: it has no arguments, so its interactive statement does not need a prompt string. The final modification we want to make to our command is to add a documentation string (or doc string for short), which is shown by online help facilities such as describe-function (C-h f). Doc strings are normal Lisp strings; they are optional and can be arbitrarily many lines long, although, by convention, the first line is a terse, complete sentence summarizing the command's functionality. Remember that any double quotes inside a string need to be preceded by backslashes.
With all of the fixes taken into account, the complete function looks like this:
(defun count-words-buffer ()
  "Count the number of words in the current buffer; print a message in the minibuffer with the result."
  (interactive)
  (save-excursion
    (let ((count 0))
      (goto-char (point-min))
      (while (< (point) (point-max))
        (forward-word 1)
        (setq count (1+ count)))
      (message "buffer contains %d words." count))))
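One caveat about this function: if the buffer ends with non-word text such as a final newline, the last call to forward-word finds no word but still moves point to the end of the buffer, so the loop counts one word too many. forward-word returns t only when it actually moves over the requested number of words, so one possible fix is to use its return value as the loop condition. Here is a sketch under the hypothetical name count-words-buffer-2; it also returns the count so that other Lisp code can use it.

```elisp
(defun count-words-buffer-2 ()
  "Count the words in the current buffer and report the result.
Return the count as well."
  (interactive)
  (save-excursion
    (let ((count 0))
      (goto-char (point-min))
      ;; forward-word returns nil when no word remains, so trailing
      ;; whitespace or punctuation is not counted as a word.
      (while (forward-word 1)
        (setq count (1+ count)))
      (message "buffer contains %d words." count)
      count)))

(with-temp-buffer
  (insert "See Dick run.\n")
  (count-words-buffer-2))            ; ⇒ 3
```

The trade-off is that the loop condition now does the moving as well as the testing, which is compact but a little less obvious to read than the explicit comparison of point and point-max.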
[1] We recommend Lisp by Patrick Henry Winston and Berthold Klaus Paul Horn (Addison Wesley).
[2] Experienced Lisp programmers should note that Emacs Lisp most closely resembles MacLisp, with a few Common Lisp features added. More complete Common Lisp emulation can be had by loading the package cl (see Appendix B).
[3] Integers are also allowed where characters are expected. The ASCII code is used on most machines. For example, the number 65 is interpreted as the character A on such a machine.
[4] We hope that Lisp purists will forgive us for calling setq a function, for the sake of simplicity, rather than a form, which it technically is.