Writing XML involves entering structured information that complies with a document type definition or schema. Even within Emacs, the XML support you receive varies. At the low end of the spectrum, there is plain vanilla Fundamental mode. It simply provides a screen where you type. Specialized modes like SGML mode provide support for entering tags, as we saw earlier in our discussion of HTML mode, a derivative of SGML mode. But neither of these approaches helps you parse or validate XML (SGML mode has a command for validating, but it is tricky to set up correctly). More advanced Lisp packages, though currently not included in Emacs, are available to provide these functions. These add-on packages provide validation against DTDs or schemas, parsing capabilities, and, typically, an array of standard DTDs and schema definitions. In Emacs, these tools primarily work in conjunction with one of two major modes. psgml mode validates XML (and SGML) against DTDs. The newer nxml mode validates against RELAX NG schemas. We cover both of these options in this section. Before we go into detail on those modes, however, let's look briefly at what Emacs has built in with SGML mode.
Emacs's own SGML mode provides support for entering tags. We covered much of this earlier under HTML mode, so we provide just one brief example here. Inserting, hiding, and showing tags are especially helpful features provided by SGML mode.
Let's look at a chapter on enumerated types by Java in a Nutshell author David Flanagan. This chapter uses the DocBook DTD.
Note that Emacs displays XML on the mode line. XML mode in this context is a subset of SGML mode. Actually, despite this name, all the commands in this mode start with sgml, not xml. The menu of relevant commands is called SGML as well. Emacs doesn't pretend to have extensive XML support.
We want to insert a paragraph before the first paragraph.
Add a blank line following the title and type: C-c C-t
Emacs inserts an open angle bracket and prompts for the tag name (Mac OS X).
Note that Emacs is not following our indentation style. We can correct it by moving to the beginning of the line and pressing Tab. See Table 8-4 earlier in this chapter for details on SGML mode commands.
The Text Encoding Initiative (TEI) wanted an XML authoring environment for Emacs, so it created (the somewhat misleadingly named) TEI Emacs.[9] Despite its name, TEI Emacs does not include Emacs itself. Rather, it creates an authoring environment for writing XML using nxml mode or psgml mode. It incorporates XSLT tools, along with most of the standard DTDs, such as the three forms of XHTML DTDs (strict, frameset, and transitional), DocBook DTDs, and more. Naturally, the TEI's own DTDs and schemas are also included.
The active development of this tool and its careful packaging led us to describe this tool despite the fact that it is limited to Linux and Windows at this writing.[10] You should have Emacs 21.3 already installed before you install this tool. Installing TEI Emacs is trivial. The Windows version has an installer, and Linux users follow simple instructions at http://www.tei-c.org/Software/tei-emacs/, the web site for downloading TEI Emacs.
James Clark, an XML pioneer, wrote nxml mode to provide Emacs support for his schema standard RELAX NG. For details on the standard, visit http://www.relaxng.org/ or pick up a copy of RELAX NG by Eric van der Vlist (O'Reilly). The important thing about nxml mode is that it validates text as you type instead of making validation and debugging separate steps.
If you did not install TEI Emacs, you can download nxml mode and its schemas from http://thaiopensource.com/download/. If you decide to become an active nxml mode user, you may want to join a related Yahoo Group discussion list (see http://groups.yahoo.com/group/emacs-nxml-mode/).
In this section, we change our running HTML example to XHTML, first using a RELAX NG schema and nxml mode. Open dickens.html, then enter nxml mode.
nxml mode tells you what schema it is using in the minibuffer. It's smart enough to know that its XHTML schema is best for this purpose.
The mode line tells us that this file is currently invalid. Emacs highlights errors with red underscores. Let's deal with these errors one at a time.
Move the cursor to the red underscore at the end of the html tag.
The minibuffer describes what's missing.
Editing XHTML with a schema requires a namespace definition in the <html> tag. nxml mode knows what we need. This is a good time to use nxml's completion feature to let it supply the details for us. C-Enter completes the current tag.
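For readers who want to see what the namespace declaration changes outside Emacs, here is a minimal sketch in Python using the standard library's xml.etree.ElementTree. The sample markup is hypothetical (it is not our dickens.html file), and the URI is the standard XHTML namespace defined by the W3C; what nxml mode actually inserts may include other attributes as well.

```python
import xml.etree.ElementTree as ET

# Standard XHTML namespace URI (W3C). The sample markup is hypothetical.
XHTML_NS = "http://www.w3.org/1999/xhtml"

plain = "<html><head><title>Dickens</title></head><body/></html>"
namespaced = (
    '<html xmlns="' + XHTML_NS + '">'
    "<head><title>Dickens</title></head><body/></html>"
)


def root_namespace(document: str) -> str:
    """Return the namespace URI of the root element, or '' if it has none."""
    tag = ET.fromstring(document).tag
    # ElementTree writes namespaced tags as '{uri}localname'.
    if tag.startswith("{"):
        return tag[1:].split("}", 1)[0]
    return ""


print(repr(root_namespace(plain)))
print(repr(root_namespace(namespaced)))
```

Only the second document places its root element in the XHTML namespace, which is the condition a namespace-aware schema checks for.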
The mode line tells us that this file is still invalid. Moving to the underlined address tag gives us a fairly cryptic reason; it says, "Element not allowed in this context." Let's move down to the closing body tag to see if that error provides any more insight into the problem.
This message provides a clue. Although HTML authors are not accustomed to adding closing tags to paragraphs, XHTML requires them. Let's insert a closing tag after our paragraph.
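The same requirement can be checked outside Emacs. The sketch below, in Python with the standard library's ElementTree parser, tests only well-formedness, not validity against a schema (which is what nxml mode reports). The two fragments are hypothetical stand-ins for our file, and is_well_formed is a helper name we made up for this illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragments: an HTML-style paragraph with no closing tag,
# and the XHTML-style version with the closing tag added.
html_style = "<body><p>It was the best of times.<address>Dickens</address></body>"
xhtml_style = "<body><p>It was the best of times.</p><address>Dickens</address></body>"


def is_well_formed(fragment: str) -> bool:
    """Return True if the fragment parses as XML, False otherwise."""
    try:
        ET.fromstring(fragment)
        return True
    except ET.ParseError:
        return False


print(is_well_formed(html_style))
print(is_well_formed(xhtml_style))
```

Without the closing paragraph tag, the parser reaches the closing body tag while the paragraph is still open and rejects the fragment. That is the same spot where nxml mode gave us its more helpful message.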
Note that just typing </ was adequate to insert a closing tag for the current element. We don't need to type C-Enter to invoke completion. That's because in nxml mode, slash is bound to nxml-electric-slash. It automatically completes the nearest open element, another shortcut for us.
A similar command is C-c C-f (for nxml-finish-element). With C-c C-f, you don't have to type anything; it inserts the relevant closing tag for you.
Look at the mode line now. It says valid. Using nxml mode, it's not too tough to take an HTML file and change it to valid XHTML.
Validating text as you type it is a key feature of nxml mode, and that validation is always performed against a schema. To specify a different schema, type C-c C-s (for rng-set-schema-and-validate). The minibuffer prompts for the file where the schema resides. A number of schemas can be found online at http://www.relaxng.org/#schemas. You can also convert DTDs to schemas using tools listed on that page.
Your menus vary depending on whether you install nxml mode directly or whether you use TEI's version. TEI provides support for encoded characters using the UniChar menu. It also provides extensive XSLT support. TEI's NXML menu includes some TEI skeletons as well as nxml mode options. Nxml mode installed from thaiopensource.com includes an XML menu with options for setting the schema and customizing the mode. Table 8-7 lists some of the commands available in nxml mode.
Table 8-7. Nxml mode commands
Keystrokes | Command name | Action |
---|---|---|
C-Enter | nxml-complete | Complete the current tag. |
/ | nxml-electric-slash | Add a closing tag for the last open element. |
C-c C-n | rng-next-error | Move to the next error. |
C-c C-l | rng-save-schema-location | Creates (or updates) a file called schemas.xml in your home directory. This file associates schemas with files. |
C-c C-s | rng-set-schema-and-validate | Set the schema and validate against it. |
C-c C-a | rng-auto-set-schema | Set the schema automatically according to the contents of the file. |
C-c C-w | rng-what-schema | Show in the minibuffer the current schema associated with this file. |
C-c C-v | rng-validate-mode | Toggles whether the mode line indicates that the file is valid or invalid. |
C-c C-u | nxml-insert-named-char | Insert a named character; press Tab to see a list. |
(none) | nxml-insert-xml-declaration | Insert an XML declaration at the beginning of the file. |
C-c Tab | nxml-balanced-close-start-tag-inline | Insert the ending tag for the starting tag you are typing, putting the ending tag on the current line. |
C-c C-b | nxml-balanced-close-start-tag-block | Insert the ending tag for the starting tag you are typing, putting the ending tag on a separate line. |
C-c C-f | nxml-finish-element | Finish the current element. |
M-h | nxml-mark-paragraph | Mark the current paragraph. |
M-} | nxml-forward-paragraph | Move forward one paragraph. |
M-{ | nxml-backward-paragraph | Move back one paragraph. |
C-M-p | nxml-backward-element | Move back one element. |
C-M-n | nxml-forward-element | Move forward one element. |
C-M-d | nxml-down-element | Move down one element (if nested). |
C-M-u | nxml-backward-up-element | Move up one element (if nested). |
Lennart Staflin's psgml mode has been around for a while. It is more robust than Emacs's own SGML mode, but, as with any add-on, you have to install it in order to use it. Either install TEI Emacs as described earlier or download psgml mode from http://www.lysator.liu.se/projects/about_psgml.html and follow the installation instructions there. TEI Emacs includes a functioning psgml mode, so if you've installed TEI Emacs, your setup work is done.
psgml mode consists of two parts: sgml-mode for writing SGML and xml-mode for writing XML (and in our case XHTML).
To start psgml mode to edit our XHTML file, type M-x xml-mode.
XML appears on the mode line and an *SGML LOG* window opens.
The *SGML LOG* window displays messages about this session. (If it doesn't appear immediately, click on the first character in the file.) The log buffer complains that it could not find an external entity called html. This file has been changed to work with the XHTML RELAX NG schema. psgml mode expects it to conform to an XHTML DTD. To get started with the (minimal) work needed to undertake the transformation from a schema-based file to a DTD-based file, we ask psgml to normalize the buffer.
Type: M-x sgml-normalize or select Normalize from the Modify menu.
psgml mode eliminates the namespace declaration in the <html> tag.
More needs to be done, however. The first statements in an XHTML file include an XML statement and a DOCTYPE entry that identifies the DTD this document should be validated against. One of the nice things about TEI Emacs is that it includes a variety of DTDs. (Users of standard psgml mode don't have this feature; sorry.[11])
At the beginning of the file, select DTD → Insert DTD → XHTML Transitional.
Emacs inserts the two required elements for us.
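As a rough illustration of what those two lines carry, the following Python sketch reads a document type declaration back with the standard library's xml.dom.minidom. The declaration shown is the standard W3C one for XHTML 1.0 Transitional; we are assuming, not confirming, that TEI Emacs inserts exactly this text. The document body is a hypothetical placeholder, and describe_doctype is a helper name invented for this example.

```python
from xml.dom import minidom

# Assumed header: the standard W3C XHTML 1.0 Transitional declaration.
# The body is a hypothetical placeholder, not our actual file.
DOCUMENT = b"""<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html><head><title>Dickens</title></head><body><p>Sample.</p></body></html>
"""


def describe_doctype(data: bytes) -> dict:
    """Return the root name, public ID, and system ID from the DOCTYPE."""
    doctype = minidom.parseString(data).doctype
    return {
        "root": doctype.name,
        "public_id": doctype.publicId,
        "system_id": doctype.systemId,
    }


info = describe_doctype(DOCUMENT)
for key, value in info.items():
    print(key, "=", value)
```

The public and system identifiers are what a validating tool uses to locate the DTD. This sketch only reads them; it does not fetch the DTD or validate anything.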
That's all it takes to make this file a well-formed XHTML file. psgml mode allows for validation against the DTD. Let's validate it using C-c C-v to make sure it's okay.
Type: C-c C-v
psgml mode inserts the default validate command in the minibuffer; press Enter to run it.
Press Enter and type y to save the buffer when prompted.
Of course, typical documents are far more complex than this one. Options on the View menu provide selective hiding and showing of elements, including an option to hide all tags, allowing you to focus on the content of the file instead.
psgml mode also offers numerous options. If you are running TEI Emacs, you'll find the File Options and User Options submenus on the XML/SGML menu. If you've installed psgml mode standalone, you'll find them on the SGML menu. Table 8-8 summarizes some of the psgml commands.
Table 8-8. Bindings in psgml mode
Keystrokes | Command name | Action |
---|---|---|
C-M-Space | sgml-mark-element | Mark the current element. |
M-Tab | sgml-complete | Complete the current tag. |
C-M-t | sgml-transpose-element | Transpose two elements. |
C-M-h | sgml-mark-current-element | Mark the current element. |
C-M-k Modify → Kill Element | sgml-kill-element | Delete the current element (and any child elements). |
C-M-u Move → Backward Up Element | sgml-backward-up-element | Move up to the parent element for this element. |
C-M-d Move → Down Element | sgml-down-element | Move down to the next child element. |
C-M-b Move → Backward Element | sgml-backward-element | Move to the previous element. |
C-M-f Move → Forward Element | sgml-forward-element | Move to the next element. |
C-M-e Move → End of Element | sgml-end-of-element | Move to the end of the current element. |
C-M-a Move → Beginning of Element | sgml-beginning-of-element | Move to the beginning of the current element. |
C-c C-w SGML → What Element | sgml-what-element | Similar to sgml-position but describes hierarchy in terms of tags versus content (for example, start-tag in title in head in html). |
C-c C-v SGML → Validate | sgml-validate | Insert validation command in the minibuffer so you can modify it if necessary before pressing Enter to execute it. |
C-c C-t SGML → List Valid Tags | sgml-list-valid-tags | List tags that are valid in the current context. |
C-c C-q Modify → Fill Element | sgml-fill-element | Fill element according to the mode's indentation rules. |
C-c C-o Move → Next Trouble Spot | sgml-next-trouble-spot | Find the next problem spot and display the problem in the minibuffer. |
C-c C-n Move → Up Element | sgml-up-element | Move to the parent element. |
C-c Enter | sgml-split-element | Split current element. |
C-c C-l SGML → Show/Hide Warning Log | sgml-show-or-clear-log | Display or delete the *SGML LOG* buffer. |
C-c C-k Modify → Kill Markup | sgml-kill-markup | Delete current tag. |
C-c / Markup → End Current Element | sgml-insert-end-tag | Insert closing tag for current tag. |
C-c - Modify → Untag Element | sgml-untag-element | Delete the current tag pair. |
C-c # Modify → Make Character Reference | sgml-make-character-reference | Change character under the cursor to the equivalent entity. |
C-c C-f C-e View → Fold Element | sgml-fold-element | Hide the current element and its children if any. |
C-c C-u C-e View → Unfold Element | sgml-unfold-element | Show the current element and its children if any. |
C-c C-f C-s View → Fold Subelement | sgml-fold-subelement | Hide subelements. |
C-c C-f C-r View → Fold Region | sgml-fold-region | Hide the region. |
C-c C-u C-a View → Unfold All | sgml-unfold-all | Show all hidden tags and text. |
[9] We'd like to thank Emacs guru Eric Pement for pointing out TEI Emacs to Deb.
[10] We sincerely hope that this support will be extended to Mac OS X as well, providing developers and writers on that platform the benefits of this tool's capabilities. Meanwhile, Mac users may want to install nxml mode from http://thaiopensource.com/download/ and psgml mode from http://www.lysator.liu.se/projects/about_psgml.html.
[11] A straightforward introduction to setting up a complete environment for psgml mode can be found at http://openacs.org/doc/openacs-5-0-0/psgml-mode.html.