Composing Good HTML

This document attempts to address stylistic points of HTML composition, both at the document and the web level. It is available on the Web at http://www.cs.cmu.edu/~tilt/cgh/ (if you are reading this via a mirror, you may want to check the original to make sure you're seeing an up-to-date version).

---------------------------------------------------------------------------

New: This is version 2.0.5; version 1 is still available for those who are interested. Now that Web Weaving is on shelves near you, it seems appropriate for me to get off my duff and feed all of the changes back into this document. See "Some History," below, for more information on what the heck I'm talking about.

---------------------------------------------------------------------------

This document is divided into two main sections. The first section discusses the document -- it should be recognizable as the revised version of the original CGH. It discusses good practices to follow in creating your documents, common errors and things to avoid when composing HTML, and finally, a brief treatment of style sheets, which provide a mechanism for greater control over how a document is rendered.

The second section is brand new -- it discusses style issues regarding your Web as a whole. How it is divided and organized, how it is interlinked and intertwined; these are the issues under consideration here.

This is not a beginner's guide; check the "For More Information" section for pointers to more basic works, as well as for more advanced references and tutorials. It is designed for the HTML author who has learned the basics, and is ready to start thinking about the more advanced aspects of Web document design.

Note: I'm not finished spiffing up this new version yet, but it's good enough to be presentable, and I'd rather have the information available than have it languish for lack of final polishing.
At the very least, I still need to:

* Make some of the larger figures into a more manageable size
* Provide rendered versions of the HTML examples
* Add in some more useful links to other resources (suggestions appreciated!)
* Break this into single and multipart versions by preparing multiview source documents

Unfortunately, the life of a grad student is not all cheese and wine (very little of it, in fact), so these will have to come at a later date. Besides, with the publication of Web Weaving (see the History section below for background), it seems an appropriate time to also re-update this document, so I won't let a little thing like a busy schedule stand in my way.

Some History

I wrote the first version of "Composing Good HTML" in January of 1994. At this point, the Web was just starting to explode, and Mosaic was the browser on the tip of everyone's mouse. Being one of the strange few who used Lynx as well as Mosaic (as well as Emacs-W3, when I was feeling cocky), I noticed that different browsers dealt with incorrect usage of HTML with varying degrees of success. When I pointed this out, the solution suggested to me was to write a "lint" for HTML that would point out common errors in documents. In preparation for this, I started making a list of common errors, and turned that list into a human-readable document. That document became "Composing Good HTML."

About that time the semester started, so I made the document publicly available, and asked for comments and criticism. I got both, in spades! I corrected errors (including a plethora of spelling and grammatical errors), added some new sections, and revised pieces of existing sections. But, all in all, CGH didn't really change much, even though things like Netscape and HTML 3.0 (let alone Java and VRML) have snuck up in the meantime.
In January of 1995, Carl Steadman, Tyler Jones, and I got together with the idea of writing a book about the Web (this was before the current explosion of the market, so you'll pardon our naivete). Rather than writing a book about HTML, we decided to write a book about creating and maintaining an entire site -- including the stylistic points in CGH as a starting point. The book is called Web Weaving, and it appeared on bookshelves on December 18th, 1995. The book is published by Addison-Wesley.

The side effect of all of this is that it gave me a reason to revise CGH to reflect current practices for inclusion in Web Weaving. And now that we've finally finished our book, this also means that the changes in CGH are getting fed right back into the online version. Which, I'm proud to say, is still freely available (and better than ever, I'd like to think).

What you see here is, by and large, Chapters 11 and 12 from Web Weaving, edited so that they stand alone better. While I'd certainly recommend you read Web Weaving for a full treatment of all the issues involved in building and maintaining your Web site (and because every author hopes that his words will be read), Composing Good HTML remains (I hope!) a useful resource for HTML authors (and now Web designers) who want a slightly more sophisticated treatment of the stylistic issues involved in, well, weaving your web.

I never did get around to writing that "lint" program, though.

Document Style Considerations

The World Wide Web has been a wildly successful experiment. It has filled a need for both information users and information providers: a tool which allows information to be deployed to a wide variety of people over wide geographic distances, regardless of what kind of computer they may be running. All that is required to publish information is any one of a number of Web servers, and all that is required to view that information is any one of a number of Web clients.
This is both an opportunity and a challenge. This document discusses the ways in which you construct your markup so that it is readable and usable for a wide range of browsers.

HTML provides a device-independent way of describing information. The elements of HTML describe what your information is, not how it should be displayed. This is a subtle point, and perhaps the most important one presented here. HTML will let you describe this piece of information as a header, or that piece of information as an address. It will not let you describe this text as being in 24-point Helvetica, right justified. Your challenge is to provide professional page layout and design without using the traditional tools of professional page layout and design.

Sound like a paradox? Not really. All it involves is a bit of trust. The trust you must have can be summarized by the following rule:

* if you mark up a document so that your information is labeled as what it is instead of as how it should be displayed,
* then browsers will render it in a way that is appropriate and professional-looking.

With the current diversity of clients for the Web (and we can only expect to see more), it has become important to write HTML that will look good on any client, and not just on the specific client which the author may have access to. You must trust your markup. There is no way to anticipate how every browser will (differently) render your HTML. If you follow this rule you will get the best possible rendering across all browsers, instead of for just one browser.

To this end, there are a few solutions. One approach is software based -- a "lint"-like program for catching semantic errors in HTML, and perhaps even correcting them. Two good examples of this are WebTech's HTML Validation Service and WebLint. Another approach is the one taken by this document -- a style guide which points out common errors one might make in the composition of HTML, and recommends good practices to follow.
Bear in mind when following these guidelines that your document may not end up looking the best it possibly can on a particular browser. However, it also will not look ugly on any browser, which is the risk you take by disregarding these recommendations and tweaking your markup code for, say, Netscape. Unfortunately, Netscape may render things differently from Lynx, which may render things differently from Mosaic, and so on and so forth -- and even within a particular browser, a user may have chosen font or style preferences different from the ones which you might assume. What these guidelines should do, if followed, is make for a better presentation for the most browsers (instead of the best presentation for only one) -- and ensure that your documents reach the widest audience possible.

Good Practices

Things contained in this section are good practices for the generation of any HTML document. Specifically, this would include anything which should routinely be done in the creation of documents for the benefit of both reader and author.

How to Use Non-Standard HTML

There are at least three major flavors of HTML currently in practice as this is being written: HTML 2.0, HTML 3.0, and the Netscape extensions to HTML 2.0. HTML 2.0 is the closest thing to current practice that is available, and can be assumed to be "safe" for all browsers. On the other hand, HTML 3.0 and the Netscape extensions are not widely implemented, let alone standardized. Under most circumstances, this would be a good reason not to use them until they were more widely available, but there is the mitigating circumstance that all of the Netscape extensions (and some of HTML 3.0, most notably tables) are supported by one of the most popular Web browsers ... Netscape!

What should be done about this? Many Web authors take the approach that, since most people use Netscape, it's acceptable to use the Netscape elements, even if it is to the detriment of people using other browsers.
Others take the approach that nothing more than HTML 2.0 should ever be used, which means that any benefit which might be derived from these enhancements is lost.

The best road is a middle approach. Two good rules of thumb are:

* If two or more popular browsers support the extension, it's probably fine to use. For instance, both Netscape and Mosaic (and Arena) now support tables, so any tables you use will be available to most of your audience.
* If the extension is not widely supported, but it will not adversely affect your document if it is missing, it's probably fine to use. For instance, the FONT element changes the font size of text in the Netscape Navigator, but not in any other client. However, other clients will simply ignore tags they do not understand -- so the text in the FONT element will still be readable. On the other hand, if the MATH element is ignored by a browser, the browser will display gibberish.

In general, try to think about the effect that the non-standard elements will have if they are not recognized. These elements can be used intelligently, and on browsers that recognize them, can dramatically enhance the presentation of your page. If it is not possible to use the elements in such a way that rendering is still good on all clients, think about providing multiple copies of the document (for instance, providing a version of the table using the PRE element), and possibly using content negotiation on the server to provide the reader with the correct version of the document.

A final thought on the subject: try to avoid banners in your document that claim that your document is "Enhanced for Netscape" or "Enhanced for HTML 3.0" (or the increasingly prevalent "Enhanced for Microsoft's Internet Explorer." Ugh.) Rather, try to build your document so that if a reader reads it in (for example) Netscape, it will be obvious that it uses the new elements to good effect ...
and if a reader reads it in another browser, they can remain blissfully unaware of what they cannot see, and still be impressed by what they do see.

(Opinion Alert: a general comment, that may or may not place me on Bill Gates' hit list -- while I have a healthy disregard for the cavalier manner in which most "extensions" are made de facto by the overwhelming will of places like Netscape, I still have a healthy respect for those extensions which attempt to solve an important problem in a useful way. Many of the Netscape extensions, especially those involving tables, fit this bill, and while Netscape also provided many duds, they have also supported the valid HTML 3.0 alternatives that mirror their extensions. However, in my opinion, every single one of the "Microsoft extensions" is of dubious merit, and of certain incompatibility with any evolving HTML 3.0 specification. Given the well developed state of HTML 3.0, introducing new and incompatible methods of doing the same thing is irresponsible at the least. I highly recommend simply disregarding the extensions introduced with Internet Explorer. Please note that I have the highest respect for many of Microsoft's products; I even used Word and Internet Assistant to compose this edition of this document [although I edited the HTML afterward]. And, dear reader, this paragraph in particular is highly opinion-ridden, so you must take it with a grain of salt as you see fit. On with the useful stuff:)

Signing and time-stamping documents

One problem which faces anyone trying to find information using the Internet is the question of "authoritativeness." The relative ease with which WWW servers can be set up and populated with information means that the traditional checks of the publishing process cannot act to filter out information which is inaccurate or misleading. In addition, it can often be hard to tell how current information found online is, or how actively it is maintained and updated.
One thing which you can do to assist Web users is to sign and date all documents in your infostructure, so that people viewing the documents can form some impression of the authority of the document (i.e., how recent it is, and how reliable the information provider is). This is not a complete solution, but it is a large step forward. For example:
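A minimal sketch of such a signature, placed at the foot of a page, might look like the following. The author name, mail address, and date are hypothetical placeholders, and using the ADDRESS element for this purpose is one reasonable choice rather than the only one.

```html
<!-- Signature block at the end of the document body.
     Name, address, and date below are placeholders, not real values. -->
<HR>
<ADDRESS>
Composing Good HTML, version 2.0.5<BR>
Last modified: (date of last revision goes here)<BR>
Maintained by
<A HREF="mailto:author@www.miskatonic.edu">A. N. Author</A>
</ADDRESS>
```

Because the signature is marked up as an address rather than as, say, italic text, each browser remains free to render it in whatever way it considers appropriate.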
served as a paragraph separator, not as an end-of-paragraph; a confusion which originally prompted this document. However, more recent versions of the HTML 2.0 and later specifications have changed this behavior. The current recommended use of the P element is to be placed at the beginning of paragraphs; for example:

<P>In this paragraph, our hero discovers that he really likes baloney sandwiches. He also listens to some disco, and has a lovely beverage. Ah, if only all paragraphs were this exciting!

This is in contrast to previous usage, where the <P> was usually placed at the end of the paragraph. Still, in certain contexts, use of <P> should be avoided, such as directly before any other element which already implies a paragraph break. To wit, the <P> element should not be placed before the headings, HR, ADDRESS, BLOCKQUOTE, or PRE. It should also not be placed immediately before a list element of any stripe. That is, a <P>
should not be used to mark the end-of-text for
in order to fix white space problems, please think twice and avoid it if you can.

Also, when using the glossary list (DL), please try to avoid using multiple DDs (definitions of terms) in order to provide multiple entries for a term (DT). Instead, use a <P>
tag between paragraphs in a definition. All clear now?

Character and entity reference errors

Simply put, a character reference and an entity reference are ways to represent information that might otherwise be interpreted as a markup tag. For example, consider the rendered HTML document in figure 17.

---------------------------------------------------------------------------

[Figure 17: Properly escaping character entities (Arena)]

---------------------------------------------------------------------------

The source which produces this document, which uses entities, looks like:

In order to represent the &quot;&lt;P&gt;&quot; in this text, I had to use &amp;lt;P&amp;gt; in my raw HTML.

In this example, the &lt; becomes "<", the &gt; becomes ">", the &quot; becomes a quotation mark, and the &amp; becomes "&" (which is needed in order to represent the text &lt; in the document without the text being turned into "<").

There are currently four entities for this purpose in HTML, as well as several entities which allow encoding of the ISO Latin-1 Character Set.

The most common error in the use of entities is to leave off the trailing semicolon. Also, no additional spaces are needed before or after the entity/character reference. Here are some examples of incorrect usage:

Doug &amp Chris went out for a walk.
A paragraph break can be represented with
&quote; &lt; P &gt; &quote;

Can you spot the errors in the above examples? They are:

* In the first line, "&amp" needs to have a semicolon after it.
* In the third line, "&quote;" should be "&quot;" (this is subtle and annoying, much like the Unix system call, creat())
* There should be no spaces in the third line, which should read: "&lt;P&gt;".

URL errors

Another misunderstood aspect of Web document composition is in the creation of URLs.

Directory reference errors

One grey area involves references to directories. It is possible to request an index of a directory from an HTTP server.
The typical response from the server is to either return a pre-generated index document (which is often the document "index.html" in the referenced directory), or to construct an HTML document on the fly which contains a listing of all files in the directory.

However, when making such a directory reference, it is important to make sure to have a trailing slash on the URL. That is, if you were to request the index of Willamette University's directory of HTML documentation, you would want to refer to it as http://www.willamette.edu/html-composition/, not as http://www.willamette.edu/html-composition.

Many servers are able to catch these errors, and provide redirection to the proper URL, but it's best to get the URL right in the first place -- notably because not all browsers support transparent redirection. Also, getting this correct the first time means it will take less time for the page to be loaded; your readers won't have to wait through the time needed to open two (or more) HTTP connections.

Not using fully qualified domain names

Problems can arise when the hostnames in URLs aren't fully qualified. Within a local network, a machine can often be simply referred to by its host name. For example, the domain miskatonic.edu might have in it a WWW server with the host name www. Readers within that domain can refer to the machine by this name. However, the server's fully qualified domain name is www.miskatonic.edu. This fully qualified domain name provides enough information that any host, anywhere on the Internet, can find this particular machine.

What happens is that an HTML author might construct a link that looks like this:

Metanoia -- A Change In Spirit

which produces a link to "Metanoia -- A Change In Spirit" that will only work for people in the local network which that machine is on.
A correct link would look like this, instead:

Metanoia -- A Change In Spirit

which would allow all of the readers who are interested in Metanoia -- even those living in Freedonia -- to actually follow the link.

Along those same lines, be careful in using URLs of the scheme "file:". It's possible to have a reference to file://localhost/some/file/pathname. What this does is reference the file described on the local host of whoever is browsing the document. Which is why a reference to the message of the day will display the message of the day on your machine, not the message of the day on my machine. However, this makes several assumptions about your reader's local machine and network which you probably shouldn't be making. Unless you know what you are doing (and probably even then), references of this type will really mess up your Web.

Missing quotes in start tags

One common error, especially with the current lack of widely available and useful authoring tools, is to leave off a quote in the attributes of tags. For example, this reference to the euphonium, king of instruments, should look like:

but people composing "raw" HTML from a text editor will often instead type

CZeCh THIZ 0uT !

would be rendered in Netscape as shown in figure 18, and in Lynx as shown in figure 19.

---------------------------------------------------------------------------

[Figure 18: Improper use of whitespace (and spelling and punctuation, too) (Netscape)]

[Figure 19: Improper use of whitespace (Lynx)]

---------------------------------------------------------------------------

On some browsers, there may be white space around the anchor, which adds unwanted unsightliness to the rendering, and may lessen the impact of the document. (This comment really applies to white space immediately following start tags, and immediately preceding end tags.)
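As a sketch of the anchor advice above (fully qualified host names, a trailing slash on directory references, both quotes on attribute values, and no white space just inside the anchor tags), the markup below contrasts a fragile link with a more robust one. The path /metanoia/ is a hypothetical placeholder; only the host names www and www.miskatonic.edu come from the discussion above.

```html
<!-- Fragile: unqualified host name, no trailing slash on the directory,
     closing quote missing, and white space just inside the anchor tags.
     The path is a hypothetical placeholder. -->
<A HREF="http://www/metanoia> Metanoia -- A Change In Spirit </A>

<!-- More robust: fully qualified domain name, trailing slash, both
     quotes present, and the link text flush against the tags. -->
<A HREF="http://www.miskatonic.edu/metanoia/">Metanoia -- A Change In Spirit</A>
```

How badly the first form fails depends on the browser: some recover from the missing quote, while others may swallow the following text into the attribute value, which is exactly why it is safer not to rely on recovery.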
Stylesheets

The point has probably been well made by now that HTML is not a very good vehicle for providing specific information about layout and presentation. There are no mechanisms for an author to specify how she wants specific elements rendered, or to control aspects of page layout. While one of the strengths of HTML is this very independence from presentation details, it has become clear that some form of presentation control is needed.

Stylesheets are the answer to this problem. They provide the other half of the equation, the half that is currently not provided by HTML. While HTML provides information about content, stylesheets will provide information about how to render specific elements.

Unfortunately, while several mechanisms for providing stylesheets are under development, there is no clear standard at the time of this writing. We cannot tell you what stylesheet mechanism(s) will become standard, but we can tell you about the current contenders. Keep your hopes up, though: because of the importance of stylesheets, it is highly likely that a usable standard will emerge within the next year.

Some Stylesheet Proposals

In these proposals, the stylesheets contain information about how elements should be rendered, whether this is font information, justification information, etc. At the time of this writing, the syntax for these stylesheets has not yet been fully designed.

Arena/Cascading Style Sheets

The Arena browser is currently the only browser which supports a stylesheet mechanism, and that mechanism is currently very limited and very experimental. The mechanism involves "cascading style sheets," which means that several different style sheets, each with a different order of importance, are combined in order of importance to create a presentation style. The reader can specify her own preferences for rendering, as can document authors, and these preferences are merged to produce the final document.
DSSSL/DSSSL Lite

DSSSL is the Document Style Semantics and Specification Language, which has emerged from the SGML community as a potential stylesheet mechanism. Because it is complex, work is being done to create "DSSSL Lite," a modified subset of DSSSL which can be easily implemented by client programmers, and easily used by HTML authors.

Alternatives to Stylesheets

While stylesheets are not currently usable, there are alternatives in existing specifications, which can be used with existing browsers. While the HTML 3.0 enhancements below are not yet widely propagated, it is likely that they will be soon; and the Netscape enhancements are already available (and are likely to be integrated into the evolving HTML 3.0 specification).

HTML 3.0

While HTML 3.0 does include the STYLE element for supporting whatever mechanism is eventually deployed for stylesheets, HTML 3.0 also provides some new elements for greater control over presentation. These elements include BANNER, BIG, SMALL, TABLE, MATH, and TAB.

The BANNER element provides a means for a banner of HTML that will always remain on the screen. This might be a copyright notice, a toolbar, or any other content which should always be available.

The BIG and SMALL elements allow for rendering text as bigger or smaller, as compared to the default text size.

The TABLE and MATH elements provide for a more sophisticated means of layout. The TABLE element allows the author to specify a spreadsheet-style arrangement, with cells that can contain text, images, and even input elements for FORMs. The MATH element allows for the description and rendering of complex mathematical formulae.

The TAB element allows the author to specify tab stops within the document. In addition, some entities have been added, such as "&emsp;", to provide finer control over spacing.

For more information about these additional elements and entities, see the HTML 3.0 specification.
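To make the TABLE, BIG, and SMALL descriptions a little more concrete, here is a small sketch. It assumes a browser that implements the HTML 3.0 table model; the table contents are invented placeholder data.

```html
<!-- Sketch only: the rows below are placeholder data. -->
<TABLE BORDER>
<CAPTION>Office hours (placeholder data)</CAPTION>
<TR><TH>Day</TH><TH>Time</TH></TR>
<TR><TD>Monday</TD><TD>2:00 to 3:00</TD></TR>
<TR><TD>Thursday</TD><TD>10:00 to 11:00</TD></TR>
</TABLE>

<P><BIG>Walk-ins welcome.</BIG>
<SMALL>Hours may change during exams.</SMALL>
```

A browser that ignores BIG and SMALL will still show both sentences at the default size, so that part degrades gracefully. A browser that ignores TABLE will run the cells together, which is the case where a PRE version of the same data is worth considering.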
Netscape

The Netscape approach cannot be called a "style sheet," per se. Rather, as of the 1.1 release of Netscape Navigator, Netscape has provided several "enhanced" elements to help control presentation. These elements include FONT, BASEFONT, IMG, and BODY.

The FONT and BASEFONT elements allow changing the font size within a document. The IMG element, on the other hand, has been enhanced to provide text flow around images in documents.

The BODY element now allows control over the background. The author is allowed to provide a background color or image for their document. In addition, the author can specify different colors for hypertext links, in case the default colors do not have sufficient contrast with the new background color.

If you would like more information, Netscape Communications has provided documentation of their HTML extensions online (both for the Netscape HTML 2.0 extensions and the Netscape HTML 3.0 extensions).

Note: Be careful when changing colors for hypertext links. Most browsers take the approach of using a bright color (such as bright blue), which has high contrast with the default page background, for links which have not yet been followed; and of using a dull color (such as dark blue), which has less contrast with the default page background, for links which have already been followed. Readers have become used to this high-contrast/low-contrast visual cue, and changing the link colors can confuse readers.

The best approach is, first, not to change the link colors unless you have to. With most background colors, the defaults should still be fine. If you do need to change the link colors, use a color that is bright, and high-contrast to the background color, for links to pages which have not yet been visited. Use a duller version of that same color for links that have already been followed.

Netscape Frames

Given the proliferation of Netscape's frames, it seems appropriate to at least add in a paragraph or so commenting on proper usage.
Frames allow you to break the browser's window into separate subwindows, with different documents in different windows. This provides even greater control for the author in terms of what the end document actually looks like (and, granted, can be used to very good effect), but, as with all things, must be used with care. Some gotchas with frames include:

Navigational

This has more to do with Netscape's current implementation, but may be more fundamentally related to the issues involved in providing frame-style mechanisms. Currently, when a reader encounters a space structured with frames, any further navigation they do does not make it onto the history stack. This means that the next time they hit the "back" arrow, they pop right out of the entire space, possibly going back several link selections. This can be jarring, to say the least. What this boils down to is that you must be even more careful to prepare a good navigational structure for your corpus of documents. (In fairness, Netscape has recognized the frame problem, and the 3.x version of Navigator addresses it.)

Layout

Many sites have poorly laid-out frames; when a reader with a browser window of unexpected shape or size shows up, some of the frames are not completely readable. I don't understand enough about frames to know why this happens, yet, so all I can do is to warn you to watch out.

In general, the gotchas revolve around the fact that more control is removed from the reader in a medium where the reader expects to have a good deal of control. This doesn't mean don't use frames; it means that you must carefully analyze why you are using them, and make sure that their use is justified.

Another note: there is a NOFRAMES element which can be used to give alternate text for those browsers which do not support frames; use it.

More on this subject as I become more familiar with frames.
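A minimal sketch of a frame layout that still serves readers without frame support is shown below. The file names and frame names are hypothetical; the relative column widths are used on the assumption that they adapt better to windows of unexpected size than fixed pixel widths would.

```html
<HTML>
<HEAD><TITLE>Example frame layout</TITLE></HEAD>
<!-- Two columns: a narrow contents frame and a wide main frame.
     File names and frame names are hypothetical. -->
<FRAMESET COLS="25%,75%">
  <FRAME SRC="contents.html" NAME="contents">
  <FRAME SRC="chapter1.html" NAME="main">
  <!-- Browsers without frame support ignore the tags above and
       render only what is inside NOFRAMES. -->
  <NOFRAMES>
  <BODY>
  <H1>Example frame layout</H1>
  <P>This document uses frames. The same material is available without them:
  <UL>
  <LI><A HREF="contents.html">Table of contents</A>
  <LI><A HREF="chapter1.html">Chapter 1</A>
  </UL>
  </BODY>
  </NOFRAMES>
</FRAMESET>
</HTML>
```

The NOFRAMES content links to the same documents the frames load, so a reader in a browser such as Lynx can reach everything, just without the side-by-side layout.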
Web Style Considerations

A quick plug: Chapter 5 of Web Weaving discusses many of the issues you should take into account in planning and administering your Web (in fact, the entire book revolves around the subject in great detail). Here we will also address that subject, considering the architecture of your infostructure.

Organization

When organizing your infostructure, there are several important issues to consider. These issues include:

Presenting a clear ordering of information by subject (table of contents), or some other form of reasonable entry into the infostructure.

Some useful forms are:

* Table of Contents
* Searchable Index
* What's New (with the organic nature of online documents, a time-oriented ordering will help the infonaut quickly orient herself with what is new and/or changed in otherwise familiar territory)

The reader needs to be able to find what they are looking for, and a good overview that allows the reader to quickly find a particular topic or document is invaluable.

Only making a document as long as it needs to be.

If a document can be logically decomposed into more than one file, do so, but only decompose a document if the narrative branches from the linear structure of the current document. An example of this is breaking a book-length work up into chapters, and further breaking those chapters up into sections. Because of the length of time involved in retrieving documents, making the document available in readable chunks means that the reader can use the information without becoming overwhelmed by loading times and by the correspondingly large amounts of information presented in a single, huge, scrolling document.

Correspondingly, make sure a document is richly cross-referenced, so that if a reader wants to ask, "Why?", she can. If you can split up supplementary information into separate documents, do so.
This allows the reader to follow a main flow of narrative, but still be able to look up evidence and additional related stories and information as necessary. But don't put in so many links that the reader gets lost trying to follow them all.

Providing a clear, consistent navigation structure.

You should always be able to navigate easily to all documents which immediately relate, but you should also always be able to get to any other document in the infostructure with a minimum of fuss. Always provide access to the original table of contents, or its equivalent. This is especially important for when others create links to documents in your Web, but do not necessarily create links to your main entry points; readers can find themselves in the middle of what is obviously a larger document, but without any means of finding additional information. See Main Roads and Scenic Paths, below.

Design Goals

Importance of content

Anyone working with HTML for any length of time will soon realize that the markup language is composed of containers, which label content. It should be obvious, then, that your web should be primarily about this content, whatever it may be. That's not to say that content only lies between HTML tags: content is also found in other media types, of course, and, depending upon the type of information you provide, sounds or images may be more important to both you and your readers than other types of media.

Web sites, however, should be driven by content, not by vanity or the need or desire to make a buck. Whatever your background, you have real "content" -- information, discussion, narrative, ideas -- to publish on the Web. People will visit your site to find this content. Provide it. Focus your site around it.
The largest threat to the Web is that as it becomes insanely popular, instead of becoming a world-wide information repository, as its founders and proponents have hoped, it becomes a large intertwined mass of self-referential sites unwittingly involved in meta-discussions on the nature of the Web: home pages which say little more than "This is my home page" (or "our home page", in the case of the corporate or organizational "presence"), with a collection of links which (virtually) point to the same collections of sites as the last page you visited did.

Main Roads and Scenic Paths: Issues of Navigability

As readers attempt to sail the seas of your infostructure, it is important that you provide useful ways for them to move around in your infostructure. Many readers complain about the proliferation of links in documents, providing so many choices that it becomes impossible to decide where to go next. The blessings of hypertext -- leaving control in the hands of the reader -- can also be a curse, as the original thrust of the narrative becomes awash in side tracks and dead ends.

A means of approaching this problem is to use the metaphor of "main roads" and "scenic paths." This means categorizing the kinds of links you include into two major groups: those which are recommended next destinations, and those which lead off into explanatory side-trails and divergences. As an example, a main path through a hypertext version of a book would be a linear progression from first chapter to the last. A side trail, on the other hand, would be a reference from (for example) Chapter 6's description of CGI functionality in various HTTP servers to Chapter 8's extended discussion of CGI scripting.

This is not to say that there is a single main path through a document -- there can be several (just as there are several ways to read a book, including as a linear narrative, and as a random-access reference).
And side trails include references outside of the immediate document, such as bibliographic references. In addition, side trails can become main paths if the trail leads to another document instead of self-contained explanation.

The point, however, is that a document (in the extended sense of several HTML pages collected and interlinked) should contain one or more author-defined main paths through the text, in order to provide a guidepost for those exploring the information. These main paths should take the form of "next" and "previous" anchors, links back to the table of contents and index from any point within the document, and pointers to alternate main paths which are available (where appropriate).

Although hypertext is based on notions of non-linear text, readers do make it linear as they read through it. And it doesn't hurt to provide at least one sensible linear pathway through the document for readers who aren't interested in wandering around in hyperspace.

Consistency

Consistency is what brings your site together so that it feels like a cohesive whole -- it can unite otherwise disparate topics or content areas, and it can be used to give your site a distinctive feel in comparison to other sites, or a sense of personality. Consistency also aids the maintenance of a site -- if you have a certain way of doing things site-wide, it becomes much easier to make significant site-wide changes without putting a great deal of time into it.

You can achieve site-wide consistency in a number of ways:

Headers and footers

A standard site-wide graphical banner or text-based header can be used to easily identify the site or sponsoring organization. Your header doesn't necessarily need to be static across the site; you can easily share dimensions and a primary graphic element across banners while making each one relate specifically to the content at hand.
Footers can be used in the same way; a standard method to sign documents and/or a standard text-based or graphical menu bar can easily pull a site together, not only as a design element, but also as an easy way to always navigate to the table of contents or index of a site. Server-side includes, supported by most HTTP servers, can simplify some of this work, allowing you to create generic headers and footers which can be modified once and included in all of your documents.

Graphic elements

A unifying theme for graphic elements throughout the site easily pulls it together into a whole. A shared motif, such as bubbles, sign posts, or a corporate logo, works, as does a site-wide color scheme or page backgrounds. You can rely on sizing and positioning of graphic elements or textual elements, as well, to achieve a unified feel.

Personality and style

Beyond images and design elements, sites come together because of personality and style. A consistent feel or attitude for a site, conveyed across textual and graphic elements, can not only make each piece feel as if it's part of a larger whole, it can also attract readers who share the same attitude or outlook (or are fascinated by yours). The best sites on the Web aren't necessarily the most polished, but those that pull readers back again and again not only because of informational content but also because of the voice with which that content is presented.

For documents which should have a personality all their own, such as user home pages, you can still pull all these different personalities and outlooks together by presenting a common theme or launching point. All the users of a particular Internet service provider, for example, have something in common by the sheer fact of their being there -- and by the mere fact of providing a top page view to user-maintained areas, the service provider has begun to form a community around which a commonality can develop.
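As a rough illustration of the shared footer described under "Headers and footers" above, together with the "next" and "previous" anchors suggested under "Main Roads and Scenic Paths," the sketch below shows a footer fragment and a page that pulls it in with a server-side include. All file names, paths, and the address are hypothetical. The directive shown is the NCSA/Apache style; other servers may use a different syntax or require includes to be enabled, so treat this as a sketch to check against your own server's documentation.

```html
<!-- File 1: /includes/footer.html (hypothetical path), the shared footer -->
<HR>
<P>
[ <A HREF="/index.html">Table of Contents</A>
| <A HREF="/docindex.html">Index</A> ]
</P>
<ADDRESS>webmaster@example.org (placeholder address)</ADDRESS>

<!-- File 2: the end of a chapter page (hypothetical file names) -->
<!-- Main-path links are specific to this page, so they stay in the page -->
<P>
[ <A HREF="chapter5.html">Previous</A>
| <A HREF="chapter7.html">Next</A> ]
</P>
<!-- The site-wide footer is inserted by the server when the page is sent -->
<!--#include virtual="/includes/footer.html" -->
```

With an arrangement like this, editing the one footer fragment changes the footer on every page that includes it, while each page keeps its own "previous" and "next" links.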
Persistent URLs

Although Uniform Resource Names, or URNs, are being developed in order to provide a naming system similar to the domain naming system for URLs, at this point it remains desirable to use URLs as if they refer to the same resource persistently through time.

As a content provider, you can help those who make links which point to your site by developing a file structure which will allow you to manage content as it grows and develops. If your Web space is based on a hierarchical filing system, you can avoid major reorganization of that file system by

* thinking not only about organizing your current content, but also about how you plan on developing and expanding that content in the future
* creating a file space which is neither too shallow nor too deep for your content.

An example might be an organization which has just created a new division, Foobar. Currently, there's little information to publish about Foobar on the Web: Foobar has a mission statement and little else. Though it might logically follow to create a file, "foobar.html", to hold the mission statement, and to store it in the same directory as your main organization's web, it might be wiser to create a subdirectory named foobar which could then contain foobar.html and other files, as Foobar expands. This way, links don't have to be changed or redirected down the road when Foobar adds additional files and perhaps chooses to design and administer its own web space. If part of Foobar's mission statement is to spin off into its own organization, you might even create a directory on the same level as the parent organization's, to signify within the URL path the relative autonomy of the division and its future direction.

Another way to manage URLs is to only publicize a few well-known entry points to your Web: for example, the top view, or table of contents page, and perhaps an index page, or a FAQ page.
When URLs do change, it's important that you not only provide links from the old URLs to the new ones (or redirect the URLs to the new ones), but you also make an attempt to notify those that have links into your Web space, through general announcements or by contacting directly those who have well-known links to your documents (such as Yahoo or Lycos).

Seamlessness

Your web space should not only be consistent with itself internally, it should make references between the site and the outside world appear seamless.

A good case in point is the corporate site which has made its product information available via the Web, but, under the link for Ordering Information, only provides an 800 number in order to purchase the advertised commodity. Or the home page for a band which doesn't provide any audio clips of the band's songs, but just a thumbnail image of the cover art from their most recent album, available through some obscure indie label. Or the online newspaper which provides news coverage, but doesn't push the envelope and provide a real way to participate in the political process.

Seamlessness is about bridging the gap between the world you create within your web and the world outside it. Often, this means not carrying over restrictions or limitations from traditional broadcast media that fail to make sense in interactive media.

Macrocosms and Microcosms

The big picture: entire server structure

A site-wide strategy to organize information is never easy to invent, but it is vitally important to your site's success as a place where information is retrieved and used, versus simply being an area in which content is stored.

Finding a metaphor

Of course, there's no single recipe or structuring mechanism which you can apply to all types of content to give you a well-designed web site. That comes from thinking about the nature of your site and your content, and the logical divisions that your content can be organized around.
However, finding an existing metaphor which you can work within, while also pushing its boundaries, can be an effective way to plan for the organization of a site. There are many obvious metaphors upon which to base a web site: thinking of your content as being organized like a book, building, or branching tree.

The book metaphor: pages of content

Books lend themselves easily to the Web; in fact, many books have been "ported" to the Web, for better and for worse. Books have tables of contents and indices, for quickly locating information; parts, chapters, sections, and sub-sections, for organizing content; and footnotes, endnotes, and bibliographies, for displaying links to other content. Collections of books become "libraries", complete with card catalogs and help desks.

However, books also have pages which display content statically, while computer displays have a single, dynamic screen. A book metaphor quickly falls apart when applied to the Web on a page level: you could choose to consider a single HTML document a "page", causing you to break up content into arbitrarily small, hard to manage, and difficult to navigate pieces; or you could think of whatever text and graphics are currently displayed on a screen as a "page", which could easily drown the user in a sea of text without the benefit of traditional navigational tools such as page breaks and numbering of pages. The screen is not a page.

The building metaphor: content as artifice

Sites can also be managed as being housed in a building, a collection of buildings, or along some other spatial metaphor. The information you hope to store and manage is divided for the user along content areas, which are housed in different "buildings", which can then be further subdivided into "rooms".
Obviously, this can be effective for some types of content, such as a large corporate site with many divisions, or a museum or gallery: basically, any information which can be mapped into a spatial plane consistently lends itself to this sort of view.

At the same time, a spatial metaphor in a largely text-driven medium, as the Web is today, is often hard to pull off convincingly. VRML (Virtual Reality Markup Language) and other such developments will allow for the creation of virtual spaces; even then, the connecting points between rooms or buildings -- hallways and walkways -- need to be considered thoughtfully.

It's also the case that, at many sites, the metaphor is dropped too quickly: you're asked to select a content area based upon a clickable map-based view, but then you're dropped into pages of descriptive text. Not only can this be disconcerting for a user, it points out the fact that oftentimes resources aren't allocated wisely across a Web site, with too much attention and time spent on the top page of a site in comparison to the remainder of the site.

The branching metaphor: regimented growth

A third way of thinking about a site as a whole is using a branching metaphor, where all content springs from a common root and then branches out into many divisions and content areas. This is an obvious metaphor to use for web sites built atop file systems, since most file systems share this organization of directories (or folders) branching into subdirectories (or subfolders), and so on.

A branching metaphor shouldn't be pursued over the linear flow of information, however: too many branches can be confusing or frustrating for a user, especially if navigating those branches requires repeated jumps to a monolithic top structure.

In general, there are some key issues you should keep in mind when organizing a site on a macro level, including:

Providing a main entry point, or top view, which makes it easy for users to find the content which they're most interested in.
At times, you'll know exactly what a user is looking for: if you run a site which provides audio clips of theme songs from popular cartoon series of the '70s, users probably expect to find a listing of available audio samples or a link to such a listing from your site's top page. Other times, you can't be expected to know: for a site covering a wide diversity of subjects, it may be necessary to provide a search mechanism or user-customizable top view in order for users to navigate your site comfortably.

Offering multiple paths to the same content. Not all readers seek the same information in the same way. A good glossary or index will cross-reference information: for example, you may be told to look under "automobiles" if you seek information under "cars". That same information could probably be found by looking through a table of contents. With hypertext links, you can refer to the same information in many ways. Do so, where it facilitates the user without overwhelming her.

Keep in mind, too, that a site, whether it be a file system or a database, need not be organized as the user sees it: the underlying structure doesn't have to be identical to the structure which the user navigates. However, a close relationship between the two can make it easier to maintain a site, as content is revised and expanded. A change in one part of your web space can have an impact on other parts of your site which share links or other references: the easier it is for you to see these relationships while maintaining these underlying documents, the more likely it becomes that your site as a whole is kept up-to-date and cohesive.

The little picture: a document corpus

Many of the decisions you make on a site-wide level to organize content carry over to the management of "documents", whether they be single pages of HTML, or a collection of such pages which cover a single topic.
These things include such obvious carry-overs as having an overview of the information presented within the document available to the reader at the "top" page, or expected entry point; making links available at appropriate points (usually, at the tops or the bottoms of pages) to bring the reader back to the overview for the document; and keeping your collection of documents uniform in terms of both content and form.

Much of the management of documents, though, is the management of links. Hypertext is all about links -- this should be patently obvious to most. But producing hypertext is all about managing links from the perspective of your potential reader.

Too often, Web documents fail by failing to manage links effectively -- either by delivering screenfuls and screenfuls of ever-scrolling text, or providing index-card-sized groupings of hypertext which link in a myriad of directions to other index-card-sized groupings of hypertext. Neither end of the spectrum allows the user to navigate the content presented easily: in one case, one becomes disoriented in a sea of text; in the other, in an ocean of links. Worse yet, documents can become so overseasoned with random and senseless connections to every possible place that the reader becomes lost in a sea of text and links!

The key to managing links in your documents (besides simply verifying that they are correct) is to organize them into classifications, and to employ links of various classifications in a reasonable and intelligent way. The next few sections describe some of the various classifications of links.

Footnotes

There are two traditional purposes for footnotes: for bibliographic references, and for further commentary and/or elaboration of points within the main text. Links to short explanatory text within a hypertext document can be useful to readers, if it's clear from context that the link is a digression.
Within your documents, the "footnote" style of link should be regarded as an explanatory link which elaborates on the current discussion without drawing the reader away from the main text. A footnote will draw the reader away temporarily, explain something, and then allow the reader to return to the main flow of text. While a footnote might offer further links to further explanations of greater depth, the footnote itself is usually nothing more than a brief explanation or glossary-style definition.

You can achieve this effect by context, by linking from a phrase (as in the lemming example below) to a short explanation or parenthetical remark that explains the text in question. If you are trying to achieve a more traditional effect, you can also use numbered note references, by either using a number surrounded by brackets ([1]), or by using the SUP element in HTML 3 (1). HTML 3 also defines the FN element for use in footnotes, which, "when practical, [should be] rendered as pop-up notes":
Nothing is certain about the lemmings,
other than that they left as they came, with nothing but a silly grin and
some lemon pies.