




Before You Start: Issues to Consider


by William Robert Stanek

The World Wide Web is rapidly evolving into a medium that rivals television for information content and entertainment value. Millions of people and thousands of businesses around the world are racing to get connected to the global Internet and the World Wide Web because the Web is the most powerful and least expensive medium to publish in. Whether you are an information provider or simply a creative person who wants to publish your own work, you will find that there is no other medium that empowers the individual like the Web. The Web levels the playing field, allowing a one-person operation to compete head-to-head with corporate conglomerates that employ thousands of people.

To publish successfully on the Web, you do not have to be a genius, a programmer, or a computer guru with insider secrets. What you need are the practical advice, tips, and techniques you will find throughout this book. Many books on Internet and Web publishing discuss theories, cover key subjects, and show basic examples, but rarely follow a practical approach to Web publishing. Books without practical examples and genuinely useful information can leave you wondering where to start, how to start, and what to do when you do finally manage to start. The chapters are filled with useful information designed to unleash the topic of Web publishing with FrontPage and help you become one of the thousands of successful Web publishers providing ideas, services, and products to millions of Web users.

This chapter provides an overview of some basic information you need to know and decisions you need to make before you start publishing on the Web.

What Is FrontPage, and Why Do You Need It?


Web technologies are growing at a phenomenal rate. Where once there was only the HyperText Markup Language (HTML), now there are HTML 3.2, HTML extensions for Netscape Navigator and Microsoft Internet Explorer, VRML, CGI, Java, ActiveX, and much more. Keeping pace with this ever-growing array of technology is confusing. Enter FrontPage, an integrated what-you-see-is-what-you-get (WYSIWYG) tool for creating and managing Web sites that is years ahead of the competition.

Microsoft FrontPage is much more than an easy-to-use tool for creating and managing Web sites. It will soon be the authoring and management tool of choice for thousands of current Web publishers, thousands of businesses large and small seeking to set up intranets, and thousands of people around the world who thought they would never be able to establish a presence on the Web due to the complexity of creating and managing a world-class Web site.

Millions of Microsoft Office users around the world will find that FrontPage uses the very familiar Microsoft interface. In fact, FrontPage is the only Web site creation and management tool that is a member of the Microsoft Office family of applications, and all the tools included with FrontPage are part of one complete package. You will find that it is surprisingly easy to become a Web publisher using FrontPage!

WYSIWYG Creation of Advanced HTML Documents


FrontPage includes two powerful tools for creating advanced HTML documents using a WYSIWYG authoring environment: the FrontPage Explorer and the FrontPage Editor. Not only does the FrontPage WYSIWYG authoring environment display documents as they will appear when published, but the FrontPage Explorer and FrontPage Editor are also very easy to use.

The FrontPage Explorer, as shown in Figure 1.1, presents your Web site in a manner similar to the Windows 95 Explorer and simplifies Web site creation and maintenance, particularly for complex sites. The FrontPage Explorer gives you multiple ways to view your site. The two primary views are shown in Figure 1.1.

Figure 1.1. The FrontPage Explorer: a hot graphical display tool.

As you can see, Outline View provides a hierarchical representation of your Web site with icons that indicate the different kinds of pages in your Web. You can expand the view to show all links to images and pages, or collapse the view for a higher-level picture. Link View shows a graphical representation of your Web site that includes icons and titles with arrows between pages indicating the direction of each link. A different type of view, called Summary View, displays a list of all pages, images, and other files used in your Web site. You can sort this list on title, author, modification date, creation date, and any comments you've added.

Another FrontPage Explorer feature is the drag-and-drop interface, which lets you create a link by simply dragging a page or image icon to a specific place on a page in the FrontPage Editor. With the FrontPage Editor (shown in Figure 1.2), you can create your Web pages in a fully WYSIWYG environment. Because FrontPage supports all standard file formats and protocols, you can link to any file, such as MPEG or PDF, as well as link to any FTP site, Gopher site, or newsgroup.

Figure 1.2. The FrontPage Editor: WYSIWYG authoring at its best!

Also, as you can see from Figure 1.2, the FrontPage Editor fully supports image formats and displays images as you will see them in your published Web page. In fact, you can import any of more than a dozen common image formats into your Web page, and based on your selection, FrontPage will automatically convert these image formats to either the Graphics Interchange Format (GIF) or Joint Photographic Experts Group (JPEG) format.


NOTE

With over 99 percent of all images on the Web in either GIF or JPEG format, you definitely want your images to be in one of these formats. However, if you don't want to convert the image format, FrontPage will still let you use the image.



Page Wizards for Tables, Forms, Frames, and More


As the first authoring tool created for nonprogrammers yet robust enough for professional Web developers, FrontPage provides everything you need to design, publish, and manage your Internet or intranet Web site. Responding to the need for powerful, easy-to-use Web creation tools, FrontPage provides Web authoring and editing features that help you create rich, dynamic Web sites. One of the most advanced features is the Page Wizard.

Page Wizards help you automatically generate content for your Web page. If you want to create Netscape frame-enhanced documents, follow the frame wizard's step-by-step advice. If you want to create tables using the advanced layout features of HTML 3.2, use the table wizard and you'll be able to create an advanced table in minutes. If you want to create fill-out forms that use a search engine, use the form wizard and then drop a WebBot into your page, and you're ready to go.

At the touch of a button, WebBots allow you to add advanced capabilities to your Web site, including interactive forms, navigation bars, text searches, and discussion forums. WebBots offer drop-in interactive functionality, which greatly streamlines the development process and eliminates the need to write your own scripts or add complicated HTML commands. No programming is involved at all.

Figure 1.3 shows the Frames Wizard setup page. This page allows you to set the number of columns and rows in the frame-enhanced document. Because you can manipulate the size of any frame using the mouse, there is no more guesswork in determining frame sizes.

Figure 1.3. The Frames Wizard: frame-enhancing your documents made easy.

The Personal Web Server and Administration Tools


Thousands of Web publishers face the complex task of keeping their sites up-to-date. With the addresses of Web sites and pages changing every day, simply keeping a site current is a frustrating task. FrontPage introduces integrated tools that virtually do the job for you. For example, using a feature called backlink, you can tell FrontPage to verify every link in your entire Web site. FrontPage examines onsite and offsite links in the background while you go on to other tasks. When it finishes, you will see a complete report of all invalid and questionable links. Using a single interface, you can then update these links automatically for specific pages or for all pages in your Web site. This means an end to chasing down broken links.

FrontPage also includes a Web server called the Personal Web Server that fully supports the Hypertext Transfer Protocol (HTTP) and the Common Gateway Interface (CGI) standards. If you don't have a Web server, you can use the Personal Web Server to serve one Web site or a dozen Web sites to the Internet community. If you already have a Web site, you can use Microsoft Server Extensions and integrate your Web site with FrontPage. The extensions do much more than simple integration. They let you effortlessly copy or post your Web site between platforms and to other servers while still allowing you to use all FrontPage features.

FrontPage also includes a server administration tool that manages security and access to your server. The FrontPage Internet Services Administrator is a Web-based application (see Figure 1.4).

Figure 1.4. The FrontPage Internet Services Administrator: point-and-click Web site administration.

Using this tool, you can control access to your Web site by restricting access to individual users or groups. You can also specify password encryption methods for your Web site. To set and modify user permissions, you will use a simple point-and-click interface featuring hypertext linking and intrinsic HTML controls like buttons and text input fields.

Overview of Web Publishing's Past


The World Wide Web is an open-ended information system designed specifically with ease of use and document interchange in mind. In early 1989, Tim Berners-Lee of the European Laboratory for Particle Physics (CERN) proposed the Web as a way for scientists around the world to collaborate using a global information system based on hypertext. Work on the World Wide Web project proceeded slowly at first, and near the end of 1990, the pieces started to fall into place.

In the fall of 1990, the first text-only browsers were implemented and CERN scientists could access hypertext files and other information at CERN. However, the structure of hypertext documents and the way they would be transferred to remote sites still had to be further defined. Based on proposals by Berners-Lee, the structure of hypertext documents was defined by a new language called the HyperText Markup Language (HTML). HTML was based on a subset of the Standard Generalized Markup Language (SGML) that was already in wide use at the time. To transfer HTML documents to remote sites, a new protocol was devised. This protocol is called HTTP.

HTTP offers a means of moving from document to document and indexing within documents. The power of hypertext is in its simplicity and transparency. Users can navigate through a global network of resources at the touch of a button. Hypertext documents are linked together through keywords or specified hot areas within the document. These hot areas could be graphical icons or even parts of indexed maps. When a new word or idea is introduced, hypertext makes it possible to jump to another document containing complete information on the new topic. Readers see links as highlighted keywords or images displayed graphically. They can access additional documents or resources by selecting the highlighted keywords or images.
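In HTML, a hypertext link of this kind is created with an anchor tag. The following fragment is a minimal sketch; the file name newtopic.html and the wording are only placeholders:

<P>When a new word or <A HREF="newtopic.html">idea</A> is introduced,
readers can select the highlighted keyword to jump to a document
that covers the topic in detail.</P>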

In the fall of 1991, conference-goers around the world started hearing about the promise and ease of hypertext. A few people started talking about hypertext and its potential, but sparks still weren't flying. In early 1993 there were only about 50 Web sites worldwide. Then a wonderful thing happened. A browser enabling users to exploit the graphical capabilities of the Web was developed at the National Center for Supercomputing Applications (NCSA). NCSA called the browser Mosaic. For a time, it seemed the Web and Mosaic were synonymous. Interest in the Web began to grow, at first a trickle of interest, then a great flood of enthusiasm. Looking back, it seems the Web sprang to life overnight. Today, the Web is the hottest and fastest growing area of the Internet, and Mosaic is only one of the dozens of available browsers.

While you undoubtedly have used a browser before, you might not have thought about the processes that make a browser work the way it does. The purpose of a browser is to request and display information. Another term for a browser is a client. Clients make requests to servers. Servers process requests made by clients based on a set of rules for communicating on the network called a protocol. Protocols specify how the programs talk to each other and what meaning to give to the data they receive. Many protocols are in use on the Internet, and the Web makes use of them all. However, the primary protocol in use on the Web is HTTP.

Generally, HTTP processes are transparent to users. To initiate a request for information from a server, all the user has to do is activate a hypertext reference. The user's browser takes care of interpreting the hypertext transfer commands and communicating requests. The mechanism on the receiving end, which is processing the requests, is a program called the Hypertext Transfer Protocol Daemon (HTTPD). A daemon is a UNIX term for a program that processes requests. If you have used a UNIX system, you have probably unknowingly sent requests to the Line-Printer Daemon (LPD) to print material to a printer using the commands lp or lpr. The HTTPD resides on the Web server, which is at the heart of your connection to the Web.

Using the hypertext facilities of the Web, you have the freedom to provide information to readers in powerfully innovative ways. The entrepreneurs who fostered the growth of the Web started by creating small publications that used few of the Web's graphical and multimedia capabilities. This changed dramatically in a few short years, and today's Web publications use many of the graphical, interactive, and multimedia features of the Web. New ways to publish on the Web are constantly being defined, and the features that tomorrow's publications will have may amaze you.

A recent development in HTML publishing is the specification for HTML 3.2. HTML 3.2 is a subset of the original HTML 3.0 specification and is based on features and extensions used in Web documents before May 1996. The first draft of the HTML 3.2 specification was released in May 1996. Because the developers of FrontPage had the foresight to support many extensions to HTML, FrontPage directly supports all HTML 3.2 elements.

Yet, the Web is not defined by HTML alone. Many Web publishers are going back to the standard language that HTML is based upon. SGML is an advanced markup language that, although complex, offers better control over the layout of documents than HTML. SGML is also the basis for many page definition languages used by publishing production systems such as Adobe Acrobat and Common Ground.

While some Web publishers are looking at the origins of Web publishing, others are taking giant leaps forward. These giant leaps forward are possible in part due to innovators such as Netscape Communications Corporation, Microsoft Corporation, and Sun Microsystems, Inc. In the fall of 1994, Netscape Communications Corporation released the first browser to support unique extensions to HTML. The Netscape Navigator took the Internet community by storm and quickly became the most popular browser on the Net. (Netscape's Web site is featured in Figure 1.5.) FrontPage fully supports Netscape Navigator extensions and plug-ins. Hot HTML features for Netscape are highlighted throughout this book.

Figure 1.5. The Netscape Navigator: a hot Web browser.

The browser that might replace top-dog Netscape Navigator is Microsoft's Internet Explorer. Microsoft's Web site is featured in Figure 1.6, and as you can imagine, the site showcases the Internet Explorer browser and FrontPage. Internet Explorer features extensions that enable Web publishers to add soundtracks and live video segments to their Web publications. When a reader accesses a publication with a soundtrack or a live video segment, the sound or video plays automatically if the reader's browser supports these extensions. Internet Explorer also supports ActiveX—the key to activating the Internet. FrontPage directly supports Internet Explorer extensions and ActiveX.

Figure 1.6. Microsoft's Web site featuring the Internet Explorer.

Sun Microsystems, Inc. has been a leading supporter of Web innovation. Recently, Sun Microsystems released the HotJava browser, which is written entirely in the Java programming language developed by Sun. The Java language is similar to C and C++, but is unique in that it is platform-independent. Using Java, you can add programs called applets to your Web publications. Applets are self-running applications that readers of your Web publications can preview and play automatically. Sun has set up several Web servers to handle requests related to Java. One of those servers is featured in Figure 1.7. FrontPage allows you to use documents with Java applets. (See Part IX, "JavaScript and Java," for a discussion on Java.)

Figure 1.7. Sun's Web site featuring Java.

Innovations by Netscape, Sun, and Microsoft represent only a small portion of the changes that are revolutionizing the way information is provided to millions of people around the world. These innovations, coupled with the explosive growth and enthusiasm in the Web, make now a more exciting time than ever to be a Web publisher.

As a Web publisher, you can publish information that will be seen by people in dozens of countries around the world, but the best news is that you as an individual can compete solely on the merits of your ideas, products, and services—not the size of your bank account. In Web publishing, you can reach the same audience whether your Web site is based on a $25 basic account from a service provider or a corporate Web server with leased lines costing $1,500 a month. Web users will judge your publications based on their information content and entertainment value.

Internet Standards and Specifications


Many standards are in place on the Web to enable information to be transferred the way it is. Many of these standards relate to specifications for protocols that predate the Web, such as File Transfer Protocol (FTP) and Gopher. FTP provides a way to access files on remote systems. Using FTP, you can log on to an FTP server, search for a file within a directory structure, and download the file. FTP also enables you to upload files to the FTP server. Searching the file structures on FTP servers is a time-consuming process, especially if you do not know the directory of the file you are looking for. The basic functions of FTP have been extended in various ways. The most popular extension is Archie. Using Archie, you can search file archives easily using keywords.

The Gopher protocol is similar to HTTP, but not as powerful or versatile. Using Gopher, you can search and retrieve information that is presented as a series of menus. Menu items are linked to the files containing the actual text. Gopher is most useful as the basis protocol for its more powerful and recent extensions, including Gopher Jewels, Jughead, and Veronica. Gopher Jewels enables you to search catalogs of Gopher resources indexed by category. Jughead lets you search Gopher indexes according to specified information. Veronica enables you to search Gopher menus by keyword.

The major shortcoming of early Internet protocols was the inability to access information through a common interface. Generally, files available via one interface were not available through another interface. To access information on an FTP server, you used FTP. To access information on a Gopher server, you used Gopher. For files that weren't available through either FTP or Gopher, you could try to initiate a remote login to a host using telnet. Sometimes you went from host to host looking for the information you needed.

Even with this simplified scenario, you can probably imagine how time-consuming and frustrating it was to track down the information you needed. Consequently, a major design issue for the Web was how to provide a common, easy-to-use interface for accessing information on the Internet. To ensure that information available through previous protocols is accessible on the Web as well, the Web was built upon existing standards and specifications like those related to FTP and Gopher. You will find that using these other protocols in your Web documents is easy. You simply specify the protocol in a reference to a uniform resource locator (URL). URLs provide a uniform way to access and retrieve files. Without a single way to retrieve files, Internet publishers and users would still be pulling their hair out.

While the specification for URLs is an extremely important specification for finding files on the Web, many other specifications play a major role in defining the Web. Specifications for the hypertext transfer protocol define how hypertext documents are transferred. Specifications for markup languages define the structure of Web documents. Specifications for multipurpose Internet mail extensions define the type of data being transferred and enable you to transfer any type of data on the Web. Finally, specifications for the Common Gateway Interface (CGI) make it possible for you to create dynamic documents. The following sections look briefly at each of these specifications with emphasis on how they affect you as the Web publisher.

Transferring Files Using HTTP


HTTP is the primary protocol used to distribute information on the Web. It is a powerful and fast protocol that allows for easy exchange of files. It is evolving along with other Web technologies. The original specification for HTTP is HTTP/0.9. HTTP Version 0.9 has many shortcomings. Two major shortcomings are that HTTP/0.9 does not allow for content typing and does not have provisions for providing meta-information in requests and responses.

Content typing enables the computer receiving the data to identify the type of data being transferred. The computer can then use this information to display or process the data. Meta-information is supplemental data, such as environment variables that identify the client's computer. Being able to provide information about the type of data transferred as well as supplemental information about the data is extremely important.

To address the shortcomings of HTTP/0.9, the current version of HTTP, HTTP/1.0, allows for headers with a Content-Type field and other types of meta-information. The type of data being transferred is defined in the Content-Type field. You can also use meta-information to provide additional information about the data, such as the language, encoding of the data, and state information. The Personal Web Server included with FrontPage fully supports HTTP/1.0. (See Chapter 8, "Creating Web Documents with FrontPage," for a preliminary discussion on using meta-information in HTML documents.)
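As a simple sketch, the beginning of an HTTP/1.0 response might look like the following. The status line comes first, followed by header fields carrying the meta-information; the values shown here are only placeholders:

HTTP/1.0 200 OK
Content-Type: text/html
Content-Length: 2048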

An issue that most Web users and publishers want HTTP to address is security. Web publishers and users want to be able to conduct secure transactions. The key issue in security that needs to be addressed to promote the widespread use of electronic commerce is the ability to authenticate and encrypt transactions. Currently, there are several proposals for secure versions of HTTP. The two most popular secure protocols are Secure HTTP (S-HTTP) and Secure Sockets Layer (SSL). When one of these specifications is embraced, secure transactions using HTTP will become a reality for mainstream Web users.

HTTP is a powerful protocol because it is fast and light, yet extremely versatile. To achieve this speed, versatility, and robustness, HTTP is defined as a connectionless and stateless protocol. This means that generally the client and server do not maintain a connection or state information related to the connection.

Connectionless Versus Connection-Oriented Protocols

HTTP is a connectionless protocol. Connectionless protocols differ from connection-oriented protocols in the way requests and responses to requests are handled. With a connectionless protocol, clients connect to the server, make a request, get a response, and then disconnect. With a connection-oriented protocol, clients connect to the server, make a request, get a response, and then maintain the connection to service future requests.

An example of a connection-oriented protocol is FTP. When you connect to an FTP server, the connection remains open after you download a file. The maintenance of this connection requires system resources. A server with too many open connections quickly gets bogged down. Consequently, many FTP servers are configured to allow only 250 open connections at one time; that is, only 250 users can access the FTP server at once. Additionally, processes that are not disconnected cleanly can cause problems on the server. The worst of these processes run out of control, use system resources, and eventually crash the server. The best of these processes simply eat up system resources.

In contrast, HTTP is a connectionless protocol. When clients connect to the server, they make a request, get a response, and then disconnect. Because a connection is not maintained, no system resources are used after the transaction is completed. Consequently, HTTP servers are limited only by the number of active connections and can generally service thousands of transactions with low system overhead. The drawback to connectionless protocols is that when the same client requests additional data, the connection must be reestablished. To Web users, this means a delay whenever additional data is requested.

Stateless Versus Stateful Protocols

HTTP is a stateless protocol. Stateless protocols differ from stateful protocols in the way information about requests is maintained. With a stateless protocol, no information about a transaction is maintained after a transaction has been processed. With a stateful protocol, state information is maintained after a transaction has been processed.

Servers using stateful protocols maintain information about transactions and processes, such as the status of the connection, the processes running, the status of the processes running, and so on. Generally, this state information is resident in memory and uses up system resources. When a client breaks a connection with a server running a stateful protocol, the state information has to be cleaned up and is often logged as well.

Stateless protocols are light. Servers using stateless protocols maintain no information about completed transactions and processes. When a client breaks a connection with a server running a stateless protocol, no data has to be cleaned up or logged. Because the server does not track state information, there is less overhead on the server, and the server can generally handle transactions swiftly. The drawback for Web publishers is that if you need to maintain state information for your Web documents, you must include this as meta-information in the document header.

Determining the Structure of Web Documents


The way you can structure documents is largely determined by the language you use to lay out the document. Some languages are advanced and offer rich control over document layout. Other languages are basic and offer ease of use and friendliness instead of advanced features. The following sections take a look at commonly used languages, including SGML, VRML, HTML, and the page definition languages used by page layout applications.


SGML

Most Web documents are structured using a markup language that is based on SGML. SGML defines a way to share complex documents using a generalized markup that is described in terms of standard text. Describing complex structures in terms of plain text ensures the widest distribution to any type of computer and presents the formatting in a human-readable form called markup. Because the markup contains standard characters, this also means anyone can create documents in a markup language without needing special software.

SGML is an advanced language with few limitations. In SGML, you have full control over the positioning of text and images. This means text and images will be displayed by the user's SGML browser in the precise location you designate. Although SGML is a powerful markup language, it is not widely used on the Web. However, this is changing as more publishers become aware of the versatility of SGML.

VRML

Technology on the Web is growing at an explosive pace, and one of the most recent developments is VRML. VRML enables you to render complex models and multidimensional documents using a standardized markup language. The implications of virtual reality for Web publishers are far reaching.

Using VRML, you can reduce calculations and data points that would have filled 10MB of disk space to a few hundred lines of markup code. Not only does this drastically reduce the download time for VRML files and save network bandwidth, it also presents complex models in a readable and—gasp—understandable format. While VRML is not yet widely used on the Web, it is attracting tremendous interest within the Internet community and the world community as well. Although the current version of VRML is VRML 1.0, the Moving Worlds specification for VRML 2.0 has recently been approved and is gaining widespread support.

HTML

HTML is the most commonly used markup language. HTML's popularity stems in large part from its ease of use and friendliness. With HTML, you can quickly and easily create Web documents and make them available to a wide audience. HTML enables you to control many of the layout aspects for text and images. You can specify the relative size of headings and text as well as text styles, including bold, underline, and italics. There are extensions to HTML that enable you to specify font type, but standard HTML specifications do not give you that capability.
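For example, the following short HTML fragment uses a heading and the bold, italic, and underline text styles; the wording is purely illustrative:

<H1>Welcome to Web Publishing</H1>
<P>This word is <B>bold</B>, this word is <I>italic</I>,
and this word is <U>underlined</U>.</P>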

Although many advanced layout controls for documents are not available using HTML, it is still the publishing language of choice on the vast majority of Web sites. Remember, the limitations are a way to drastically reduce the complexity of HTML. Currently, three specifications are associated with HTML: HTML 1.0, HTML 2.0, and HTML 3.2. Each level of the specification steadily introduces more versatility and functionality.

In addition to these specifications, several Internet developers have created extensions to HTML. While the extensions are nonstandard, many have been embraced by Web publishers. Some extensions, such as Netscape's and Microsoft's, are so popular that they seem to be standard HTML.

Page Definition Languages

Some Web documents are formatted using page definition languages instead of markup languages. Page definition languages often use formats that are specific to a particular commercial page layout application, such as Adobe Acrobat or Common Ground. Page layout applications are popular because they combine rich control over document layout with user-friendly graphical interfaces. While the formats these applications use are proprietary, most of the formats are based on the standards set forth by SGML.

Identifying Data Types with MIME


Using HTTP, you can transfer full-motion video sequences, stereo sound tracks, high-resolution images, and any other type of media you can think of. The standard that makes this possible is Multipurpose Internet Mail Extensions (MIME). HTTP utilizes MIME to identify the type of object being transferred across the Internet. Object types are identified in a header field that comes before the actual data for the object. Under HTTP, this header field is the Content-Type header field. By identifying the type of object in a header field, the client receiving the object can appropriately handle it.

For example, if the object is a GIF image, the image will be identified by the MIME type image/gif. When the client receiving the object of type image/gif can handle the object type directly, it will display the object. When the client receiving the object of type image/gif cannot handle the object directly, it will check a configuration table to see if an application is configured to handle an object of this MIME type. If an application is configured for use with the client and is available, the client will call the application. The application called will then handle the object. Here, the application would display the GIF image.
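Similarly, a page might link to an MPEG video clip, as in the following fragment (the file name is a placeholder). When a reader selects the link, the server identifies the file with the MIME type video/mpeg in the Content-Type field, and the browser either plays the clip itself or passes it to a configured helper application:

<A HREF="overview.mpeg">View the video overview</A>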

Not only is MIME typing extremely useful to HTTP, it is useful to other protocols as well. MIME typing was originally developed to allow e-mail messages to have multiple parts with different types of data in each part. In this way, you can attach any type of file to an e-mail message. The MIME standard is described in detail in Requests for Comments (RFCs) 1521 and 1522. (See Chapter 33, "Writing CGI Scripts," for a complete listing of MIME types and their uses in your Web documents.)


NOTE

Many Internet standards and specifications are described in RFCs, which are a collection of documents pertaining to the Internet that cover everything from technical to nontechnical issues.



Accessing and Retrieving Files Using URLs


To retrieve a file from a server, a client must know three things: the address of the server, where on the server the file is located, and which protocol to use to access and retrieve the file. This information is specified as a URL. URLs can be used to find and retrieve files on the Internet using any valid protocol.

Although you normally use HTTP to transfer your Web documents, you can include references to other protocols in your documents. For example, you can specify the address to a file available via FTP simply by naming the protocol in a URL. Most URLs you will use in your documents look something like this:




protocol://server_host:port/path_to_resource

The first part of the URL scheme names the protocol the client will use to access and transfer the file. The protocol name is generally followed by a colon and two forward slashes. The second part of the URL indicates the address of the server and terminates with a single slash. The server host can be followed by a colon and a port address. The third part of the URL indicates where on the server the resource is located and can include a path structure. In a URL, double slash marks indicate that the protocol utilizes the format defined by the Common Internet Scheme Syntax (CISS). Colons are separators. In this example, a colon separates the protocol from the rest of the URL scheme; the second colon separates the host address from the port number.
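For example, in the following hypothetical URL, http names the protocol, www.your_company.com is the server host, 80 is the port, and /reports/summary.html is the path to the resource:

http://www.your_company.com:80/reports/summary.html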


NOTE

CISS is a common syntax for URL schemes that involve the direct use of IP-based protocols. IP-based protocols specify a particular host on the Internet by a unique numeric identifier called an IP address or by a unique name that can be resolved to the IP address. Non-CISS URL schemes do not name a particular host computer. Therefore, the host is implied to be the computer providing services for the client.


Here's a URL using HTTP to retrieve a file called index.html on the Macmillan Computer Publishing Web server:

http://www.mcp.com/index.html

URLs, which are defined in Request for Comments (RFC) 1738, are powerful because they provide a uniform way to retrieve multiple types of data. The most common protocols you can specify using URLs are listed here, with sample URLs following the list:

ftp File Transfer Protocol
gopher Gopher protocol
http Hypertext Transfer Protocol
mailto Electronic mail address
prospero Prospero Directory Service
news Usenet news
nntp Usenet news accessed with the Network News Transfer Protocol
telnet Remote login sessions
wais Wide Area Information Servers
file Files on the local host
Using these protocols in your Web documents is explored in Chapter 9, "Creating Web Documents with FrontPage."

Creating Dynamic Documents with CGI


The popularity of the Web stems in large part from interactivity. Web users click on hypertext links to access Web documents, images, and multimedia files. Yet the URLs in your hypertext links can lead to much more than static resources. URLs can also specify programs that process user input and return information to the user's browser. By specifying programs on the Web server, you can make your Web publications highly interactive and extremely dynamic. You can create customized documents on demand based on the user's input and on the type of browser being used.

Programs specified in URLs are called gateway scripts. The term gateway script comes from UNIX environments. Gateways are programs or devices that provide an interface. Here, the gateway or interface is between your browser and the server. Programs written in UNIX shells are called scripts by UNIX programmers. This is because UNIX shells, such as Bourne, Korn, and C-shell, aren't actual programming languages. Because UNIX shells are easy to use and learn, most gateway scripts are written in UNIX shells.

The specification CGI describes how gateway scripts pass information to servers. CGI provides the basis for creating dynamic documents, which can include interactive forms, graphical menus called image maps, and much more. The power of CGI is that it provides Web publishers with a common interface to programs on Web servers. Using this common interface, Web publishers can provide dynamic documents to Web users without regard to the type of system the publisher and user are using.
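As a simple sketch, the following HTML form passes user input to a gateway script named in its ACTION URL; the script name /cgi-bin/search.cgi is hypothetical and stands in for whatever script your server actually provides:

<FORM METHOD="POST" ACTION="/cgi-bin/search.cgi">
<P>Search for: <INPUT TYPE="TEXT" NAME="keywords" SIZE="30">
<INPUT TYPE="SUBMIT" VALUE="Search"></P>
</FORM>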


NOTE

FrontPage features direct support for CGI and allows you to drop CGI-based WebBots directly into your publications with no programming involved. Nearly a dozen WebBots perform advanced functions on most Web publishers' wish lists. However, if there isn't a WebBot for a specific task you want to perform, you can still write your own CGI script that will interact with documents you created using FrontPage. (See Chapter 33, "Writing CGI Scripts," for more information on writing scripts and using FrontPage with scripts.)



The Evolution of Standards and Specifications


The standards and specifications you read about in the previous section are the result of coordinated efforts by standards organizations and the working groups associated with these organizations. Generally, these organizations approve changes to existing standards and specifications and develop new standards and specifications. Three primary standards groups develop standards and specifications that pertain to the Internet and to networked computing in general: the International Organization for Standardization, the Internet Engineering Task Force, and the World Wide Web Consortium.


The International Organization for Standardization


The International Organization for Standardization (ISO) is one of the most important standards-making bodies in the world. The ISO doesn't generally develop standards specifically for the Internet; rather, the organization develops standards for networked computing in general. One of the most important developments by the organization is the internationally recognized seven-layer network model. The seven-layer model is commonly referred to as the Open Systems Interconnection (OSI) Reference Model.

Most Internet specifications and protocols incorporate standards developed by the ISO. For example, ISO standard 8859 is used by all Web browsers to define the standard character set. ISO 8859-1 defines the standard character set called ISO-Latin-1. The ISO-Latin-1 character set has been extended, and the extension is called the ISO-Added-Latin-1 character set. You will refer to these character sets whenever you want to add special characters—such as &, ©, or ®—to your Web documents.
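In HTML documents, these special characters are written as character entities drawn from these character sets. The following fragment, in which Your Company and Widget are placeholders, uses the entities for the ampersand, copyright, and registered trademark characters:

<P>Copyright &copy; 1996 Your Company &amp; Associates.
Widget&reg; is a registered trademark of Your Company.</P>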

The Internet Engineering Task Force


The Internet Engineering Task Force (IETF) is the primary organization developing Internet standards. All changes to existing Internet standards and proposals for new standards are approved by the IETF. The IETF meets three times a year to set directions for the Internet.

Changes to existing specifications and proposals for new ones are approved by formal committees that meet to discuss and propose changes. These formal committees are called working groups. The IETF has dozens of working groups. Each group generally focuses on a specific topic within an area of development, such as applications, routing, security, transport, and user services.


NOTE

The process for approving and making changes to specifications within the working groups is standardized. The working groups propose Internet Draft specifications. The specifications for HTML and HTTP are currently draft specifications. Internet Drafts are valid for six months after they are formalized. If the Internet Draft has not been approved in six months, the draft expires and is no longer valid. If the Internet Draft is approved, it becomes an RFC.


RFCs are permanently archived and are valid until they are superseded by a later RFC. As their name implies, RFCs are made available to the general Internet community for discussion and suggestions for improvements.

Many RFCs eventually become Internet Standards, but the process isn't a swift one. For example, URLs were introduced by the World Wide Web global information initiative in 1990. While URLs have been in use ever since, the URL specification did not become an RFC until December 1994 and was only recently approved as an Internet standard.

Figure 1.8 shows the IETF's site on the Web. At the IETF site, you can find information on current IETF initiatives, including the latest standards and specifications pertaining to the Internet.

Figure 1.8. The Internet Engineering Task Force Web site.

Membership in the IETF is open to anyone. The directors of the working group areas handle the internal management of the IETF. These directors, along with the chairperson of the IETF, form the Internet Engineering Steering Group (IESG). The IESG, under the direction of the Internet Society, handles the operational management of the IETF.

You can find more information on the Internet Society and its membership at the following Web site:

http://www.isoc.org/

The World Wide Web Consortium


The World Wide Web Consortium (W3C) is managed by the Laboratory for Computer Science at the Massachusetts Institute of Technology. The W3C exists to develop common standards for the evolution of the World Wide Web. It is a joint initiative between MIT, CERN, and INRIA. The U.S. W3C center is based at and run by MIT. The European W3C center is at the French National Institute for Research in Computing and Automation (INRIA). CERN and INRIA cooperate to manage the European W3C center.

The W3C was formed in part to help develop common standards for the development of Web technologies. One of the W3C's major goals is to provide Web developers and users with a repository of information concerning the Web. Toward that end, the W3C has sites available where you can find the most current information related to Web development. At the W3C Web site shown in Figure 1.9, you can find the most recent drafts of specifications, including those for HTML 3.2 and HTTP 1.0.

Figure 1.9. The World Wide Web Consortium Web site.

Another goal of the W3C is to provide prototype applications that use new technologies proposed in Internet Drafts. The W3C works with its member organizations to propose specifications and standards to the IETF. Member organizations pay a fee based on their membership status. Full members pay $50,000 and affiliate members pay $5,000 for a one-year membership.

Evaluating Your Access Needs


Before you start publishing on the Web, you must evaluate your access needs so that you can determine what type of account will meet your needs as a Web publisher and obtain the level of access to the Web that is right for you. If you plan to provide Internet-related services or products specifically for Internet-smart consumers, you will want your own domain. A domain address is a unique address that only you will have. Web users can use programs, such as Whois, to obtain information about your domain.


NOTE

Whois is a basic protocol to find information on Internet users and domains. If you have an account with an Internet Service Provider (ISP) with access to a UNIX shell, you can type whois at the shell prompt. For example, to find more information on my domain, tvp.com, you would type the following at the shell prompt:




whois tvp.com


Having your own domain plays a key role in establishing a presence on the Web. Many users make specific judgments about you based on the URL of your Web pages. Most people believe that you must set up a Web server to obtain your own domain. This is not true. Web publishers who want their own domain have several options available to them.

Most people do not need to set up their own Web server. If you plan to go through an ISP to obtain an account with Web publishing privileges, you do not need to set up your own Web server. You will use your ISP's Web server to publish your Web documents. If you already have an account with an ISP, you might already have all you need to publish on the Web.

Your access options, each examined in the sections that follow, are installing your own Web server, using an Internet Service Provider's Web server with a standard account, using a commercial online service's Web server with a standard account, and getting a phantom domain.


Installing Your Own Web Server


Installing your own Web server is the most expensive option for Web publishing, yet with this expense come significant advantages. With a dedicated connection, you can provide 24-hour Web services to users worldwide. You will have complete control over your Web server and can publish whatever you wish. You can configure the server to handle other services as well, such as FTP, Gopher, telnet, and CGI scripts. You will also have your own domain, which will establish a clear presence on the Web for you or your company. Your URL will look something like the following:




http://www.your_company.com/

Server Software and Platform Options

FrontPage includes a Web server software package called the Personal Web Server. This server runs on Windows NT and Windows 95 platforms, yet has extensions for the most popular Windows 95, Windows NT and UNIX-based servers. This means you can create and manage documents on a Windows 95 or Windows NT computer, and publish your documents on a Windows 95, Windows NT, or UNIX platform.


NOTE

A Macintosh version of FrontPage is in development. This addition will mean that you can use FrontPage to create, manage and publish your documents on virtually any operating system.


You might be amazed at how easy it is to install server software, especially if you plan to use the Personal Web Server. Like most commercial server software, the Personal Web Server is nearly trouble-free and includes an automatic installation process. If you don't have a Web server, you will probably want to use the Personal Web Server, which can be installed in about 5 minutes.

For an individual or small company wanting to set up a Web server, the best server software to use is most likely the software that will run on the computer system you are most familiar with. For a company with an installed computer network, you might want to use one of the computers already available as your Web server. Before you install the Web server, you will want to carefully consider security options, such as a firewall, to shield your internal network from unauthorized access.

If you do not have a computer capable of providing Web services, you will need to purchase one at a cost of $3,000 to $25,000 or lease one at a cost of $75 to $500 per month. Before buying or leasing a computer, you must determine what platform the Web server runs on. Again, the best server software for you is most likely the software that runs on a platform familiar to you or your support staff. However, before you make any decision, examine carefully how and for what purpose the company plans to use the Web server.

Commercial options are usually the easiest to install, support, and maintain; however, the primary reason for using commercial Web server software is support. If you believe you will need outside software support to keep the server alive, commercial software is the best choice.

Most shareware servers run on UNIX systems. UNIX servers are typically the best maintained and supported. As a result, UNIX servers are some of the most reliable and secure servers available. If you have a strong need for reliability and security, you should look at UNIX Web server software. However, you might need an experienced team to compile the source code and to configure the server parameters.

Internet Connection Options

You will also need to obtain an Internet connection. Generally, you will obtain an Internet connection for a fee from an Internet Service Provider or a commercial online service. The speed of the connection will drive the monthly fees. To determine the best connection speed for you, you will need to estimate the volume of traffic for the site. A good way to estimate traffic is to visit a site similar in content and structure to your intended site. As most popular sites provide some historical information on the usage of the site, you can use the data to make a better estimate of traffic for your site.

Although the Internet is a global structure, usage of your site probably will not proceed at a steady, constant pace throughout the day. For example, a site with 25,000 hits a day might experience peak usage periods within fluctuating time-windows. These peak periods present a problem when evaluating your Internet connection needs. For this reason, if you anticipate a high volume of traffic, such as more than 25,000 hits per day, you might want to consider a high-speed T1 connection to the Internet. Leasing a T1 line will cost you $1,500 to $5,000 per month, plus an installation fee of $1,500 to $5,000.

Most Web sites do not need a T1 connection to the Internet. In fact, the average site needs only a 56 Kbps line. A 56 Kbps connection can adequately handle daily network traffic of 2,000 hits per day, and the really good news is that the cost of a 56 Kbps connection to the Internet is only $300 to $500 per month, plus a startup fee of up to $500.

Using an Internet Service Provider's Web Server with a Standard Account


Obtaining an Internet account with Web publishing privileges is an inexpensive option. Typical costs for such an account are $20 to $50 per month, plus a start-up fee of up to $50. The account should include at least 2 to 3MB of storage space on the service provider's computer. Most ISPs offer unlimited access time to the Internet, meaning whether you log on for 40 or 400 hours a month, you will generally pay the same monthly fee. While your e-mail, standard files, and Web-published files will use this space, 2 to 3MB is usually adequate to maintain a modest-sized site. If you currently have an account with an ISP that allows Serial Line Internet Protocol (SLIP) or Point-to-Point Protocol (PPP) access to the Web, you might already have Web publishing privileges!

Your account with an ISP is available on a dial-up basis. A dial-up connection requires a computer, which may or may not be dedicated to networking, with communications software and a modem. The good news about a dial-up connection is that it utilizes a regular phone line with speeds ranging from 9.6 Kbps to 28.8 Kbps. Your computer is used to establish a connection over the modem and phone line for a temporary period, and at the end of use, the connection to the Internet is broken. You will use the connection to browse the Web, navigate around the Net, or to check on your site published on the ISP's Web server.

Before you set up an account, check with your ISP for specifics on storage space, additional fees for storage space that should not be more than $2 per megabyte, and possible additional fees if you have a popular site. You will also want to check on the availability of additional services such as FTP, Gopher, telnet, and CGI scripts, which should be available for use free if they are available at all.

While an account with an ISP is an inexpensive option, it is also a very basic one. You do not have control over the Web server. You will be at the mercy of the ISP for additional services, including CGI scripts. You will not have your own domain, and people will know this immediately because your URL will look something like this:




http://www.your_service_provider.com/~you

Using a Commercial Online Service's Web Server with a Standard Account


America Online, CompuServe, Genie, and Prodigy all offer or plan to offer Web publishing privileges to their customers. Publishing on the Web through a commercial online service is your least expensive alternative if you use your account wisely. Typical costs for such an account are $10 to $20 per month, plus a small additional fee for maintaining your Web pages on the online service's Web server. Most commercial online services provide only a few hours of connection time free each month. After you use your free connection time, you will have to pay additional connection charges. If you currently have an account with a commercial online service, you might already be able to publish on the Web!

Your account with a commercial online service is available on a dial-up basis. You will use the connection to browse the Web, navigate around the Net, or to check on your site published on the online service's Web server. Before you set up an account, check with the commercial online service for specifics on storage space and possible additional fees if you have a popular site.

While an account with a commercial online service is the least expensive option, it is also the most basic option. Many online services are fairly new to Web publishing themselves and do not offer access to essential additional services. While this, of course, will change in time and probably quickly, you should ask your online service about additional services, such as FTP, Gopher, and CGI, to find out when they will be available. You will not have your own domain, and people will know this immediately because your URL will look something like this:




http://www.commercial_online_service.com/~you

TIP

If you are interested in Web publishing with a Commercial Online Service, visit these Web sites where you will find current rates and publishing options:

America Online http://www.aol.com/
CompuServe http://www.compuserve.com/
Genie http://www.genie.com/
Prodigy http://www.prodigy.com/



Getting a Phantom Domain


Getting a phantom domain is often the best option available for anyone who wants to publish on the Web. With a phantom domain, you get most of the benefits of having your own Web server at an affordable price. When you have your own domain, Web users can use programs, such as Whois, to learn more about you.

Typical costs for a phantom domain are only slightly more than a basic account with an ISP and range from $25 to $75. The primary advantage of a phantom domain is that you will have your own domain and your URL will look something like this:




http://www.your_company.com/

The preceding URL is easier to type and remember than a URL containing a tilde. Instead of telling people your URL is www.yourserviceprovider.com/~yourcompany, you can tell them your URL is www.yourcompany.com. You might be surprised to learn that many users try to find sites based on the company name. For example, when I look for a site associated with a major company, I usually type http://www.companyname.com in my browser's URL window. If the URL is valid, I am at the company's Web site without having to look up the URL in a Web database that may or may not have the site's URL.

Some ISPs call this service Web server hosting, which generally means that the ISP creates a phantom domain for you on its system. Maintaining a phantom domain is no more taxing on the ISP's server than your standard account; in fact, it is little more than clever linking that makes the outside world think you have your own domain. With a phantom domain, you still have no control over the Web server or additional services. However, most ISPs that offer phantom domains include additional services as part of the deal, and these additional services are the only real justification for the higher fees.

Phantom Domains


Phantom domains are the wave of the future in Web publishing. If you already have an account with an ISP, check to see if they offer phantom domains. Many ISPs provide phantom domains to their users because it is an easy way to generate extra revenues.

You can obtain a phantom domain from an ISP, a commercial service provider, or an Internet presence provider. Internet presence providers specialize in setting up Web sites. Most of the sites that presence providers set up are phantom domains. A typical presence provider will service hundreds of phantom domains off one or more Web servers. While servicing hundreds of businesses off one server might sound like a lot, the power and capacity of the server and the speed of its connection to the Internet are more important than anything else.

Because Internet presence providers specialize in serving businesses instead of individual users, business-oriented sites might do better with these providers. Dozens of presence providers are available. For more information on service providers, visit




http://www.isoc.org/~bgreene/nsp1-5c.html

To find a comprehensive list of Internet Service Providers, visit The List. This site maintains one of the best ISP listings:




http://www.thelist.com/

or




http://www.cybertoday.com/ISPs/ISPinfo.html



Summary


The Web was built upon existing protocols and intended to provide a common interface to other protocols. Because of this design, you can use any valid protocol to transfer files. While you will primarily use HTTP to access your Web documents, you can use other protocols, such as Gopher and FTP, to enhance the usability of your documents. The face of Web publishing is changing rapidly, and the way you can specify the structure of Web documents is changing just as rapidly. The most common way to structure Web documents is with HTML. You can also use SGML, VRML, and page layout applications to structure documents you provide on the Web.

The mechanism that enables you to provide access to any type of document on the Web is the MIME standard. Using multipurpose Internet mail extensions, you can provide information about documents in the Content-Type header field. Browsers will use the content type to take appropriate action on the document, such as displaying an image or calling another application. The mechanism that enables you to access and retrieve files on the Web is the URL standard. With URLs, you can locate and retrieve files using the appropriate protocol. The final specification of interest to Web publishers is CGI. Using CGI, you can create dynamic documents.

To stay current with the latest developments on the Web, you should follow the Internet standards and specifications proposed by Internet standards groups, such as the IETF and the W3C. While you should consider all these issues before you start Web publishing, you should also evaluate your own access needs. This is true even if you already have an Internet account. By evaluating your access needs, you can determine what type of account will meet your needs as a Web publisher.
