Chapter 11
The CGI and Networks
At this point, knowing what you know about CGI, you can begin
to lay out what is important when deciding to work in a programming
environment. The first consideration is whether the programming
language you use allows you to read it, enhance it, and maintain
it. Match your language to your needs. Some languages are more
powerful, but take more time and RAM to do their processing. Keep
storage concerns in mind. Every language has memory needs. There
has to be sufficient disk space to store the language software
itself, as well as room for the language to create temporary files
"on the fly" that it may need to operate properly.
How fast can clients access each particular feature on your server,
whether it is your Web pages or your FTP archive? When determining
your access response time standard, the five-second rule is a
good rule to follow. If it takes more than five seconds for the
client to load whatever it has requested from your server, you
are taking too long. The one big exception to the five-second
rule is a search engine response, which may take longer.
The aptitude the language has for text manipulation is also important.
Finally, figure out whether the language can "talk"
easily with the various applications that will be accessing the
Web server. The ability to "talk" means that the language
you use should easily mesh with the applications you use on your
server when it uses them. The best way to decide how well a language
will "talk" with your applications is to determine which
applications you will be using with the language (perhaps a database
program) and then examine the potential CGI language based on the
needs of your list of applications.
CGI scripts work best with well-organized data. There should be
an overall method that you use to decide where each piece of data
goes and to guide your file storage. Are you storing data by type
or by category? Or are you storing data by kind of access or ease
of display? This method can then be easily applied to your CGI
scripts, to integrate them into your system's present organizational
structure.
Make sure you know where everything is going before you get going.
Establish a directory for final scripts, then don't clutter up
that directory with test scripts. Create a way to check which
version of your script is current, and where your test scripts
are located, such as keeping test scripts in a directory labeled
"TEST_SCRIPTS." Then rename and move a script to its
proper directory, perhaps a script library folder, once it is
ready for use.
Following accepted programming practices is very important. Document
your code. Use separate directories for production and testing.
Keep up-to-date maintenance logs with each new bug recorded in
sufficient detail to be useful later.
Getting a handle on HTTP is essential for success with your CGI
scripts. Keep up with any changes through any number of USENET
newsgroups. Know the client/server cycle. Whether it is using
CGI or not, the client/server dynamic should be well understood
by anyone running an HTTP service. You should also have an understanding
of how data is passed using <STDIN> and environmental variables,
including the two primary methods for doing this, GET and POST,
as used in HTML forms.
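To make the GET/POST distinction concrete, here is a minimal sketch of how a Perl script might collect the raw form data in either case. The file name, the subroutine name read_form_data, and its two parameters are made up for this illustration; the environmental variables REQUEST_METHOD, QUERY_STRING, and CONTENT_LENGTH are the standard CGI ones.

```perl
#!/usr/bin/perl
# read_form_data.pl - hypothetical sketch, not one of this chapter's listings
use strict;
use warnings;

# Return the raw, still URL-encoded form data for the current request.
# $env is a reference to a hash of CGI variables (normally \%ENV) and
# $in is the filehandle a POST body is read from (normally \*STDIN).
sub read_form_data {
    my ($env, $in) = @_;
    my $method = uc($env->{REQUEST_METHOD} || 'GET');

    if ($method eq 'POST') {
        # POST: the data arrives on standard input, and the server
        # reports its length in CONTENT_LENGTH.
        my $length = $env->{CONTENT_LENGTH} || 0;
        my $data   = '';
        read($in, $data, $length) if $length > 0;
        return $data;
    }

    # GET: the data arrives in the QUERY_STRING environmental variable.
    return defined $env->{QUERY_STRING} ? $env->{QUERY_STRING} : '';
}

print "Content-type: text/plain\n\n";
print "Raw form data: ", read_form_data(\%ENV, \*STDIN), "\n";
```

Passing the environment and the input handle in as arguments is only a design choice for this sketch; it lets you try the routine from the command line without a Web server.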
There are several protocols and specifications used by the CGI,
like HTTP, FTP, and e-mail. MIME specifications are very important
when using the CGI with a Web server. There are several languages
that can be used with these CGI specifics, such as C/C++, REXX,
Python, and of course, Perl.
Understanding the role of each protocol and specification is important
in understanding where your Perl scripts will fit in without difficulty.
Another important consideration is whether protocols or specifications
need adapting, such as with MIME headers, which must appear at the
beginning of the output of any Perl script returning data from your
server to a client.
For CGI to run smoothly on your server, it has to know what kind
of data is coming in so it can figure out what to do with it.
Using MIME specifications, your server tells the client what kind
of file it is returning to fulfill its request, so the client
knows what to do with that file. Some of the different file types
handled with MIME specifications are HTML, JPEG, GIF, MOV, and
so forth. A full list of MIME header file types can be found in
Appendix B.
If a client's request comes in, and it contains a METHOD=GET (or
POST) argument, then some kind of data will be written to standard
output, which is then sent by the server to the client. The first
thing the script prints must be a header line in a fixed format.
Its format would resemble
Content-type: type/subtype <line feed> <line feed>
and when this appears in Perl, it will look like this:
print "Content-type: text/html \n\n";
where \n is Perl's line feed escape.
Adhering to MIME specifications lets the client know that it is
receiving a text file to be treated like HTML. Note that there
are two line feed escapes given. The first ends the header line
and moves to line 2 of the output. The second produces a
completely blank line. For a CGI script to run successfully, the
second line of its output must be blank.
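Because every script has to get this header right, it can help to build it in one place. The following is a small sketch only; the subroutine name content_type_header is an assumption made for this example and is not part of Perl or of any CGI library.

```perl
#!/usr/bin/perl
# header_demo.pl - hypothetical sketch
use strict;
use warnings;

# Build the header block a CGI script must print before any content:
# one Content-type line followed by a completely blank line.
sub content_type_header {
    my ($type) = @_;    # a MIME type/subtype such as "text/html" or "image/gif"
    return "Content-type: $type\n\n";
}

print content_type_header('text/plain');
print "Hey Now!\n";
```

Here the script announces plain text rather than HTML, so the client shows the greeting exactly as sent.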
An example of a simple HTML document sent to satisfy a client's
request might look something like Figure 11.1, where a simple
greeting is sent back to the user. It is accomplished with the
following script:
Figure 11.1 : A simple CGI greeting using Perl.
#!/usr/bin/perl
#kingsley.pl
print "Content-type: text/html \n\n";
print "<HTML> \n";
print "<HEAD><TITLE>Hey Now!</TITLE></HEAD> \n";
print "<BODY>Hey Now!</BODY></HTML> \n";
exit;
Now, a quick look at other commonly used CGI languages is in order.
These are the other CGI languages that you may have heard of, or
even encountered, in written scripts. Among them are C, C++, REXX,
and Python. Some quick definitions will help you to understand
better what is being discussed when you come across references
to these other CGI languages, and even how they compare with Perl.
To do this, a quick review of Perl as a CGI language is helpful
here.
Perl
The first reason for using Perl as a CGI language is due entirely
to its popularity. There are so many people using it that the
number of Perl libraries is huge, and these libraries are growing
every day. Ready-to-use Perl routines are often included in many
Web server packages. Often, new Internet applications include
Perl gateway routines free of charge.
There are also numerous support areas; the most important of these
are the newsgroups comp.lang.perl.misc and comp.lang.perl.announce,
where Perl's creator, Larry Wall, has been known to hold court, along
with many of the other key players in Perl's development and
evolution, such as Tom Christiansen and Randal Schwartz.
Finally, Perl routines can be viewed as mix and match modules.
Often a Perl script can be used in a new configuration with an
intent that was never meant for the original script. These smaller
scripts can reside quietly in the library waiting to be used for
many other tasks, thus making Perl even more efficient.
Perl behaves as an interpreted language, which means that it interacts
with the operating system in a more complicated way than a compiled
language like C. This creates a longer processing time than a
compiled language. The trade-off for this loss in speed with Perl
is the ease with which its code can be written and understood.
C/C++
C is one of the most widely accepted languages in the scripting
world. Even though it is a very difficult language to learn, it
is very powerful. Its difficulties and power both derive from
the same source: it is a low-level language. Although this makes
C seem positively archaic in its peculiarities, it works very
close to an operating system when it is parsing and processing
code, making it very fast, even when used to create large applications.
C works directly with the operating system itself, instead of
the processes which often act as an intermediary between a language
and an operating system. This can be seen best when C is run against
interpreted scripts like Korn, Bourne, and Perl, especially when
used in large database programs. The interpreted languages must
operate out of a shell, whereas C does not. This can be very advantageous
to a Web site designer, who does not want to expose his or her
system's operating shell to the outside world.
Another benefit of C is that it has extensive libraries of procedures,
lots of readily available sample code, debuggers, and support
groups. As the primary language for UNIX development, C also
has a secure future. Learning C is never a waste of time.
Another factor that makes C attractive is that it adds an extra
level of security to a Web server. Because it is compiled, there
is no need to have the source code on your server. The C language
works by being compiled into machine language. It is this machine
language executable that sits on your server, not the original
C script. Any of the three people who are smart enough to figure
out how to modify machine code are not likely to waste their time
trying; they're too busy trying to invent a new organism, or artificial
intelligence for their toasters.
But, as with most of C's features, there is a "very good"
and "very bad" side to each. The tie to UNIX makes it
a natural for UNIX boxes, and UNIX programmers. Those of us who
are working in different environments, such as Windows NT, have
special concerns that do not mesh easily with the UNIX world,
some of which have already been mentioned in this book. With UNIX,
the operating system works without the additional level of the
graphical interface that Windows NT has, making UNIX closer to
the actual operating system. This is ideal for C, which also works
closely with the operating system. With Windows NT, the graphical
interface level creates a layer of difficulty for C that has to
be overcome.
There is also no file I/O or OS interfacing built into a low-level
language such as C itself. These, along with many of the other features
expected from a higher-level language (such as Perl), are available
only through procedure libraries with C. Each OS will have its own
specifics when it comes to these libraries.
There is a great debate raging, as there always is, as to whether
or not C or Perl is the superior CGI language. I have a sneaking
suspicion that those people who learned to program with C will
favor C, and those who learned with other languages will favor
those languages. Perl is a language that excels when it deals
with text manipulation. The CGI environment's biggest demand is
the ability to deal with text smoothly, making Perl an obvious
best choice.
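As a small taste of that text handling, here is a sketch of a routine that makes user-supplied text safe to echo back inside an HTML page, using nothing but Perl's substitution operator. The subroutine name escape_html is made up for this example.

```perl
#!/usr/bin/perl
# escape_demo.pl - hypothetical sketch
use strict;
use warnings;

# Replace the characters that have special meaning in HTML with
# their entity equivalents.
sub escape_html {
    my ($text) = @_;
    $text =~ s/&/&amp;/g;     # must come first, or later entities get re-escaped
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    $text =~ s/"/&quot;/g;
    return $text;
}

print escape_html('Coltrane & Davis <live>'), "\n";
```

Four one-line substitutions do the whole job, which is the kind of economy that makes Perl attractive for CGI work.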
C++ is a superset of C, meaning that its functions and operators
are built on C, so the concerns that are raised with C also apply
to C++ as far as the scope of this book is concerned.
REXX
REXX is a language that was developed at IBM in the late 1970s.
Mike Cowlishaw developed REXX as a procedural language.
One problem with REXX is that there is still no formal standard.
Although most REXX scripts are portable, there is no standard
specification that applies to REXX, so there can be irregularities
that need to be addressed when using REXX scripts from another
operating system. It is also not as popular as Perl or C for CGI
programming, so there is much less "out there" for REXX
in the way of support and script libraries. REXX, however, has
been ported to just about every platform, including OS/2, Macintosh,
Amiga, AS/400, and mainframes such as VM and MVS.
This is not to make the claim that Perl is the be all and end
all of CGI programming. The more a Web Master knows, the more
tricks he or she has to solve problems. Complex sites will often
contain some Perl and C scripts. In addition, the CGI languages
mentioned here are only a few of those currently available.
Because CGI is only a specification, not a language itself, there
are no real restrictions on what can be used for it. For more
information on REXX, read M.F. Cowlishaw's book, The REXX Language,
A Practical Approach to Programming (Prentice-Hall,
1985). This is the closest thing there is to a formal REXX standard.
You can also try these Web sites:
http://www.pvv.unit.no/RexxLA/index.html
http://www2.hursley.ibm.com/rexx/
ftp://rexx.uwaterloo.ca/pub/rexxfaq.txt
Python
Python was developed in Amsterdam, The Netherlands, by Guido van
Rossum for a company called Stichting Mathematisch Centrum. It
is a new language that goes even further in being readable than
Perl.
Python, like Perl, is an interpreted language, so like Perl the
Python interpreter must be installed on your server to use it.
From the start Python was developed as an object-oriented language,
so it is a strong, effective language for programming CGI.
Python's most interesting feature is its huge library of functions.
It can use these to communicate over networks or to access system-specific
functions. There is also a CGI library that is available to programmers.
Some of the functions in the library include parsing, printing
the defined environmental variables (to aid in debugging), and
printing the contents of an HTML form. To attest to Python's usefulness
it was used to create Infoseek's search engine, and all of its
other programs. More information about Python can be found at
http://www.python.org/
comp.lang.python
There is one other area of CGI that warrants attention before
concluding this section. To provide greater usefulness to your
users, you might want to have Server Side Includes as a service
available on your Web server. It is important to note that although
the NCSA and Netscape HTTP servers do support Server Side Includes,
or SSIs, at the time this book goes to press, the CERN HTTP server,
EMWAC HTTP server, and IIS do not provide such support. There are,
however, several other HTTP servers available, which also support
SSIs. CERN has announced plans to make SSIs a part of their server
in the future.
Although SSIs are not strictly in the realm of the CGI, they are
included here because they may solve some problems you may have
with some of your Web pages. Remember, you want to have as many
problem-solving strategies as possible available because many
problems require more than one strategy to be solved successfully.
SSIs denote the handling of special extensions to HTML tags. SSI
files closely resemble HTML files; they differ in their use of a
superset of the CGI environmental variables. This does not make
SSI files particular to the CGI, because SSI files do not need
a gateway to operate.
SSIs do not run automatically. You have to enable them on your
server before they will work. Please check your HTTP server's
documentation for the way in which to enable SSIs.
It is important to know that although SSIs add a lot to a Web
site, they also place a greater demand on your server's resources.
For SSIs to run, the server has to read, or parse, every line
of the SSI file to find the special SSI commands. If you find
your server is becoming overworked, one quick way to deal with
the traffic overload is to disable the SSI capacity on your server.
Be very careful not to identify every HTML file as an SSI file;
otherwise, your server will parse every HTML file that is accessed.
In turn, this creates a huge drain on your server's resources
and a time lag in satisfying the client's request. Typical SSI
files have .shtml as their file extension.
Comparing SSIs with HTML will give you a clearer picture. Because
SSIs are parsed by the server into HTML before they go to the
browser, an SSI file looks very similar to an HTML file. This
is file jazz.shtml.
<HTML>
<HEAD>
<TITLE>Jazz on the Web</TITLE>
</HEAD>
<BODY>
<H1>Jazz on the Web!</H1><BR>
<H2>
This site was last modified on <!--#echo var="LAST_MODIFIED" -->
</H2>
<HR>
<H1><A HREF="http://town.hall.org/Archives/radio/Kennedy/Taylor/">Jazz Styles</A></H1>
<H1><A HREF="http://www.yahoo.com/Entertainment/Music/Artists/By_Genre/Jazz/">
Jazz Musicians</A></H1>
<H1><A HREF="http://www.yahoo.com/Entertainment/Music/Genres/Jazz/Labels/">
Jazz Labels</A></H1><BR><BR>
<H2>The jazz quote of the day is:</H2><BR>
<!--#include virtual="quotes" file="jazzquotes.html" -->
</BODY>
</HTML>
This file looks like a regular HTML document, except for two lines.
The first line shows a very common use of SSI, which is to keep
a running update on when the page was last modified.
The server reads the line
This site was last modified on <!--#echo var="LAST_MODIFIED" -->
and sees that it has to resolve the variable LAST_MODIFIED, which
it then echoes to the client in HTML. The second line not standard
to HTML is
<!--#include virtual="quotes" file="jazzquotes.html" -->
which the server parses and then sees it has to add the file jazzquotes.html
to the page before it sends the data to the browser. The end result
looks something like Figure 11.2.
Figure 11.2 : An example SSI file parsed in HTML.
It is interesting to note that when the server returns the called
SSI file, jazzquotes.html, it not only attaches the file to the
new, parsed HTML file, but places it in the exact spot where the
SSI line of code is in the unparsed .shtml file.
If you haven't already noticed, SSI commands are an adaptation
of the HTML comment tags. This was intentional, so that if you
move your HTML documents, which contain SSIs, to another server,
they will look the same, regardless of whether that server supports
SSIs or not. Inside the HTML comment form the SSI syntax looks
like this:
<!--#command cmd_argument="argument_value" -->
where the command is a special SSI command, the cmd_argument is
related to the SSI command, and the argument_value is based on
the cmd_argument.
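To see how regular this syntax is, here is a sketch of a Perl routine that takes one SSI line apart into its command and its argument/value pairs. This only illustrates the syntax; it is not how any particular server parses SSI, and the subroutine name parse_ssi_directive is invented for the example.

```perl
#!/usr/bin/perl
# ssi_parse.pl - hypothetical sketch
use strict;
use warnings;

# Split one SSI directive of the form
#   <!--#command cmd_argument="argument_value" -->
# into the command name and a hash of its arguments.
# Returns an empty list if the line holds no SSI directive.
sub parse_ssi_directive {
    my ($line) = @_;
    return unless $line =~ /<!--#(\w+)\s+(.*?)\s*-->/;
    my ($command, $rest) = ($1, $2);

    my %args;
    while ($rest =~ /(\w+)="([^"]*)"/g) {
        $args{$1} = $2;    # cmd_argument => argument_value
    }
    return ($command, \%args);
}

my ($command, $args) = parse_ssi_directive('<!--#include file="jazzquotes.html" -->');
print "$command: $args->{file}\n";
```

An ordinary HTML comment has no # directly after the opening <!--, so the routine leaves it alone.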
There are six commands for SSI. They are config, echo, exec, fsize,
flastmod, and include. Their functions are listed in Table 11.1.
Table 11.1 SSI Syntax
Command | Function
config | Sets the format of the size, time, or error messages.
echo | Places the value of an SSI variable into the HTML document.
exec | Executes a CGI program or system command and places the result in the HTML document.
fsize | Places the size of a file in the HTML document.
flastmod | Places the date of the last modification of a file into the HTML document.
include | Places the contents of another HTML document, or of a specified data file, into the HTML document.
When working with SSI commands, it is important to remember these rules:
- SSI syntax is based on UNIX commands, which are case-sensitive.
Config is not the same as config.
- SSI requires the right file extension if it is to be recognized
and parsed by the server. Make sure your SSIs are .shtml. You
can also turn parsing on for all documents on your server, or
set the file's attributes, like the execute or archive bit. You
also must make sure that your server is aware of this file type
by associating the extension using File Manager.
- There are no spaces in the beginning of an SSI line of code.
It should always be <!--#command.
- There should always be one space after the argument_value,
as in "jazzquotes.html" -->.
- The argument_value should always be surrounded by double quotation
marks.
- SSI recognizes only path names that start either at the server
root or are in a subdirectory of the directory where the SSI file
is found. Do not use any backslashes in the path name.
The command config, short for configuration, produces no output
of its own in HTML documents. It is used to change how the output
of your other SSI commands appears in your HTML documents.
With the config command you can control the standard text output
of any SSI command. For example, if you wanted to change how the
date is sent back to the user from this format (Monday, May 11
10:32:43 EST 1996) to one more user-friendly, you would do this
by using config to change how the flastmod command prints its date.
You can also modify the error message that is sent out and the
way the file size is formatted.
If you want to change the date, you use the command argument timefmt
in the SSI command. For the argument_value you can use any of
these:
- %a-Abbreviates weekday name, based on present locale.
- %A-Gives full weekday name, based on present locale.
- %b-Abbreviates month name, based on present locale.
- %B-Gives full month name, based on present locale.
- %c-This is the preferred date and time display for the present
locale.
- %d-Decimal number from 01 to 31 that represents the day of
the month.
- %H-Decimal number from 00 to 23 that represents the hour of
24-hour measured time.
- %I-Decimal number from 01 to 12 that represents the hour of
12-hour measured time.
- %j-Decimal number from 001 to 366 that represents the day
of the year.
- %m-Decimal number from 01 to 12 that represents the month.
- %M-Decimal number from 00 to 59 that represents the minute.
- %p-Gives a.m. or p.m. based on the time value, or the corresponding
strings for the present locale.
- %S-Decimal number from 00 to 59 that represents the second.
- %U-Decimal number from 00 to 53 that represents the number
of the week of the current year. Week 01 begins with the first
Sunday of the year.
- %w-Decimal number from 0 to 6 that represents the day of
the week, beginning with Sunday as 0.
- %W-Decimal number from 00 to 53 that represents the number
of the week of the current year. Week 01 begins with the first
Monday of the year.
- %x-This is the preferred date display for the present
locale, minus the time.
- %X-Gives the preferred time configuration based on the present
locale, minus the date.
- %y-Decimal number from 00 to 99 that represents the year,
excluding the century.
- %Y-Decimal number that represents the year, including the
century (for example, 1996).
- %Z-Gives the time zone name or abbreviation.
These command argument values could be used as such:
Today is <!--#config timefmt="%a" --> <!--#echo var="DATE_LOCAL" -->
or
You accessed this page on hour <!--#config timefmt="%H" --> <!--#echo var="DATE_LOCAL" -->
It is important to include the appropriate echo command for the
server to return the desired response to the client.
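The timefmt codes listed above are the same ones used by the strftime function, which Perl makes available through its standard POSIX module. That gives you a quick way to try out a format string before putting it in an .shtml file. The sketch below is an experiment at the command line, not part of SSI itself; the subroutine name format_time is invented, and it uses Greenwich Mean Time so the result does not depend on where the server sits.

```perl
#!/usr/bin/perl
# timefmt_demo.pl - hypothetical sketch
use strict;
use warnings;
use POSIX qw(strftime);

# Format a time (given in seconds since January 1, 1970 GMT)
# using the same % codes that timefmt accepts.
sub format_time {
    my ($timefmt, $epoch) = @_;
    return strftime($timefmt, gmtime($epoch));
}

# Show the current GMT time in a friendlier layout.
print format_time('%A, %B %d, %Y at %I:%M %p', time()), "\n";
```

Names such as the weekday and month come out in whatever locale the system is set to, just as the list above says.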
The include command is the most basic of the SSI commands. It
is most commonly used to add files to HTML that are needed in
a variety of places. This eliminates the need to cut and paste
that data each time by the use of the one line SSI command instead.
There are two command arguments: file and virtual. File indicates
any file in the current directory, or in a subdirectory of the
current directory. Virtual indicates any file addressed by a path
that starts at the server's root directory. The argument values for
each are the actual path and file names, like this:
<!--#include file="addresses/ad_mailing.html" -->
which automatically adds my mailing address to the HTML documents
that need it.
The virtual command argument causes the server to look for the
file in question starting from the root directory, as designated by
the srm.conf file. When using the virtual command argument, the path
name must begin with a forward slash (/), and then the entire path
name must be included. This differs from the file command argument,
which cannot start with a slash, because it can look only in the
directory that the .shtml file is in, or in a subdirectory of it,
and not above it.
The kinds of files you can include are not limited to text only
or HTML only files, but can be other SSI parsed files, excluding
those that themselves contain the include command.
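The difference between file and virtual can be sketched in a few lines of Perl. This is an illustration of the rules only, under the assumption that paths are written URL-style with forward slashes (as in jazz.shtml); the subroutine name resolve_include and the directory names are invented, and a real server may apply further checks.

```perl
#!/usr/bin/perl
# include_paths.pl - hypothetical sketch
use strict;
use warnings;

# Work out which file on disk an include argument refers to.
#   $kind        - "file" or "virtual"
#   $value       - the argument_value from the SSI line
#   $doc_root    - the server's document root directory
#   $current_dir - the directory holding the .shtml file
# Returns undef when the value breaks the rules for its kind.
sub resolve_include {
    my ($kind, $value, $doc_root, $current_dir) = @_;

    if ($kind eq 'virtual') {
        # virtual: a full path that starts at the server root
        return undef unless $value =~ m{^/};
        return $doc_root . $value;
    }
    if ($kind eq 'file') {
        # file: current directory or below, so no leading slash and no ".."
        return undef if $value =~ m{^/} || $value =~ m{(^|/)\.\.(/|$)};
        return "$current_dir/$value";
    }
    return undef;
}

print resolve_include('file', 'jazzquotes.html', '/www/docs', '/www/docs/jazz'), "\n";
```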
The task of the flastmod command is to note when changes were
last made to a file, hence the name f(ile)lastmod(ified). As with
the include command, flastmod uses file and virtual as its command
arguments. The same rules apply to these command arguments as
to the include command. Flastmod is used to indicate to the user
the last time a file, like a Web zine, was modified, so that users
will know if the information is new to them.
The fsize command is concerned with the size of the file. This
command is handy when dealing with thumbnails of images on a home
page that lead to the larger versions. The fsize command can indicate
the size of each image, so users can decide if they have the time
to view it. This also helps with downloads.
The fsize command accepts both the file and virtual command arguments,
like the flastmod and include commands, with the same rules.
With the echo command, SSI works with five variables. Unlike the
command arguments described so far, the variable names below are
not case-sensitive:
- DATE_LOCAL-This creates the current time and date based on
the time zone indicated by the server and the server software.
The output can be modified using the command config and the command
argument timefmt.
- DATE_GMT-This is the current time and date based on Greenwich
Mean Time, the common time reference accepted on the Internet.
- DOCUMENT_NAME-This is the file name of the main document.
- DOCUMENT_URI-This is the path name and file name of the main
document. A URI (Uniform Resource Identifier) can be considered
the same as a URL.
- LAST_MODIFIED-This is the time and date the main document
was last modified based on the last time the document was
saved, surprise, surprise.
The command argument used with the echo command is var. A typical
use of the echo command might look like this:
<!--#echo var="document_URI" -->
where document_URI refers to the URI of the first document parsed
by the server, that is, the file that sets the values for the echo
variables. Although there are technical differences, you can
consider a URI the same as a URL.
When you get to debugging SSI, and the echo command, the server
will return the word (none), in parentheses, when it cannot find
the variable it is supposed to echo.
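What the server does with an echo line can be imitated in Perl with a single substitution. The sketch below is only a model of the behavior described here, including the (none) result for an unknown variable; the subroutine name expand_echo and the sample values are invented.

```perl
#!/usr/bin/perl
# echo_demo.pl - hypothetical sketch
use strict;
use warnings;

# Replace every <!--#echo var="NAME" --> in $html with the value of
# NAME taken from the hash referenced by $vars. Names are looked up
# in uppercase, so case does not matter; unknown names give "(none)".
sub expand_echo {
    my ($html, $vars) = @_;
    $html =~ s{<!--#echo\s+var="([^"]*)"\s*-->}
              { my $value = $vars->{ uc($1) }; defined $value ? $value : '(none)' }ge;
    return $html;
}

my %vars = (DOCUMENT_NAME => 'jazz.shtml');
print expand_echo('This file is <!--#echo var="DOCUMENT_NAME" -->', \%vars), "\n";
```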
The exec command deals with controlling the operating system from
inside the SSI HTML. Most of the commands regularly available
from the command line are also available to the exec command.
This makes the exec command very powerful, so powerful that, just
like SSIs, it may be turned on or off by the server.
Using the exec command, an SSI file can automatically access a
shell or execute a CGI script. Client response is not necessary
for this to happen. The various shell commands available to the
exec command allow the SSI script to use any of the environmental
variables discussed earlier.
The command argument for the exec command is "cmd."
The argument variables available to exec are all the arguments
available to the current shell. The many options available to
you are best utilized when you have a greater understanding of
UNIX and the shells it uses, like the Bourne or Korn shells. The
most important shell to learn is the one you may have on your
server. The exec command can also be used with CGI and Perl.
To use the exec command with CGI, the command argument "cgi"
is used instead of "cmd". This allows you to execute
a CGI script inside SSI. One drawback is that the server still needs
the CGI script to supply an ordinary header for it to parse, so an
NPH-CGI (non-parsed header) script will not work. For this reason,
do not use NPH-CGI scripts in any SSI files: without a header that
the server can parse, the client will be unable to use, or view,
the returned file.
This last tip about SSIs is related to speed. In the various descriptions
of the commands you may have noticed the server starts looking
in the immediate directory, then proceeds down from there. To
speed things up you can place the included files in the same directory
as the .shtml file that calls them, and not in a subdirectory.
The basic model of how the CGI works is fairly straightforward.
When a user's browser, called the client, contacts your server,
it may ask for a special, non-HTML file to be accessed. The server
then accesses this file and returns any results to the client.
Remembering our HTML form document from Chapter 10, where
each element of an HTML form was demonstrated, it would be nice
if we could apply that to the CGI. In a very simple way we can.
This next HTML document does not lead to another HTML document,
but uses CGI to call the page from your CGI bin, or "cgi-bin"
as you will come to know it in your scripts and directory trees.
The HTML tags are self-explanatory, and the document uses the
METHOD=GET command to pass data to your CGI script.
<HTML>
<HEAD>
<TITLE>The Submission Page</TITLE>
</HEAD>
<BODY>
<H2>Press this button and submit to me!</H2>
<FORM Method="GET" Action="/cgi-bin/submit.pl">
<INPUT type="submit" value="Total Submission">
</FORM>
<HR NOSHADE>
</BODY>
</HTML>
This produces something like the screen in Figure 11.3.
Figure 11.3 : The submission page: passing data to the CGI.
In submit.html, the user selects the submit button which tells
the server to call the file in the cgi-bin named submit.pl. When
the server looks for this file, this is what it finds:
#! /usr/bin/perl
# submit.pl
print "Content-type: text/html\n\n";
print <<'eop';
<HTML>
<HEAD>
<TITLE>Total Submission</TITLE>
</HEAD>
<BODY>
<H2>Thank-you for submitting!</H2>
We look forward to your future submissions.
</BODY>
</HTML>
eop
which looks like Figure 11.4 when it reaches your browser.
Figure 11.4 : A Web page created from a CGI script.
This is a good time to touch on some of the elements that you
see in submit.pl. The first is the name itself: In Perl, the files
are best named in lowercase, followed by the extension ".pl."
You could use ".cgi" here as well, but the consensus
among CGI programmers seems to be that if you're serious about
your CGI programming, which of course you are, then it is better
to signify what language you are using for your CGI script in
the name of the file.
The next thing to discuss is the first line of the Perl script:
#! /usr/bin/perl
This tells the server reading the file that this is a Perl script
and where it can find Perl, so it can deal with this program.
The "#!" is a special pair of characters that a UNIX system
interprets as introducing the path of the executable that should
run the script that follows. In NT this is not the case, but the
convention is so deeply ingrained in Perl scripting
that you will most likely see all Perl scripts with this opening
line. The "#!" is valid only in the first line
of the script; after that, only the "#" symbol is necessary
for marking comment lines in Perl, like the second line, # submit.pl.
This is the name of the file, and it is good programming technique
to always put the name of your file somewhere near the
top of the script.
The next line uses the Perl command print. This is the standard
tool for getting Perl to output data. The data that is output
is the MIME header information, which tells the server to create
the proper header for an HTML text file. The next line uses a
programming trick that makes use of Perl's << (here-document) syntax,
which tells Perl to print everything that follows the line containing
<< and the tag eop (which is short for "end of perl") until
it encounters the eop tag again on a line by itself. Perl makes
a lot of sense, doesn't
it?
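The quotation marks around the tag after << matter. The short sketch below, which is separate from submit.pl and uses an invented file name, shows the difference: with double quotes Perl fills in variables inside the text, and with single quotes (the form used in submit.pl) the text is taken exactly as typed.

```perl
#!/usr/bin/perl
# heredoc_demo.pl - hypothetical sketch
use strict;
use warnings;

my $title = 'Total Submission';

# Double-quoted tag: Perl interpolates variables such as $title.
my $page = <<"eop";
<HTML>
<HEAD><TITLE>$title</TITLE></HEAD>
<BODY><H2>$title</H2></BODY>
</HTML>
eop

# Single-quoted tag, as in submit.pl: the text is kept exactly as typed.
my $literal = <<'eop';
The variable $title is not expanded here.
eop

print $page;
print $literal;
```

In both forms the closing eop must sit alone at the start of its own line, or Perl will keep reading to the end of the file.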
Before you get too excited, maybe we should try to create an HTML
form that actually takes user data and passes it to the CGI. The
example above might as well have been a static HTML page for all
the trouble it took.
One of the most common needs of a Web site is to have a form that
gathers mailing list information. If you wanted to, you could
create a form with a single textbox to get this information, or
even easier, put in an e-mail tag and a request for the user to
e-mail you his or her address. Although these are the easier ways
to gather the data, they limit your options on the back
end. It would be nice if this information could be sent into a
database where a mailing list can be created based on city, or
zip code. To do this, each piece of information needed must be
input in its own field. An HTML form that asks for this might
look like the following:
<HTML>
<HEAD>
<TITLE>Your Address Please</TITLE>
<BODY>
<CENTER><H2>Your Address, Please!</H2></CENTER>
<HR NOSHADE>
<FORM Method=GET Action="/cgi-bin/nph-address.pl">
<CENTER>
<TABLE Border=0 width=60%>
<CAPTION Align=top>
<H2>What's Your Address?</H2></CAPTION>
<TH Align=left> First Name
<TH Align=left colspan=2> Last Name <TR><TD>
<INPUT type=text size=10 maxlength=20 name="first">
<TD colspan=2>
<INPUT type=text size=32 maxlength=40 name="last"><TR>
<TH Align=left colspan=3> Street Address <TD><TD><TR>
<TD colspan=3>
<INPUT type=text size=61 maxlength=61 name="street"><TR>
<TH Align=left> City
<TH Align=left> State
<TH Align=left> Zip Code <TR>
<TD><INPUT type=text size=20 maxlength=30 name="city">
<TD><INPUT type=text size=20 maxlength=30 name="state">
<TD><INPUT type=text size=7 maxlength=10 name="zip"><TR>
<TH Align=left colspan=3> Telephone Number <TR>
<TD colspan=3><INPUT type=text size=15 maxlength=15 name="phone" value="999.999.9999"> <TR>
<TD width=50%><INPUT type="submit" name="address" value="Send In Your Address">
<TD width=50%><INPUT type="reset" value="Reset this Form"><TR>
</TABLE>
</CENTER>
</FORM>
</BODY>
</HTML>
This gives you a page like that shown in Figure 11.5.
Figure 11.5 : An address form to gather user data for
the CGI.
All of the data from the form is URL encoded into name/value pairs
and attached to the end of the URL named in the form's Action
attribute, as is regular procedure using the GET method. At the
server end, this data is put into QUERY_STRING, the environment
variable that holds this kind of data. This data is also referred
to as the query string. A sample query string from this form might
look like this:
QUERY_STRING first=Bobby&last=Hull&street=1063+Golden+Jet+Lane&city=
Pointe+Anne&state=Ontario&zip=CHI+BLA&phone=610.555.1170&address=Send+In+Your+Address
This query string would appear right after the ? in the URL of
the form's Action attribute, /cgi-bin/nph-address.pl.
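On the script side, the query string has to be split back into its
fields before anything useful can be done with it. The following is
a minimal sketch of one way to do that, not a listing from this
chapter: the file name parse-query.pl and the helper parse_query are
hypothetical names chosen for illustration. It splits the string on
&, splits each pair on the first =, turns each + back into a space,
and decodes any %XX escapes.

```perl
#!/usr/bin/perl
# parse-query.pl (hypothetical file name)
use strict;
use warnings;

# parse_query is an illustrative helper, not part of Perl or of any library.
# It returns a hash of decoded field names and values. If a name occurs
# more than once, the last value wins.
sub parse_query {
    my ($query) = @_;
    my %field;
    foreach my $pair (split /&/, $query) {
        next unless length $pair;                 # skip empty pairs such as "a=1&&b=2"
        my ($name, $value) = split /=/, $pair, 2; # split on the first "=" only
        $value = '' unless defined $value;        # a name with no "=" gets an empty value
        foreach ($name, $value) {
            tr/+/ /;                              # "+" stands for a space
            s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;  # "%XX" stands for one byte
        }
        $field{$name} = $value;
    }
    return %field;
}

# The server places the data in the QUERY_STRING environment variable.
my %field = parse_query(defined $ENV{QUERY_STRING} ? $ENV{QUERY_STRING} : '');
foreach my $name (sort keys %field) {
    print "$name = $field{$name}\n";
}
```

With the sample query string above, the street field would come back
as "1063 Golden Jet Lane". The script can be tried from a UNIX shell
by setting QUERY_STRING by hand before running it.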
As has been discussed previously, memory management is an ongoing
concern of any Web server administrator. Any opportunity to reduce
the work your Web server has to do to fulfill client requests
should be taken. With that in mind, non-parsed header (NPH) CGI
scripts can help lessen your server's work load.
NPH scripts create headers that are not parsed by
the server, as their name indicates. Remember the address
request HTML form? It called a program through the CGI named
nph-address.pl.
Ordinarily, when data is passed to the CGI, the script creates a
partial header that tells the server the content type of the data,
and then sends the requested data itself.
Next the server has to parse that header and create a complete
response header to send to the browser. With an NPH script, you can
skip the stage between the CGI and the server, because the script
writes the complete response header itself and the server passes it
to the client untouched. An NPH script is also a way to add some
class to your Web page. The form data is attached to the URL (making
it a very ugly, long string), and the script can tell the browser to
keep displaying the current page instead of loading that URL. An NPH
script that does that might look like
this:
#!/usr/bin/perl
# nph-address.pl
# The blank line before eop ends the header block.
print <<"eop";
HTTP/1.0 204 No Content

eop
The response header is specified by the "HTTP/1.0
204 No Content" line, where the status code "204" informs
the browser that there isn't any data to load with this response.
Sometimes, a confirmation HTML page is sent to the user instead,
but this small script acknowledges the request with less work and
saves time.
When the script is run, the browser is told to leave the current
HTML document displayed. It is important when working with
NPH that you use the nph- prefix in the file names, and no other
variations. Servers that follow the NCSA convention recognize a
non-parsed header script by that prefix, and anything else will only
cause you CGI grief.
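For the case where a confirmation page is wanted after all, an NPH
script has to supply everything the server would normally add. The
following is a sketch under stated assumptions, not a listing from
this chapter: nph-confirm.pl and nph_response are hypothetical names,
and the text of the page is made up for illustration. It builds the
status line, the headers, the blank line that ends the header, and
then the body.

```perl
#!/usr/bin/perl
# nph-confirm.pl (hypothetical file name; note the nph- prefix)
use strict;
use warnings;

# nph_response is an illustrative helper that builds a complete HTTP/1.0
# response: status line, headers, a blank line, then the body (if any).
sub nph_response {
    my ($status, $body) = @_;
    my $crlf = "\015\012";                       # HTTP header lines end in CR LF
    my $response = "HTTP/1.0 $status$crlf";
    if (length $body) {
        # Only describe a body when there is one to send.
        $response .= "Content-type: text/html$crlf";
        $response .= "Content-length: " . length($body) . $crlf;
    }
    return $response . $crlf . $body;            # the empty line closes the header
}

$| = 1;   # unbuffered output, so the response goes out as soon as it is printed
print nph_response('200 OK', "<HTML><BODY><H2>Thank you!</H2></BODY></HTML>\n");
```

Calling nph_response('204 No Content', '') instead produces a
response with the status line and the closing blank line only, which
is the no-content case described above.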
When dealing with the CGI, Perl is one of the premier languages
to run data to and from the Web pages on your server. The client/server
dynamic is moderated by MIME specifications that inform both client
and server what kind of data is being passed between them. Both
SSIs and NPH are features that can enhance the CGI, each with
different results.
There is much more to the CGI than is covered in this book. For
more information on the CGI, there is the CGI-L (Common Gateway
Interface) e-mail mailing list, <CGI-L@VM.EGE.EDU.TR>, which can be
subscribed to by sending the message "subscribe" to listserv@VM.EGE.EDU.TR.
This list deals with the many issues involved with the CGI that are not
covered in this book. More information can be found at this URL:
http://www.yahoo.com/Computers_and_Internet/Internet/World_Wide_Web/CGI_Common_Gateway_Interface/
or the NCSA site
http://hoohoo.ncsa.uiuc.edu/cgi/
as well as the CGI library at
http://www.bio.cam.ac.uk/cgi-lib/
which has many Perl scripts for working with the CGI. There is also this
site
http://www.city.net/win-httpd/httpddoc/wincgi.htm
which specializes in Windows CGI concerns. Learning more about
how your server uses the client/server model and how each element
of the CGI is regulated on your server will help you to write
better CGI scripts and to provide better access and service
to the users of your Web sites.