Chapter 10
The Common Gateway Interface
With the HTTP protocol, web browsers have access to several Internet
services, but not to all of them. On its own the browser is also
limited in its ability to deal with anything more than static
HTML files. One of the ways to bypass this limitation is to use
a gateway. A gateway provides a client with an interface
that makes files and extension services appear as readable HTML
documents. This gives the user the ability both to access other
services on your Web server, and to input data to the Web server
through HTTP.
NOTE |
The only kind of relationship that the CGI is interested in when dealing with Web sites and related structures is the client/server model of computer communications. The computer that makes requests is the client, and the computer that answers these requests is the server. Web sites are stored on server computers, or machines.
|
Gateways sit on the server where they take a client's user input
and then output data to the client in a usable format, like an
HTML document or URL. The gateway itself does not deal with satisfying
the client's request, but finds the files, programs, or scripts
on the server that can.
One of the ways to get more dynamic pages is to use Server Side
Includes (SSIs) in your Web sites. These are different from the
CGI, and will be explained in the next chapter. SSIs can work
with, or without, a gateway.
The CGI is the specification for the way in which a server's gateway
communicates with the Web server. When data comes in from a client's
Web browser that contains a query or an HTML form using the GET or
POST request methods (discussed later in this chapter under HTTP
Headers), the Internet service, or "inets," starts
up the http service, or https, to deal with the HTTP data that
is arriving. The https then sends a message using the CGI specifications
to the server's gateway program. The gateway receives this data
from the browser either as standard input or as environmental
variables.
Using this data, the gateway initiates whichever response is necessary
by parsing and processing the client data. Parsing is the
procedure the computer puts the data through, figuring out all
its syntax and storing the variables, if necessary, so it is ready
to run.
The gateway's output goes back to the https as HTML, or some other data
format that HTTP can handle; then the https sends this on to the
client's Web browser. There may be no response from the gateway
if the data it has received is only for storage or input to a
database or file folder on the server.
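The two delivery routes described above (standard input and environmental variables) can be sketched in Perl. This is a minimal illustration, not the server's own code: the routine name read_request_data is made up for this example, and it assumes the server sets the standard CGI variables REQUEST_METHOD, CONTENT_LENGTH, and QUERY_STRING. A simulated environment stands in for a live request.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return the raw, still-encoded form data for one request.
# $env is a reference to a hash of environmental variables (normally \%ENV)
# and $in is a filehandle for the request body (normally \*STDIN).
# The routine name is hypothetical; it is not part of any library.
sub read_request_data {
    my ($env, $in) = @_;
    my $method = uc($env->{REQUEST_METHOD} || 'GET');

    if ($method eq 'POST') {
        # POST: the data arrives on standard input, and the server
        # reports how many bytes to read in CONTENT_LENGTH.
        my $length = $env->{CONTENT_LENGTH} || 0;
        my $data   = '';
        read($in, $data, $length) if $length > 0;
        return $data;
    }

    # GET: the data arrives in the QUERY_STRING environmental variable.
    return defined $env->{QUERY_STRING} ? $env->{QUERY_STRING} : '';
}

# Demonstration with a simulated GET request instead of a live server.
my %fake_env = (
    REQUEST_METHOD => 'GET',
    QUERY_STRING   => 'first=Bobby&last=Hull',
);
my $raw = read_request_data(\%fake_env, \*STDIN);
print "Raw data: $raw\n";
```

In a real CGI script the call would be read_request_data(\%ENV, \*STDIN). The data returned is still encoded and has to be split into name/value pairs before it is used.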
Before going into the interior of CGI (or, at least, the epidermis),
a quick look at some of the ways your server organizes its information
will shed some light on how CGI operates with your server. To
work effectively, CGI has to know where things are. And to enable
various functions, keep track of user access, and debug your CGI
scripts, you have to know where things are. These are all organized
in various common directories on your server. You are never restricted
to the directory names used here, but these are the commonly used
file names for directories performing the purpose outlined.
NOTE |
One common confusion when dealing with a Windows NT-based server occurs because most resources dealing with CGI and other networking concerns are written from a generic UNIX background and use generic UNIX terms and concepts that do not translate easily (or sometimes at all) to a Windows NT format.
To make using your system easier, this book uses the proper Windows NT terminology where possible, as well as translating UNIX terms you will encounter in other networking books. For example, consider the term daemon, which is commonly used in most
networking texts. Daemons are programs that deal with providing network services to clients. With Windows NT, daemons are simply called services, or sometimes, network services, and sometimes just servers. This should not be confused with the computer that
hosts these services, also commonly called a server. Little details like these can cause you hours of grief when you are dealing with the Internet, so be sure to keep this terminology issue in mind.
|
Your NT server works by relying on several things, one of which
is a well-organized directory system. Proper data management is
crucial to the swift operation of a server. If your services cannot
find the data requested of them, they cannot work properly. A
badly organized server that forces the various services to go
through a series of directories and subdirectories before they
can find what they are looking for is just as troublesome. The
following directories are used by the server to operate the CGI
programs. These are all directories to which you must have access
for successful gateway programming. If you are not also a system
administrator, you will need to discuss these issues with the
person who is.
The Server Root
Many files that determine where CGI programs can operate and what
they are allowed to do are stored in the server's root directory.
That directory is usually found on the C: drive on the server's
computer.
Inside the server's root there are two important areas: the Log
directory and the Registry directory. If you don't have access
to these directories you can still use Perl scripts, but you will
have to consult the system administrator for any additional information,
or special access, that you may need.
The Log directory is where all the log files are stored, including
logs on errors, security, system, and applications. The most important
of these files are the error logs, where all the errors involving
your HTML documents, CGI programs, and SSIs (Server Side Includes)
are noted. The error logs are necessary for debugging your system.
You can view these logs in Windows NT using Event Viewer.
On the Windows NT server, the security log is the log that keeps
track of who is using (and has used) what on your server. This
log is very useful when tracking the hits your various Web pages
receive. The security log employs two icons when used by Event
Viewer to speed up your search. A key is used to symbolize a successful
action, and a lock signifies an unsuccessful one.
There are two other logs that Windows NT runs as part of its server
setup. These are the system log, which records all events that
occur in the Windows NT system, and the application log, which
records events that occur during application runs on Windows NT.
Error calls affecting applications can be found in this log.
You can have logs that keep track of any kind of information.
If you have a complicated site that makes use of a lot of the
newer HTML tags created by Netscape (like <TABLE>), or you
are using Java, you may want to have alternate HTML documents
for browsers, such as Mosaic, that do not support the tags that
Netscape does.
You can use a log, in conjunction with a Perl script, that counts
the different kinds of browsers accessing your server. This data
can be used to reorganize your site so that you can direct each
different browser to the Web pages that were especially built
for that browser.
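A browser count of this kind might be sketched as follows. The sample strings, the routine names, and the matching rules are all assumptions made for illustration; the values your own log records for each browser may look different, so the patterns would need adjusting.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Classify one browser identification string into a family.
# The patterns are illustrative guesses, not a complete list.
sub browser_family {
    my ($agent) = @_;
    return 'Mosaic'   if $agent =~ /Mosaic/i;
    return 'Lynx'     if $agent =~ /Lynx/i;
    return 'Explorer' if $agent =~ /MSIE/;
    return 'Netscape' if $agent =~ /^Mozilla/i;
    return 'Other';
}

# Count how many entries fall into each family.
sub count_browsers {
    my (@agents) = @_;
    my %count;
    $count{ browser_family($_) }++ for @agents;
    return %count;
}

# Made-up sample entries standing in for lines pulled from a log.
my @sample = (
    'Mozilla/2.0 (WinNT; I)',
    'NCSA Mosaic/2.0 (Windows x86)',
    'Mozilla/1.1N (Macintosh; I; 68K)',
    'Lynx/2.4 libwww/2.14',
);

my %count = count_browsers(@sample);
foreach my $name (sort keys %count) {
    print "$name: $count{$name}\n";
}
```

The Explorer test comes before the Netscape test because some browsers begin their identification string with "Mozilla" even though they are not Netscape.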
The Registry directory houses configuration and initialization
data from the Registry that is controlled by using the applications
Registry Editor, Control Panel, User Manager, or File Manager.
The data in the Registry sets up what can happen, like permissions,
and how it happens, like using environmental variables, on your
server. Permissions are controlled by using the "Permissions"
options under the Security menu in File Manager. You can turn
on SSIs, so that your Web pages can take advantage of their functions,
and also inform your server about new file extensions not covered
by MIME specifications, such as the x-parsed-html-type, by using
the Associate option under the File menu in File Manager. It is
important to keep your server up-to-date with new MIME specifications
so that these different file types will be handled properly on
your server.
Some of these new file extensions are used to determine which
files your server will search for SSIs. Files with the extension
.shtml will be parsed by the server as SSIs. Also, you must check
what file types your server allows by examining the file name
extensions recognized by your system. You can do this by using
Registry Editor and looking in the HKEY_LOCAL_MACHINE / SOFTWARE
subtree in the Classes directory folder. Each file name extension
recognized by your system is listed there under its own name; for
example, the information for text files is stored in a folder titled
.txt.
Microsoft recommends that you use read-only when using Registry
Editor, and then make any changes through the appropriate Control
Panel application or through File Manager, each of which has the
proper procedure to make changes to the Registry already built
in. To add the Perl file name extension, .pl, you can use the
Associate feature in File Manager, which is found under the File
menu.
Conversely, you could have a very well laid out plan for your
memory management that restricts the parsing of certain types,
but your server might be set up to parse all documents, thus undoing
all of your hard work. Information about the commands that affect
each directory tree is found in the server's configuration, inside
a section that opens with an HTML-style tag:
<DIRECTORY directory_path>
where directory_path covers all the files and directories included
under that directory tree. The command set is
ended with the </DIRECTORY> tag.
To control the data in your log, use the tools that Event Viewer
contains, like the sorting or filtering options. You can sort
to list the order of the entries, or events, from oldest to newest
or newest to oldest, by choosing one of the options under the
View menu in Event Viewer. Oldest to newest is the recommended
setting.
The size of event logs can become a problem very quickly on a
busy server. Filtering helps to reduce this problem. When events
are filtered, you determine the start and end of when events are
listed and the types of events logged. The log can start at the
first entry, or be assigned a specific date. The log can be listed
until the last event, or have a specific end date. It is recommended
that you choose a specific period of events to view because the
entire list can become very long, very quickly.
The different types of events include:
- Information-for descriptions of successful infrequent significant
events involving NT server services.
- Warning-for noncritical errors. These are worth noting because
they can point to future problems.
- Error-for critical errors that cause data loss or major functions
to fail.
- Success Audit-for the audit events of successful action executions.
- Failure Audit-for the audit events of failed action executions.
To look for specific events, Event Viewer uses the Find function
under the View menu. The parameters here are very helpful for
locating a specific type of log event, whether it be an error
for which you are looking, or to determine whether an application
launched successfully.
You can save, or archive, any of these event logs in Event Viewer,
where they will be stored using the .evt file extension. Logs
are stored with the following fields of data: Date, Time, Source,
Type, Category, Event, User, Computer, and Description. These
fields can also be saved as comma-delimited text, which means that you
can import this data into most spreadsheet and database programs.
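Because the fields are comma-delimited, a Perl script can also read a saved log directly. The sketch below assumes each line carries the nine fields in the order listed above, with no quoting; the sample lines and the routine names are invented for illustration, and a real saved log may need a more careful parser.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Field order as listed in the text.
my @FIELDS = qw(Date Time Source Type Category Event User Computer Description);

# Turn one comma-delimited line into a hash keyed by field name.
sub parse_event_line {
    my ($line) = @_;
    chomp $line;
    # Limit the split to nine pieces so that commas inside the
    # Description (the last field) stay together.
    my @values = split /,/, $line, scalar @FIELDS;
    my %event;
    @event{@FIELDS} = @values;
    return \%event;
}

# Count events by their Type field (Error, Warning, and so on).
sub count_by_type {
    my (@lines) = @_;
    my %count;
    foreach my $line (@lines) {
        next unless $line =~ /\S/;    # skip blank lines
        my $event = parse_event_line($line);
        $count{ $event->{Type} }++;
    }
    return \%count;
}

# Made-up sample lines standing in for a saved log file.
my @sample = (
    "6/3/96,10:15:02 AM,Srv,Error,None,2013,N/A,WEBSRV,Disk is full, free some space",
    "6/3/96,10:20:41 AM,Srv,Warning,None,2011,N/A,WEBSRV,Configuration value too small",
    "6/3/96,11:02:17 AM,Print,Error,None,10,N/A,WEBSRV,Document failed to print",
);

my $count = count_by_type(@sample);
foreach my $type (sort keys %$count) {
    print "$type: $count->{$type}\n";
}
```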
Use the Registry Editor to control your configuration files. This
is the NT application that edits the Registry, which is where
the NT stores all of its configuration files. It is recommended
that when you use the Registry Editor you convert it to a read-only
format by selecting the Read Only command under the Options menu.
This allows you to view all the data in the various configuration
files without the fear of accidentally overwriting crucial data.
You can usually find what you are looking for in the
HKEY_LOCAL_MACHINE subtree in the Registry. This is where your
hardware, software, security, system, and related configuration
files are kept. Another useful subtree is HKEY_CLASSES_ROOT, where
the different file formats and data types are defined.
The Document Root
The directory tree you will likely find yourself in most of the
time, once you've gotten most of your bugs out of the way, is
the document root. This is where you keep all the HTML documents
for a Web site available for client access. All the directories
contained within the root directory are considered part of the
document root.
The root directory for your Web site might be
c:/HTTP/bin/my_site
and the document root for the HTML file index.htm in my_site would
then be
/HTTP/bin/my_site/
The Common Gateway Interface is one way for a Web server, using
HTTP, to "talk" to the operating system or the server's
machine. It works using requests from the client that are either
in standard input, <STDIN>, or environmental variables.
Because of this the CGI can go further than the slower HTML link,
which answers one client request at a time, leading to only one
specific response at a time.
Instead, the CGI can permit the Web server to provide different
documents based on the client's requests. More than this, the
CGI permits totally new documents to be written "on-the-fly"
so that customized client responses can be made. Typically, the
user inputs his or her information via an HTML form. Before discussing
that subject, however, a quick examination of HTTP headers is
in order. A closer look at the headers will give us some clues
as to how the CGI deals with data. MIME specifications for these
headers are outlined in Appendix B.
The MIME specifications mentioned earlier, and explained in depth
in Chapter 11, are used to create HTTP headers that let the client
and server know what kind of data is being transferred between
them. From the client, HTTP sends a request header based on the
instructions found in the HTML file. The two basic methods to
retrieve data from a server are "GET" and "POST."
The default method in HTTP is GET when a request method is not
specified in the HTML document.
When GET is used, the information is sent to the server via the
URL field. If POST is used, then the data is sent as a separate
message body once all the HTTP request headers have been sent.
When the client has determined the method it will use to send
the data, it builds an HTTP header to send to the CGI program
on the server. This message is sent to the server, and there the
CGI program in question is called up by the server. You are not
restricted to sending only one header; you can also include other
headers that contain additional information for the server or
the CGI program.
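The difference between the two methods is easiest to see in the text of the request itself. The following sketch builds a bare-bones request for each method; the routine name build_request is hypothetical, and a real browser sends additional headers that are left out here.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Build the text of a minimal HTTP/1.0 request.
# With GET the form data rides on the URL; with POST it follows
# the headers as the body of the message.
sub build_request {
    my ($method, $path, $data) = @_;

    if (uc($method) eq 'GET') {
        return "GET $path?$data HTTP/1.0\r\n\r\n";
    }

    return "POST $path HTTP/1.0\r\n"
         . "Content-type: application/x-www-form-urlencoded\r\n"
         . "Content-length: " . length($data) . "\r\n"
         . "\r\n"
         . $data;
}

my $data = 'first=Bobby&last=Hull';
print build_request('GET',  '/cgi-bin/register.pl', $data), "\n";
print build_request('POST', '/cgi-bin/register.pl', $data), "\n";
```

In the POST version the Content-length header tells the server how many bytes of data to expect after the blank line that ends the headers.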
The CGI program called up by the request then performs the task
requested of it, taking commands from any form data present, and
sends a message to the server concerning what kind of message
should go back to the client. Between the two of them, the server
and the CGI program, various HTTP response headers are created
and sent to the client.
One of the ways the CGI program accomplishes this is by identifying
itself as a non-parsed header program (usually abbreviated NPH, and
called NP-CGI here). This allows
the response headers it creates to be sent straight through the
server, simplifying and speeding things up by eliminating unneeded
processing, or parsing, time. The other way a CGI program sends
data is by creating only the minimum response headers necessary
(usually just a Content-type header) and sending them to the server,
where they are parsed.
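The two kinds of output can be compared side by side. This is a sketch under assumptions: the routine names are made up, the status line is fixed at "HTTP/1.0 200 OK" for simplicity, and how a server recognizes a non-parsed program (many servers look for a file name beginning with nph-) depends on the server.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Parsed output: the script supplies only the minimum header and
# leaves the status line and any other headers to the server.
sub parsed_response {
    my ($body) = @_;
    return "Content-type: text/html\n\n" . $body;
}

# Non-parsed output: the script writes the complete response,
# status line included, and the server passes it through untouched.
sub nph_response {
    my ($body) = @_;
    return "HTTP/1.0 200 OK\r\n"
         . "Content-type: text/html\r\n"
         . "Content-length: " . length($body) . "\r\n"
         . "\r\n"
         . $body;
}

my $page = "<HTML><BODY>Hello</BODY></HTML>\n";
print parsed_response($page);
print "----\n";
print nph_response($page);
```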
NOTE |
Parsing is the term used to describe the process that your computer goes through when preparing a program file for execution. When a computer parses a file, it goes through the file line-by-line, examining the syntax and looking for useful instructions that will cause it to do some task when the program is run.
|
Parsing a file can cause problems in HTML files, which are not
meant to be parsed. When your computer reads these files, it could
find all manner of instructions not meant for execution that could
cause your computer to act up, or even crash.
Once all this is finished, the server decides whether any
additional headers need to be added to the response, and then
sends it all to the client. The Content-type header is a common
header that names the type of data being sent between
client and server.
You should already have a strong understanding of HTML form
specifications, but because the HTML form is the main way in which
users will pass information to your server, we will go over the
details involved. For a really in-depth tutorial on HTML forms, try
http://www.netscape.com/tutorials/forms.html
HTML forms start with the <FORM> tag. To handle specific
data the <INPUT> tag defines how the data is gathered from
the user on the page. The <SELECT> tag presents a choice
to the user for data, like a multiple choice question on a test.
The <OPTION> tag is used to present each of the choices
the user has. It is used with the <SELECT> tag. And, if
the user is inputting text, the <TEXTAREA> tag creates a
pane that will hold the user's data. This pane is scrollable.
Each of these HTML elements is modified by its own attributes.
The HTML form itself sets up paired variables of name fields with
value fields. The name variable is determined by the form, which
is matched to the value variable supplied by the user. Once the
user has supplied the information, the form has to understand
what to do with the data. This is accomplished with the method
and action attributes in the <FORM> tag.
There are several attributes in the <FORM> tag that deal
with how the form will handle the data. These include
- Action-Where the URL of the program that needs the
input of the form is given. The default URL for this is the base
server URL where the form is located.
- Method-This denotes the method type in which the supplied
data will be sent to the URL indicated in the Action using the
proper forms-handling protocol. The two choices of method type
are GET and POST.
- GET is typically used when the information being supplied
makes no "lasting" changes in the server's HTML documents
or databases. POST is used when changes are made to the
server's HTML documents, databases, or some other value.
- Enctype-This is important. Enctype is used to give the data
its media type so that the name/value pairs will be properly encoded.
If the protocol named in the Method attribute does not
have its own format defined, then it must be assigned using Enctype.
The usual format for this is
application/x-www-form-urlencoded.
Name/value pairs are the way in which data values are passed to
the CGI from a form. The "name" comes from the name
assigned by the programmer in the tag requesting input. This is
paired with the input from the "value" in the same tag,
which is given by the user.
The name/value pairs will be included in the data set in the order
in which they appear in the form. The name fields are separated
from the value fields with an = symbol and the white space in
both the name and value variables is replaced with a + symbol.
They are sent to the server as name=value, with each pair of name/value
pairs separated by an & symbol. The format looks like this
name1=value1&name2=value2&name3=value3
or, with a real example
first=Bobby&last=Hull&street=1063+Golden+Jet+Lane&city=Pointe+Anne&state=Ontario&zip=CHI+BLA&phone=610.555.1170&address=Send+In+Your+Address+
You may have noticed how long this can make the string being sent
to the server. It is important to be aware of how your server
handles long strings, so that information is not chopped off.
The POST method has no built-in limit, as the data is just a
continuous stream read from <STDIN>, like typing on a keyboard;
input that arrives faster than it is read simply waits in a buffer.
The GET method uses environmental variables, and its length is
limited, here to 255 characters; the exact ceiling depends on your
server.
These name/value pairs can be separated out using Perl. So, with
name/value pairs, the name is how your server recognizes arriving
data, while the value of that data is the value of the pair. The
name/value system applies to all the types of data submission
a user can make, from text entry to checkboxes to radio buttons.
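Separating the pairs takes only a few lines of Perl. The sketch below splits the data on & and =, turns each + back into a space, and keeps the pairs in the order they arrived. The routine name split_pairs is made up for this example, and decoding of % sequences is deliberately left out.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Split encoded form data into a list of [name, value] pairs,
# preserving the order in which they appear in the form.
sub split_pairs {
    my ($data) = @_;
    my @pairs;
    foreach my $pair (split /&/, $data) {
        next if $pair eq '';
        # Split on the first = only, in case the value contains one.
        my ($name, $value) = split /=/, $pair, 2;
        $value = '' unless defined $value;
        # White space was sent as +, so turn it back into spaces.
        tr/+/ / for ($name, $value);
        push @pairs, [$name, $value];
    }
    return @pairs;
}

my $data = 'first=Bobby&last=Hull&street=1063+Golden+Jet+Lane';
foreach my $pair (split_pairs($data)) {
    my ($name, $value) = @$pair;
    print "$name = $value\n";
}
```

A list of pairs is used rather than a single hash so that a name appearing more than once, as can happen with checkboxes, does not overwrite an earlier value.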
All nonalphanumeric characters are replaced by a % symbol followed
by the two hexadecimal digits that represent their ASCII code
equivalent. Line breaks are sent as a carriage return/line feed
pair, %0D%0A. The nonalphanumeric characters most often used from
your keyboard are listed with their decimal and hexadecimal
equivalents in Table 10.1.
Table 10.1 Standard ASCII Characters and Their Decimal and Hexadecimal Equivalents
Character | Decimal | Hex
Tab | 09 | 09
Space | 32 | 20
" | 34 | 22
( | 40 | 28
) | 41 | 29
, (comma) | 44 | 2C
. | 46 | 2E
; | 59 | 3B
: | 58 | 3A
< | 60 | 3C
> | 62 | 3E
@ | 64 | 40
[ | 91 | 5B
] | 93 | 5D
\ | 92 | 5C
^ | 94 | 5E
{ | 123 | 7B
} | 125 | 7D
| (vertical bar) | 124 | 7C
~ | 126 | 7E
There are other non-alphanumeric characters that can be encoded,
as shown in Table 10.2.
Table 10.2 Non-Alphanumeric Character Encoding
Character | Encoding
? | %3F
& | %26
/ | %2F
= | %3D
# | %23
% | %25
The specifics of MIME/URL encoding can be found in RFC 1552, section
3 at
http://ds.internic.net/ds/dspg1intdoc.html
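The encoding rules in Tables 10.1 and 10.2 can be expressed as a pair of small Perl routines. This is a sketch that follows the description in this chapter (every nonalphanumeric character becomes % plus two hexadecimal digits, and spaces become +); the routine names are made up, plain ASCII text is assumed, and real browsers may leave a few punctuation characters unencoded.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Encode text the way form data is sent to the server.
sub url_encode {
    my ($text) = @_;
    # Replace everything except letters, digits, and spaces with
    # a % sign and the two-digit hexadecimal ASCII code.
    $text =~ s/([^A-Za-z0-9 ])/sprintf('%%%02X', ord($1))/ge;
    # Spaces are sent as plus signs.
    $text =~ tr/ /+/;
    return $text;
}

# Reverse the encoding once the data reaches the CGI program.
sub url_decode {
    my ($text) = @_;
    # Plus signs go back to spaces first, so that an encoded
    # plus sign (%2B) is not mistaken for a space afterward.
    $text =~ tr/+/ /;
    $text =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;
    return $text;
}

my $original = 'Tom & Jerry <cats@my_server.com>';
my $encoded  = url_encode($original);
print "Encoded: $encoded\n";
print "Decoded: ", url_decode($encoded), "\n";
```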
The tag that uses the POST method might look like this:
<FORM Method="POST" Action="http://www.my_server.com/cgi-bin/register.pl">
To collect specific information from the user, the <INPUT>
tag is used. The various attributes for this are found in Table
10.3.
Table 10.3 Form Tag Input Options
Form Tag | Purpose
Align | Used when an image is employed in gathering the data. The choices are "top," "middle," and "bottom," which define the relationship the image has to the text following it.
Checked | Presets a checkbox to include a checkmark. If this attribute is not included, the checkbox will be blank.
Maxlength | Sets the maximum number of characters a user can input as text into a field. The default is unlimited, so you might want to restrict the length in every form, or risk your scripts becoming overwhelmed by a flood of text.
Name | The symbolic name used when transferring and identifying the output produced by your form.
Size | Defines the field width of the text box presented to the user. When the Size is less than the Maxlength, then the text field will be scrollable.
Src | If an image is used, this identifies the source of the image file.
Type | Defines the kind of input format the user sees. The choices include checkbox (the user can make multiple choices for data values), hidden (the values are defined by the form, not by the user), and image (the user selects an area of the image, then the x and y coordinates are sent with the name/value pairs), as well as the types in the rows that follow.
Password | User supplies text that is hidden from view on the user's screen. Typically this appears as asterisks or dots.
Radio | User must choose one selection only from a list. This should not be confused with checkboxes where the user can select any and all of the choices presented.
Reset | Clears the form of all selections for re-entry by the user.
Submit | Used by the user to submit the form data to the server.
text | Uses the Size and Maxlength provisions to create a single-line field for user input. If this is the only area for user input, then a submit button is not required. Simply pressing the Enter or Return key on the user's keyboard will send the data on. If more than a single line is needed the <TEXTAREA> tag should be used.
Value | Used with the radio button, this sets the value for the selection available to the user.
Using these tags we can create an in-depth form that asks for
user background data and tastes, which the CGI can then enter
into your databases. Some of these tags were used to gather information
in our guestbook script. After these tags are illustrated with
examples, they are combined into a full form page that can be adapted
to gather user data on your site.
<INPUT Type="hidden" Name="address" Value="new_user@my_server.com">
Your Name: <INPUT Type="text" Name="user-name" Size="20" Maxlength="30">
<H2>Guess the secret word contest</H2><BR>
Try and guess our secret word to win a prize!<INPUT Type="password" Name="word_guess">
Where did you hear about our site?:<INPUT Type="radio" Name="Internet" Value="online">The Internet
Please check all the mediums you use:<INPUT Type="checkbox" Name="television">
Where are you from?<INPUT Type="image" Src="http://www.my_server.com/images/map.gif"
Name="user_location" Align="top">
Do you like our site?:<INPUT Type="radio" Name="site_feedback" Value="Yes" Checked>Yes
<INPUT Type="radio" Name="site_feedback" Value="No">If not, then what don't you like?
<INPUT Type="text" Name="user_suggestions" Size="60" Maxlength="100">
<INPUT Type="Submit" Value="Send it in!">
<INPUT Type="Reset" Value="Do it again!">
To allow the user to choose from a list of options on a form,
the <SELECT> tag is used. Although the default for selection
is only one choice for the user, this can be modified using the
Multiple attribute. The <OPTION> tag is used to define each
choice available to the user. The attributes work as follows:
- Name-A name is assigned to be associated with the data submitted
by the user.
- Multiple-Allows the user to make more than one choice in the
<SELECT> tag.
- Size-Identifies how many choices will be visible to the
user at one time. If this value is more than one, the choices are
presented as a list.
The <OPTION> tag is used in tandem with the <SELECT>
tag. It has two attributes:
- Selected-Marks a choice as already selected.
- Value-This is the value that is sent to the server when the user
selects this choice. If no Value is given, the default is the text
that follows the <OPTION> tag.
Typically these two tags might look like this:
Please choose one of our products as a gift:
<SELECT Name="product_gifts">
<OPTION>Lead Pencil 2000
<OPTION>Staple-O-Matic!
</SELECT>
The last tag available in creating forms is <TEXTAREA>,
which is used to define the size of a text field for user input.
This field is scrollable. The attributes that apply here are
- Name-Logical name linked to the data supplied here.
- Rows-Number of rows in the field.
- Cols-Number of columns in the field.
The <TEXTAREA> tag creates a better format for a text area in which
to input user feedback than the previous example, which created only a
text line. To change this, we can use a <TEXTAREA> tag that
looks like this:
<INPUT Type="radio" Name="site_feedback" Value="No">If not, then what don't you like?<BR>
<TEXTAREA Name="user_suggestions" Rows="4" Cols="50">
</TEXTAREA>
When you combine all these tags, you get a better understanding
of how the form works. This example combines the previous examples
in a form that presents a sample of each of the elements. It is
for collecting information about new users of your site. A screen
representation of what the form would look like is shown in Figure
10.1.
Figure 10.1 : Sample of different input features
on an HTML form.
<HTML>
<!-- Example of form elements and attributes -->
<HEAD>
<TITLE>
The New User Profile Form
</TITLE>
</HEAD>
<BODY>
<P>
We Want to Know More About You!<BR>
<HR>
<BR>
<FORM Method="POST" Action="http://www.my_server.com/cgi-bin/register.pl">
<P>
<INPUT Type="hidden" Name="address" Value="new_user@my_server.com">
Your Name:
<INPUT Type="text" Name="user-name" Size="20" Maxlength="30">
<INPUT Type="hidden" Name="subject" Value="new_user_info"><BR><BR>
Where did you hear about our site?:
<INPUT Type="radio" Name="where" Value="online">The Internet
<INPUT Type="radio" Name="where" Value="television">On Television
<INPUT Type="radio" Name="where" Value="friend">A Friend<BR><BR>
Please check all the mediums you use:<BR>
<INPUT Type="checkbox" Name="television">Television<BR>
<INPUT Type="checkbox" Name="internet">The Internet
<INPUT Type="checkbox" Name="radio">Radio
<INPUT Type="checkbox" Name="print">Magazines and Newspapers
<BR><BR>
Please choose one of our products as a gift for filling out this form:
<SELECT Name="product_gifts">
<OPTION>Lead Pencil 2000
<OPTION>Staple-O-Matic!
<OPTION>Glue Master Sticky Tape
<OPTION>Log!
</SELECT><BR><BR>
Please choose your favourite Web browser:
<SELECT Name="browsers" Multiple Size="4">
<OPTION Value="straight">AOL
<OPTION Value="straight">Explorer
<OPTION Value="hip">Navigator
<OPTION Value="hip">Mosaic
<OPTION Value="weird">Lynx
</SELECT><BR><BR>
<H3>Guess the secret word contest</H3><BR>
Try and guess our secret word to win a prize!<INPUT Type="password" Name="word_guess">
<BR><BR>
Do you like our site?:
<INPUT Type="radio" Name="site_feedback" Value="Yes" Checked>Yes
<INPUT Type="radio" Name="site_feedback" Value="No">If not, then what don't you like?
<BR><BR>
<TEXTAREA Name="user_suggest" Rows="4" Cols="50">
Thanks for your thoughts!
</TEXTAREA><BR><BR>
Where are you from?<INPUT Type="image" Src="http://www.my_server.com/images/map.gif"
Name="user_location" Align="top"><BR>
Now that you're done, let us know by sending us your info.<BR><BR>
<INPUT Type="Submit" Value="Send it in!">
<INPUT Type="Reset" Value="Do it again!">
</FORM>
<P>
Thanks for registering!
<HR>
</BODY>
</HTML>
NOTE |
To effectively use an image map like the one used in our new user form example, you must define your graphic, or image map, properly. You can do this by using a Perl script, like the one for defining image maps discussed in Chapter 6. Please see that script for details.
|
If you want to supply Web browsers with truly dynamic entities,
using the CGI to do more than just retrieve other static HTML
documents like an educated gopher, then running executables on
the server's side of things is a must. The CGI must also be able
to take a specific user's data as input for a specific task.
Environmental variables are one of the ways to do this.
To understand how environmental variables differ from regular
variables, it is important to know about scope. Scope refers
to the extent to which a variable is understood. Most variables
are redefined each time a program is run, most often only in a
certain block of code within that program. This is the common,
garden-variety file variable. When you get to environmental variables,
however, their value stays the same throughout each script and
application started within each CGI, or Perl window, or shell.
Their value is based upon the first document opened by the browser.
This has various implications, not the least of which is the ability
of different applications, or processes, to share the same environmental
variables across the same shell.
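The difference in scope can be demonstrated with a short Perl script that starts a second Perl process and asks it what it can see. This is only a sketch: the variable names DEMO_SETTING and $ordinary and the routine child_sees are invented for the example, and the list form of the piped open used here needs a reasonably recent Perl on Windows.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Start a child Perl process and report the value it sees for the
# named environmental variable, or "(not set)" if it sees none.
sub child_sees {
    my ($name) = @_;
    # $^X holds the path of the Perl interpreter running this script.
    my $code = 'print defined $ENV{$ARGV[0]} ? $ENV{$ARGV[0]} : "(not set)"';
    open(my $child, '-|', $^X, '-e', $code, $name)
        or die "Cannot start child process: $!";
    my $output = join '', <$child>;
    close $child;
    return $output;
}

# An ordinary variable exists only inside this script.
my $ordinary = 'visible only inside this script';

# An environmental variable is handed on to every process
# started from this one.
$ENV{DEMO_SETTING} = 'shared with child processes';

print "Child sees DEMO_SETTING: ", child_sees('DEMO_SETTING'), "\n";
print "Child sees ordinary:     ", child_sees('ordinary'), "\n";
```

The child process reports the value of DEMO_SETTING but has no knowledge of $ordinary, which never leaves the script that declared it.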
To illustrate environmental variables we will use some Perl scripts.
Before getting into the details of environmental variables, however,
you should know that not all environmental variables are carried
on every system. To check and see which environmental variables
are supported by your NT server, you can use System Control Panel.
The full list of System Environmental Variables is listed here,
as well as two text boxes beneath, which can be used to create
new environmental variables for your system.
You can also use this Perl script called env_var.pl to print out
the environmental variables available to your CGI scripts, as
in Figure 10.2.
Figure 10.2 : Environmental variables available to your
server.
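#!/usr/bin/perl
# env_var.pl
# Prints every environmental variable available to the CGI as an HTML table.
push(@INC, "/cgi-bin");      # directory that holds cgi-lib.pl
require("cgi-lib.pl");
print &PrintHeader;          # cgi-lib.pl routine that returns the Content-type header
print "<HTML>\n";
print "<HEAD><TITLE>Environmental Variables Available to the CGI</TITLE></HEAD>\n";
print "<BODY>\n";
print <<"eop";
<CENTER>
<TABLE border=1 cellpadding=12 cellspacing=12>
<TR><TH align=left><H2>Environmental Variable</H2>
<TH align=left><H2>Contains</H2></TR>
eop
# One table row per variable, in alphabetical order
foreach $var (sort keys(%ENV)) {
    # $ENV{$var} looks up the value stored under the name held in $var
    print "<TR><TD> $var <TD> $ENV{$var}</TR>\n";
}
print <<"eop";
</TABLE>
</CENTER>
</BODY>
</HTML>
eop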
This next script also can be used for determining what environmental
variables are on your server. Instead of just displaying them
on your browser, it can be more productive to have a text list
of them. You can do this by e-mailing this list to yourself using
a Perl script.
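A minimal sketch of such a script follows. It is an illustration under stated assumptions, not a definitive listing: the script name env_mail.pl, the mailer path /usr/sbin/sendmail, and the address webmaster@my_server.com are all placeholders, and the mailer is only invoked when the script is actually started through the gateway (that is, when GATEWAY_INTERFACE is set). Run from a command line, it simply prints the list.

```perl
#!/usr/bin/perl
# env_mail.pl (hypothetical name)
use strict;
use warnings;

my $mailer = "/usr/sbin/sendmail";        # assumed mailer location
my $to     = 'webmaster@my_server.com';   # placeholder address

# Build one "NAME = value" line per variable, sorted by name.
sub env_report {
    my %env = @_;
    return join "", map { sprintf("%s = %s\n", $_, $env{$_}) } sort keys %env;
}

# Pipe the report to the mailer; returns 1 on success, 0 otherwise.
sub mail_report {
    my ($report) = @_;
    return 0 unless -x $mailer;
    open(my $mail, "|-", $mailer, "-t") or return 0;
    print $mail "To: $to\nSubject: CGI environmental variables\n\n$report";
    return close($mail) ? 1 : 0;
}

my $report = env_report(%ENV);
if ($ENV{GATEWAY_INTERFACE}) {
    # Started by the Web server: mail the list and report back briefly.
    my $sent = mail_report($report);
    print "Content-type: text/plain\n\n";
    print $sent ? "The list was mailed to $to.\n"
                : "The list could not be mailed.\n";
} else {
    # Started by hand: just show the list.
    print $report;
}
```

On a server without a sendmail-style mailer, mail_report returns 0 and nothing is sent; the mailing step would then have to be replaced with whatever mail program the system provides.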
Two environmental variables are useful for collecting
user input: QUERY_STRING and PATH_INFO. There are two ways to get data into these
variables. Information can be
added directly to an HTML link by the programmer, as with
<A HREF="http://www.my_server.com/cgi-bin/name.pl?data-request">Click here to read the members' names.</A>
where everything that follows the question mark is placed in the QUERY_STRING
variable. This is true for all data that follows the first question
mark in the URL of an <A HREF> tag. Data can be placed in
the PATH_INFO variable in a similar way:
<A HREF="http://www.my_server.com/cards.pl/bet=100/cards=5">Click here to start your game with $100.00</A>
The server will start up the program cards.pl and place everything after
the program name, here /bet=100/cards=5, into PATH_INFO.
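As a sketch of how cards.pl might pick that extra path apart, the following splits PATH_INFO on its slashes and keeps every name=value segment. The subroutine name parse_path_info is hypothetical, and the fallback string is used only so the script can be tried from a command line where PATH_INFO is not set.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Split "/bet=100/cards=5" into the pairs (bet => 100, cards => 5).
# Segments that are not of the form name=value are ignored.
sub parse_path_info {
    my ($path_info) = @_;
    my %field;
    foreach my $segment (split m{/}, $path_info) {
        next unless $segment =~ /^([^=]+)=(.*)$/;
        $field{$1} = $2;
    }
    return %field;
}

# Fall back to the sample path when no PATH_INFO was supplied.
my %game = parse_path_info($ENV{PATH_INFO} || "/bet=100/cards=5");

print "Content-type: text/plain\n\n";
print "$_: $game{$_}\n" foreach sort keys %game;
```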
Both of these variables can also be filled in from
a <FORM> tag. A form with METHOD=GET will place data
into the QUERY_STRING variable. An example of this might look
like this:
<FORM METHOD=GET ACTION="http://www.my_server.com/cards.pl">
First Card<INPUT NAME="First Card"><BR>
Second Card<INPUT NAME="Second Card"><BR>
<INPUT TYPE=SUBMIT VALUE="Submit">
</FORM>
All the information the user inputs into the "First Card"
and "Second Card" prompts will be placed in the QUERY_STRING
variable. To double-check the entries, the CGI may echo back the
data with a script like this:
#!/usr/bin/perl
# cards.pl
# The header name must be written exactly as "Content-type:", with no spaces.
print "Content-type: text/html\n\n";
print "You picked \"$ENV{QUERY_STRING}\" as your cards. Good choice!\n\n";
exit;
This script will not print the cards exactly as the user typed them,
because the data is still URL-encoded: spaces appear as "+", and other
special characters appear as a percent sign followed by two hexadecimal
digits. A regular expression can translate these symbols back.
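One way to do that translation is sketched below. The subroutine names url_decode and parse_query are hypothetical; the approach assumes the usual form encoding, in which pairs are joined by "&", names and values are joined by "=", "+" stands for a space, and %XX stands for the character with hexadecimal code XX.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Decode one URL-encoded string: "+" becomes a space and
# "%XX" becomes the character with hexadecimal code XX.
sub url_decode {
    my ($text) = @_;
    $text =~ tr/+/ /;
    $text =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;
    return $text;
}

# Split a query string into name/value pairs and decode each part.
sub parse_query {
    my ($query) = @_;
    my %field;
    foreach my $pair (split /&/, $query) {
        next if $pair eq "";
        my ($name, $value) = split /=/, $pair, 2;
        $value = "" unless defined $value;
        $field{ url_decode($name) } = url_decode($value);
    }
    return %field;
}

my %card = parse_query(defined $ENV{QUERY_STRING} ? $ENV{QUERY_STRING} : "");

print "Content-type: text/plain\n\n";
print "$_: $card{$_}\n" foreach sort keys %card;
```

The output is sent as text/plain here on purpose. If the decoded values were echoed into an HTML page instead, characters such as < and & would first need to be escaped.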
In a script like this, the users' choices are shown back to them
on their Web browser. Their input has been appended to a new URL:
http://www.my_server.com/cards.pl?First+Card=Ten+of+Hearts&Second+Card=Ace+of+Spades
where the user had entered "Ten of Hearts" as her first
choice and "Ace of Spades" as her second. The browser has
taken this input and appended it to the URL it requests. Please note
that the input has been encoded so that certain
characters, such as spaces, are translated before they proceed
to the gateway script.
"PATH" is another important environmental variable.
This is the variable that lets your CGI programs know how to find
the other programs they may need. When a CGI program runs another
program by name, the PATH environmental variable defines which
directories are searched. PATH
is also used by your server's system to find programs outside of
the CGI program it is running. Making sure that PATH is properly
defined is very important.
This is also a good place to check whether you are having problems
with CGI scripts that depend on other files for successful execution.
The different directories available to PATH are separated by a
colon. An example of a defined PATH environmental variable may
look like this:
PATH=/usr/bin/:/cgi-bin/perl/:/usr/local/public/:/bin:/perl/usr/local:.
Whatever makes use of PATH starts on the left, looks in the
first directory listed, and then proceeds down the list. To
speed up operations, list the directories judiciously, in
order from most used to least used. The period at the end of the PATH
value does not terminate the list; it is an instruction to also
search the current directory where the CGI program is located.
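The left-to-right search can be sketched in a few lines of Perl. This is an illustration of the lookup order only, not how any particular server implements it; the subroutine name find_in_path is hypothetical, and the colon separator matches the example PATH shown above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return the first directory in a colon-separated PATH value that
# holds an executable file called $program, or nothing if none does.
sub find_in_path {
    my ($program, $path) = @_;
    foreach my $entry (split /:/, $path, -1) {
        # An empty entry is treated like ".", the current directory.
        my $dir = $entry eq "" ? "." : $entry;
        return $dir if -f "$dir/$program" && -x "$dir/$program";
    }
    return;
}

my $where = find_in_path("perl", defined $ENV{PATH} ? $ENV{PATH} : "");
print defined $where ? "perl found in $where\n" : "perl is not on PATH\n";
```

Because the loop returns at the first match, a directory listed early in PATH hides any program of the same name in a later directory, which is why the order of the entries matters.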
The only real problem that exists with environmental variables
is that the gateway program could run extremely long strings through
a shell script that has built-in limitations for string lengths.
You might encounter this as the "running out of environment
space" error. To avoid this you can run your data through
standard input, or <STDIN>.
Returning to our form, if its METHOD is changed from GET to POST
the data arrives on standard input, and the Perl
script can be modified to read it:
#!/usr/bin/perl
# cards.pl
# read() returns the number of bytes read; the data itself lands in $user_input.
read(STDIN, $user_input, $ENV{CONTENT_LENGTH});
print "Content-type: text/html\n\n";
print "You picked \"$user_input\" as your cards. Good choice!\n\n";
exit;
where the output to the user would be the same as in the previous
example, except that the URL would not display the encoded QUERY_STRING
after the script name, as
http://www.my_server.com/cards.pl
Although METHOD=GET is useful, METHOD=POST is
even more so, because there is no restriction on the amount of
information that it can pass to the gateway program. An example
of using METHOD=POST follows:
<FORM METHOD=POST ACTION="http://www.my.machine/cards.pl/screen=subscribe">
where the user's form data will go to standard input (<STDIN>) and
the extra path, /screen=subscribe, will go into the PATH_INFO variable.
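A gateway script that has to accept either method can check REQUEST_METHOD and read from the matching place. The following is a minimal sketch under that assumption; the subroutine name read_form_data is hypothetical, and the data it returns is still URL-encoded.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return the raw, still-encoded form data for the current request.
# POST data is read from the filehandle passed in (normally STDIN);
# GET data is taken from QUERY_STRING.
sub read_form_data {
    my ($in) = @_;
    my $method = $ENV{REQUEST_METHOD} || "GET";
    if ($method eq "POST") {
        my $data   = "";
        my $length = $ENV{CONTENT_LENGTH} || 0;
        read($in, $data, $length) if $length > 0;
        return $data;
    }
    return defined $ENV{QUERY_STRING} ? $ENV{QUERY_STRING} : "";
}

my $form_data = read_form_data(\*STDIN);
my $extra     = defined $ENV{PATH_INFO} ? $ENV{PATH_INFO} : "";

print "Content-type: text/plain\n\n";
print "Form data: $form_data\n";
print "Extra path: $extra\n";
```

Reading exactly CONTENT_LENGTH bytes matters for POST: the script should not wait for an end-of-file on standard input, because the server is not required to send one.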
Overall there are several ways to get data to the gateway program,
which gives you several problem-solving strategies to add to your
bag of tricks. To make the most of the CGI, a full understanding
of the set of available environmental variables is valuable. Environmental
variables fall into two distinct categories. The first is server
meta-information, where the variable keeps the identical value
regardless of the client's request. The second
is client-specific, where the value depends on the client
request.
It should be noted that some client-specific environmental variables
can be defined by the server to which the client's request was
sent.
Server Meta-Information Environmental Variables
These environmental variables are set by the server itself, and
do not rely on the CGI to define them. They are always accessible
by the CGI. The list of meta-information environmental variables
is found in Table 10.4.
Table 10.4 Meta-Information Environmental Variables
Environmental Variable | Value
SERVER_ADMIN | The e-mail address of the person responsible for all the Web-related concerns on your server, which is probably yourself.
SERVER_SOFTWARE | Identifies the name and version of the Web server. Its output comes in the form name/version.
SERVER_NAME | Signifies the server's hostname, DNS alias, or the IP address.
GATEWAY_INTERFACE | The server CGI type and the revision level. It is output as CGI/revision.
The client-specific environmental variables are also known as request-header
dependent, because they rely on the requests from the client to
give them a value. They are listed in Table 10.5.
Table 10.5 Client-Specific Environmental Variables
Environmental Variable | Value
AUTH_TYPE | The protocol-specific authentication method used to validate user access. This is used only if the server supports user authentication.
CONTENT_LENGTH | The length of the content buffer as announced by the client in its request. The CGI program uses it to know how much data to read from its input buffer.
CONTENT_TYPE | The MIME type of the data attached to the request, as with the POST and PUT methods; for example, application/x-www-form-urlencoded.
HTTP_REFERER/REFERER_URL | The URL from which the script was invoked.
HTTP_REQUEST_METHOD | Simply the HTTP method request header remade into an environmental variable. The values here can range from the familiar GET and POST methods to HEAD, PUT, DELETE, LINK, and UNLINK.
HTTP_USER_AGENT | Identifies the Web browser that the client uses to send its request. Its output is software/version library/version.
METHOD=GET (POST) ACTION=http://machine/path/programname/extra-path-info | This was explained earlier. The supplementary data (extra-path-info) is put into PATH_INFO.
PATH_INFO | The extra path information that follows the program name in the URL, as in the ACTION of a METHOD=GET (POST) form.
PATH_TRANSLATED | The physical path that the server produces by translating the virtual path found in PATH_INFO.
QUERY_STRING | The client data that follows the ? in the URL that referenced this particular script.
REMOTE_ADDR | Identifies the IP address of the client.
REMOTE_HOST | The client's hostname, as set by the server. If the server does not have this information, it should set REMOTE_ADDR and leave REMOTE_HOST unset.
REMOTE_IDENT | Used for logging. It holds the remote user's name.
REMOTE_USER | The user's authenticated user name.
REQUEST_METHOD | The method with which the request was made, such as GET or POST.
SCRIPT_FILENAME | The value of the full path to the CGI script.
SCRIPT_NAME | The virtual path to the executable script. Handy for self-referencing URLs like ISINDEX queries.
SERVER_PORT | Identifies the port to which the client request was sent.
SERVER_PROTOCOL | The name and revision of the protocol that the client is using to make its request, output as protocol/revision.
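As an example of putting these variables to work, a script can build a URL that points back to itself from SERVER_NAME, SERVER_PORT, and SCRIPT_NAME. The sketch below is illustrative only: the subroutine name self_url is hypothetical, the scheme is assumed to be plain http, and the fallback values exist so the script can be run outside a server.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Build a self-referencing URL from the server's environmental variables.
# The port is left out when it is the default HTTP port, 80.
sub self_url {
    my %env  = @_;
    my $url  = "http://" . ($env{SERVER_NAME} || "localhost");
    my $port = $env{SERVER_PORT} || 80;
    $url .= ":$port" unless $port == 80;
    $url .= defined $env{SCRIPT_NAME} ? $env{SCRIPT_NAME} : "";
    return $url;
}

my $me = self_url(%ENV);

print "Content-type: text/html\n\n";
print "<A HREF=\"$me\">Run this script again</A>\n";
```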
Looking into the CGI has led us to how it handles data from
your Web pages using the HTTP protocol, MIME headers, and Perl
scripts. These early explorations provoke even more questions
about the CGI and how it works, which are presented in Chapter
11.