To learn more about author Sanjaya Hettihewa, please visit the author's homepage.
CGI unleashes the potential of the Internet by providing a mechanism for publishing dynamic content. One of the best things about the World Wide Web is that you can use it to interact with millions of users to obtain and provide information. Because this information changes constantly, static HTML pages alone are not enough; you must provide a way to display dynamic information to those surfing your Web site. CGI, which stands for Common Gateway Interface, is a mechanism that allows you to do just that. After you set up your Web server and create some Web pages, it's time to think about making your Web site dynamic by setting up CGI scripts on your Web server. By utilizing CGI, you can exploit the World Wide Web to its full potential.
Feedback forms, e-mail forms, database query interfaces, database update mechanisms, Web-page counters, and search engines are all applications of CGI. This chapter introduces CGI and explains how it works, then shows you practical applications of CGI and how CGI scripts can enhance your Web site. Next, you will be shown how to develop CGI programs; C and PERL are used to illustrate how CGI programs can be created to perform various tasks. When you finish this chapter, you will be able to develop CGI scripts, experiment with them, and use them to interact with the people browsing your Web site. Chapter 11, "Developing ISAPI Applications," builds on material covered in this chapter and shows you how to develop Internet Server Application Programming Interface (ISAPI) applications. ISAPI offers a high-performance and scalable CGI application development interface.
Before you proceed, an introduction to CGI is in order. CGI is a standard used by various programs at your Web site to interact with users surfing your Web site. Because CGI is a standard, it is not browser or server dependent and can be moved from one Web server to another while retaining its full functionality.
Just like application programs, CGI programs can be written in almost any programming language that lets you create an executable program or interpret it in real time with another program, as in the case of AWK (the name AWK is derived from the initials of its designers: Alfred V. Aho, Peter J. Weinberger, and Brian W. Kernighan) and PERL (Practical Extraction and Report Language). Languages that can be used to develop CGI applications under Windows NT include
Depending on your expertise, what's available, and the nature of your CGI project, choose the language that best suits your needs. Customarily, CGI scripts are stored in the CGI-BIN directory of the Web server's document root directory.
Plain-text HTML files retrieved by Web clients are static. The information contained in these files never changes unless the files are manually edited. However, by utilizing CGI scripts, Web pages can be dynamically created each time a client accesses a certain URL. To the client, the page appears to have been created for him. Obviously, this is a very powerful tool for interacting with Web surfers. You should utilize CGI to make your Web site interactive so that you can provide customized content and enable those browsing your Web site to interact with the information you provide.
CGI is invaluable to any Web site. Benefits of CGI range from having a customized input form for feedback to allowing someone browsing your Web site to update and retrieve information from a database on your server. By setting up a customized e-mail feedback form, you can be sure you are provided with all the information you need. Furthermore, you can be sure that your e-mail feedback form will always work because it does not depend on how the e-mail settings of your client's Web browser are set up (in case they are not set up correctly for e-mail). Furthermore, you can use CGI to set up and update a database that collects data from users browsing your Web site. See Chapter 12, "Developing ODBC Database Front-Ends," to learn how to publish databases on the Internet. As you can see, the possibilities and applications of CGI are endless.
When you use CGI to make your Web site interactive, users visiting your Web site can easily find information they need. Because your Web site is easy to navigate, these users will visit it again and again for more information. CGI also allows you to customize what people see when they browse your Web site by providing dynamic content. Furthermore, you can use a CGI script to provide content that's customized for the Web browser being used to access the information.
Many organizations and individuals use CGI for a variety of tasks, from a simple counter that tracks the number of times a Web page is accessed to a CGI script that manages an entire storefront. Such a storefront script can allow users visiting a Web site to browse merchandise and place orders online. In addition, some Web sites use CGI to offer search capabilities that make finding information easier.
Here are a few applications of CGI that you can use to enhance the capabilities of your Web site:
Before moving to more advanced topics, let's cover the basics of CGI.
A CGI script is typically used to provide dynamic content to the client that called
the script. CGI scripts communicate with Web browsers, as shown in Figure 10.1. If
the CGI script is interactive, a form with input controls is typically sent to the
Web client. After filling in the form, the user submits it to the Web server. The
Web server then uses CGI to call the CGI script with data from the Web client. The
CGI script processes the data, possibly accessing a database on the server, and sends
a message to the client that made the request. If the CGI script is noninteractive,
its output is sent directly to the client.
Figure 10.1. Architecture of a typical
Web server with CGI scripts.
When a CGI script is called, the Web server first examines the REQUEST_METHOD used
to call the CGI script to determine how the Web client is sending data to the CGI
script. This process is shown in Figure 10.2. If the REQUEST_METHOD used to call
the CGI script is GET, any data supplied by the Web client follows a question mark
at the end of the CGI script's URL. This data is stored in the environment
variable QUERY_STRING. On the other hand, if the REQUEST_METHOD used is POST or PUT,
the data is sent in the body of the request, and its size in bytes is stored in the
environment variable CONTENT_LENGTH. The CGI script can then read the number of bytes
given by CONTENT_LENGTH from standard input to examine the data it was given.
If you are confused about all these strange environment variables,
don't worry--they are all discussed in the "CGI Environment Variables"
section later in this chapter.
Figure 10.2. How Web servers determine
and handle the REQUEST_METHOD, which calls the CGI scripts.
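The decision described above can be sketched in C. This is a minimal illustration rather than a complete CGI library: the function name read_request_data and the MAX_REQUEST_BYTES limit are assumptions made for this example and are not part of the CGI standard. The function takes the values of the environment variables as arguments so that it can also be exercised outside a Web server.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Assumed upper bound on request size; not part of the CGI standard. */
#define MAX_REQUEST_BYTES 65536L

/*
 * Returns a newly allocated, NUL-terminated copy of the data sent by the
 * client, or NULL if the method is unsupported or the data is unusable.
 * The caller is responsible for calling free() on the result.
 */
char *read_request_data(const char *method, const char *query_string,
                        const char *content_length, FILE *in)
{
    if (method == NULL)
        return NULL;

    if (strcmp(method, "GET") == 0) {
        /* GET: the data follows the script's URL and arrives in QUERY_STRING. */
        const char *query = (query_string != NULL) ? query_string : "";
        char *copy = malloc(strlen(query) + 1);
        if (copy != NULL)
            strcpy(copy, query);
        return copy;
    }

    if (strcmp(method, "POST") == 0 || strcmp(method, "PUT") == 0) {
        /* POST or PUT: CONTENT_LENGTH says how many bytes to read. */
        long length = (content_length != NULL)
                          ? strtol(content_length, NULL, 10) : 0;
        char *body;
        size_t received;

        if (length < 0 || length > MAX_REQUEST_BYTES)
            return NULL;
        body = malloc((size_t)length + 1);
        if (body == NULL)
            return NULL;
        received = fread(body, 1, (size_t)length, in);
        body[received] = '\0';
        return body;
    }

    return NULL; /* HEAD and other methods carry no data handled here. */
}
```

In a CGI program, main() would typically call this function as read_request_data(getenv("REQUEST_METHOD"), getenv("QUERY_STRING"), getenv("CONTENT_LENGTH"), stdin). Capping the size before allocating memory is a design choice of this sketch; it keeps a client from forcing the script to allocate an arbitrarily large buffer.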
Although a major use of CGI is to provide dynamic content to those browsing your
Web site, CGI programs do not always need to be interactive. You can use non-interactive
CGI scripts to provide dynamic information that does not need user input to be created.
For example, to take advantage of features offered by Web browsers such as Netscape
Navigator and Microsoft Internet Explorer, you can write a CGI program to determine
the browser being used by a client and send a page designed to take advantage of
that browser's capabilities. In the "Using CGI to Provide Customized Content"
section of this chapter, you will see how easy it is to write a CGI script to provide
customized content based on the browser being used to access a page. In such an event,
the CGI script does not need to interact with the person browsing the Web site. The
CGI script can be executed transparently without user intervention. For example,
if the default Web page of a Web server is welcome.html, the main Web page of the
Web server can be mapped to a CGI script by creating a URL-CGI mapping, as shown
in Figure 10.3. Such a script can determine the browser being used by the client
and display a page with dynamic content optimized for that browser. Please refer
to your Web server's documentation for more information on creating URL-CGI mappings.
Figure 10.3. Map a Web page URL to a
CGI script to provide dynamic content.
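As a rough sketch of the browser check described above, the following C function picks a page based on the value of HTTP_USER_AGENT. The function name page_for_browser and the file names welcome-msie.html and welcome-netscape.html are hypothetical; only welcome.html comes from the example above. The sketch assumes that Internet Explorer identifies itself with the token MSIE inside a string that also begins with Mozilla/, which is why MSIE is tested first.

```c
#include <string.h>

/*
 * Chooses a page for the browser named in HTTP_USER_AGENT.
 * The page names other than welcome.html are hypothetical.
 */
const char *page_for_browser(const char *user_agent)
{
    if (user_agent == NULL)
        return "welcome.html";            /* variable not supplied */

    /* Internet Explorer also reports "Mozilla/", so test for MSIE first. */
    if (strstr(user_agent, "MSIE") != NULL)
        return "welcome-msie.html";

    /* Netscape Navigator reports values such as "Mozilla/2.0b4 (WinNT; I)". */
    if (strncmp(user_agent, "Mozilla/", 8) == 0)
        return "welcome-netscape.html";

    return "welcome.html";                /* any other browser */
}
```

A CGI program mapped to the main page could call page_for_browser(getenv("HTTP_USER_AGENT")) and then send the contents of the chosen file to standard output.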
If a CGI script does not make use of user input, the events that occur when a client
accesses the page are very simple. First, the client connects to the Web server and
requests a Web page. Because the document requested is linked to a CGI script, the
Web server executes the CGI program to which the page is linked. Output of the CGI
program is then sent to the client that requested the page. Afterward, the connection
between the Web server and the Web client is closed. This interaction is shown in
Figure 10.4.
Figure 10.4. You can use a non-interactive
CGI script to provide dynamic content.
One of the greatest aspects of CGI is its capability to interact with those browsing
your Web site. You can ask a user to fill in and submit a form. The CGI script can
validate the user's input, ask the user to complete any incomplete information, and
process the user's input, as shown in Figure 10.5.
Figure 10.5. You can use an interactive
CGI script to provide dynamic content.
In the case of a CGI script interacting with a Web client to display dynamic content,
a Web page with various controls is first sent to the Web browser. After the user
fills in the form, it is submitted to the Web server for processing. Depending on
the REQUEST_METHOD used to communicate with the CGI script, the CGI script obtains
data sent from the client, processes the data, and writes its output to standard
output. Everything written to standard output by the CGI script is visible to
the client that called the CGI script.
When setting up CGI scripts, you should be concerned with a few things. Each time you allow a CGI script to be executed by someone surfing your Web site, you are allowing someone to execute a program on your server. This can lead to security breaches. Although this might sound a little perilous, it's not as bad as it sounds provided that you follow a few guidelines. When used properly, CGI is very safe.
Another issue is the time it takes for a CGI script to fulfill a client's request. If you plan to provide data to those browsing your Web site in real time, you should ensure that no one has to wait longer than about five to ten seconds for a response. If it takes longer than that, the person waiting at the other end might think there is something wrong and simply stop waiting. When a request is going to take longer to process, obtain the e-mail address of the person requesting the information and e-mail the information when the data has been processed.
Note: If you need to provide data in real time and simple CGI scripts take longer than about ten seconds to execute, it's likely that you are outgrowing your server and need more processing power and/or RAM. This might also be an indication of a bottleneck such as an inefficient database access driver or a poorly written CGI script.
Due to the nature of HTTP, it's possible for two or more clients to call the same CGI script at the same time. If the CGI script locks files or databases when it processes data, this can cause problems--potentially causing loss of data. CGI scripts should be capable of handling such a situation without any problem. This can be done by making sure that the CGI application does not lock databases or files that might potentially be accessed by another instance of the same application.
Although CGI is a very powerful tool for making information available to those browsing your Web site, you should be particularly careful with CGI scripts that take input from a Web client and use that data (without checking) as a command-line argument. An example of this would be using an e-mail address supplied by a Web client to call Blat (a command-line e-mail program for Windows NT). When using such an e-mail address, make sure there is no possibility of it being interpreted as a command-line command. Your CGI scripts should always check for special control characters to avoid security breaches.
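One way to perform such a check is to accept only characters that are known to be harmless and reject everything else. The following C sketch does this for an e-mail address. The function name is_safe_email and the exact set of permitted characters are assumptions of this example; the rule is deliberately stricter than the full e-mail address syntax, so some unusual but valid addresses are rejected.

```c
#include <ctype.h>
#include <string.h>

/*
 * Returns 1 if the address contains only characters from a small
 * allowed set and exactly one '@'; returns 0 otherwise.
 */
int is_safe_email(const char *address)
{
    size_t i, len, at_count = 0;

    if (address == NULL)
        return 0;
    len = strlen(address);
    if (len == 0 || len > 254)
        return 0;

    /* A leading '-' or '/' could be read as a command-line switch. */
    if (address[0] == '-' || address[0] == '/')
        return 0;

    for (i = 0; i < len; i++) {
        unsigned char c = (unsigned char)address[i];
        if (c == '@') {
            at_count++;
            continue;
        }
        if (isalnum(c) || c == '.' || c == '_' || c == '-' || c == '+')
            continue;
        return 0; /* spaces, quotes, &, |, <, >, ; and so on are rejected */
    }

    return at_count == 1 && address[0] != '@' && address[len - 1] != '@';
}
```

A script would call is_safe_email() on the submitted address and refuse to run the mail program if it returns 0. Listing the permitted characters, rather than the forbidden ones, means that a control character nobody thought of is rejected by default.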
If you have sections of your Web site protected with a password, you might want to disable directory browsing of your Web server. By disabling directory browsing, you're preventing someone from snooping around your Web site.
You should be cautious about who has access to your Web server's CGI directory. It's very dangerous to allow users who upload files to your Web site via FTP to have access to your CGI directory. It doesn't take much knowledge in programming to write a malicious program, upload it to the CGI directory, and execute it with a Web browser. Therefore, you should control who has access to your CGI directory via FTP or any other method.
You should never set up CGI applications to distribute potentially harmful personal information unless the Web server is configured to encrypt the data before it is transmitted over the Internet. If you distribute financial information or credit-card numbers, you should not use CGI unless you have configured your Web server to encrypt data before it is transmitted. If you need to transmit sensitive data and your Web server does not encrypt data before transmitting it, consider a medium such as PGP (Pretty Good Privacy) protected e-mail to transmit your data.
If you validate users who access parts of your Web site, you should never assume the IP address returned by the Web server is the real IP address of the Web client. It's possible to trick the Web server into believing the client making the HTTP request is requesting the data from a site other than the connecting site. Even if you protect a certain area of your Web server with a password and a user ID, this data might be intercepted by a third party. Someone could intercept a valid user ID and password when a legitimate user accesses your Web site.
If your Web server supports data encryption, this won't be a problem; it will, however, if you aren't using any Web-server based encryption. In such a case, you should use an OTP (One Time Password) mechanism to validate users. An OTP authorizing mechanism works by making sure a password cannot be used more than once; it typically sends a challenge string to the client that wants to gain access. The client then uses a special program to find out the correct response string for the challenge string supplied by the server; this is done by typing the user's secret password and the challenge string. The response string is then sent to the server, which validates the user and remembers the response string so it can't be used again. The next time the user wants to gain access, the server sends a different challenge string to the client that can be decoded only with the user's secret password. Because the user's secret password never travels across the Internet, this is a safe way to authorize users. However, unless an encryption technology is used, the content being accessed by a client might still be intercepted by a clever person with too much free time.
URL: For more information on such an OTP mechanism, you might want to visit the site at: http://www.yahoo.com/Computers_and_Internet/Security_and_Encryption/S_KEY/
Each time the Web server executes a CGI script, it creates a number of environment variables that inform the CGI script how the script is being invoked. The environment variables also provide information about the Web server and the Web browser being used by the client. Depending on how the CGI script is invoked, some environment variables may not be available in some cases.
Environment variables supplied to CGI scripts are always all uppercase. When you access them from a C program, a PERL script, or whichever language you are using, be sure to use all uppercase letters.
This section discusses the environment variables available to CGI scripts. By accessing these variables, CGI scripts can obtain certain information, such as the browser used to invoke the script. After the following discussion about environment variables, you will learn how to access these variables from a PERL script as well as from a C program via CGI.
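Because some variables may not be available, a C program should be prepared for a missing value. The following sketch wraps the standard getenv() function for that purpose. The names cgi_env and print_cgi_variables, and the selection of variables printed, are choices made for this illustration only.

```c
#include <stdio.h>
#include <stdlib.h>

/*
 * Returns the value of a CGI environment variable, or the given
 * fallback string when the server did not supply that variable.
 */
const char *cgi_env(const char *name, const char *fallback)
{
    const char *value = getenv(name);
    return (value != NULL) ? value : fallback;
}

/* Writes a few variables as NAME=value lines; useful when debugging. */
void print_cgi_variables(FILE *out)
{
    static const char *const names[] = {
        "REQUEST_METHOD", "QUERY_STRING", "CONTENT_LENGTH",
        "HTTP_USER_AGENT", "REMOTE_ADDR"
    };
    size_t i;

    for (i = 0; i < sizeof names / sizeof names[0]; i++)
        fprintf(out, "%s=%s\n", names[i], cgi_env(names[i], "(not set)"));
}
```

Note that the variable names are passed in all uppercase letters. Calling getenv() directly returns a null pointer for a variable that is not set, and passing that pointer to a function such as printf() is an error, which is the situation the fallback string avoids.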
Some Web servers can be configured to authenticate users. If the server has authenticated a user, the authentication type used to validate the user is stored in the AUTH_TYPE variable. The authentication type is determined by an examination of the authorization header that the Web server receives with an HTTP request.
Sometimes, CGI scripts are invoked with additional information. This information is typically input for the CGI program. The amount of this information is specified by the number of bytes. If a CGI script is called with additional information, CONTENT_LENGTH contains the amount of the input in bytes.
MIME content types are used to label types of objects (HTML files, Microsoft Word files, GIF files, and so on). The MIME content type for data being submitted to a CGI script is stored in CONTENT_TYPE. For example, if form data is submitted to a CGI script using the POST method, CONTENT_TYPE typically contains the value application/x-www-form-urlencoded. This is because responses to the form are encoded according to URL specifications.
The CGI specification revision number is stored in the GATEWAY_INTERFACE environment variable. The format of this variable is CGI/revision. By examining this variable, a CGI script can determine what version of CGI the Web server is using.
Web clients can handle different MIME types. These MIME types are described in the HTTP_ACCEPT variable. MIME types accepted by the Web client calling the CGI script appear as a list separated by commas. This list takes the format type/subtype, type/subtype. For example, if the Web client supports the two image formats GIF and JPEG, the HTTP_ACCEPT list contains the items image/gif, image/jpeg.
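A CGI program can scan this list to decide, for example, whether to send a GIF or a JPEG image. The following C sketch is one possible way to do so. The function names are assumptions of this example, the comparison is case-sensitive for simplicity, and any parameters that follow a semicolon in a list item are ignored. Wildcard entries such as */* and image/* are treated as matches.

```c
#include <string.h>

/* Returns 1 if one list item (not NUL-terminated) covers the wanted type. */
static int item_matches(const char *item, size_t len, const char *wanted)
{
    const char *slash = strchr(wanted, '/');

    if (len == 3 && strncmp(item, "*/*", 3) == 0)
        return 1;                              /* accepts anything */
    if (len == strlen(wanted) && strncmp(item, wanted, len) == 0)
        return 1;                              /* exact type/subtype */
    if (slash != NULL && len == (size_t)(slash - wanted) + 2 &&
        item[len - 1] == '*' && strncmp(item, wanted, len - 1) == 0)
        return 1;                              /* type with wildcard subtype */
    return 0;
}

/*
 * Returns 1 if the comma-separated HTTP_ACCEPT list contains the
 * wanted MIME type (for example "image/gif"), 0 otherwise.
 */
int client_accepts(const char *accept_list, const char *wanted)
{
    const char *p = accept_list;

    if (accept_list == NULL || wanted == NULL)
        return 0;

    while (*p != '\0') {
        size_t len;

        while (*p == ' ' || *p == ',')
            p++;                               /* skip separators */
        len = strcspn(p, ",; ");               /* item ends at , ; or space */
        if (len > 0 && item_matches(p, len, wanted))
            return 1;
        p += strcspn(p, ",");                  /* move to the next item */
    }
    return 0;
}
```

For example, client_accepts(getenv("HTTP_ACCEPT"), "image/jpeg") returns 1 when the client has listed image/jpeg or a matching wildcard.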
By examining the HTTP_USER_AGENT variable, a CGI script can determine the Web browser being used by the client. For example, if Netscape 2.0 beta 4 is being used by the client, the HTTP_USER_AGENT variable contains the value Mozilla/2.0b4 (WinNT; I). The general format of this variable is software/version library/version.
The PATH_INFO variable is usually used to pass options to a CGI program. Clients can supply these options by appending additional path information to the URL of the CGI script. PATH_INFO always contains the part of the URL that follows the name of the CGI script. For example, PATH_INFO has the value /These/Are/The/Arguments if the CGI script FunWithNT.EXE is called with the following URL:
http://your_server.your_domain/cgi-bin/FunWithNT.EXE/These/Are/The/Arguments
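If each part of PATH_INFO is treated as a separate option, a small helper can pick the options apart. The following C sketch copies the option at a given position into a buffer. The function name path_info_segment and the convention of counting positions from zero are assumptions of this example.

```c
#include <stddef.h>
#include <string.h>

/*
 * Copies the segment at position `index` (counting from 0) of a
 * PATH_INFO value such as "/These/Are/The/Arguments" into `out`.
 * Returns 0 on success, or -1 if there is no such segment or it
 * does not fit in the buffer.
 */
int path_info_segment(const char *path_info, size_t index,
                      char *out, size_t out_size)
{
    const char *p = path_info;

    if (path_info == NULL || out == NULL || out_size == 0)
        return -1;

    while (*p != '\0') {
        size_t len;

        while (*p == '/')
            p++;                     /* skip the separator(s) */
        len = strcspn(p, "/");       /* length of this segment */
        if (len == 0)
            break;                   /* reached the end of the string */
        if (index == 0) {
            if (len >= out_size)
                return -1;           /* buffer too small */
            memcpy(out, p, len);
            out[len] = '\0';
            return 0;
        }
        index--;
        p += len;
    }
    return -1;
}
```

With the URL shown above, asking for position 2 of getenv("PATH_INFO") would yield the string The.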
In the event the CGI script needs to know its absolute pathname, it can obtain this information from PATH_TRANSLATED. For example, if the CGI script being invoked is HelloNTWorld.EXE, all CGI scripts are stored in H:\www\http\ns-home\root\cgi-bin, and the CGI script is accessed with the URL http://your_server.your_domain/root/cgi-bin/HelloNTWorld.EXE, PATH_TRANSLATED contains the value H:\www\http\ns-home\root\cgi-bin\HelloNTWorld.EXE. If the CGI program needs to save or access any temporary files in its home directory, it can use PATH_TRANSLATED to determine its absolute location by examining this CGI variable.
You might have noticed that when you submit certain forms, a string of characters appears after a question mark that follows the URL of the script being called. This string of characters is referred to as the query string. When a CGI script is called with the GET method, QUERY_STRING typically contains variables and their values as entered by the person who filled in the form. QUERY_STRING is sometimes used by search engines to examine the input when a form is submitted for a keyword search. For example, if a CGI application is executed using the URL http://www.server.com/cgi-bin/application.exe?WindowsNT=Fun, QUERY_STRING contains the string WindowsNT=Fun.
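To use the values in a query string, a CGI program has to split it into name=value pairs and undo the URL encoding, in which a plus sign stands for a space and a percent sign followed by two hexadecimal digits stands for a single character. The following C sketch shows one way to look up a single variable. The function names are assumptions of this example, and for simplicity only the value is decoded; the variable name is compared exactly as it appears in the query string.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Returns the value of a hexadecimal digit, or -1 if c is not one. */
static int hex_digit(int c)
{
    if (c >= '0' && c <= '9')
        return c - '0';
    c = tolower(c);
    if (c >= 'a' && c <= 'f')
        return c - 'a' + 10;
    return -1;
}

/* Decodes len bytes of src into out; returns -1 if out is too small. */
static int url_decode(const char *src, size_t len, char *out, size_t out_size)
{
    size_t i = 0, o = 0;

    if (out_size == 0)
        return -1;
    while (i < len) {
        int c = (unsigned char)src[i];

        if (c == '+') {
            c = ' ';                                   /* '+' means space */
            i++;
        } else if (c == '%' && i + 2 < len &&
                   hex_digit((unsigned char)src[i + 1]) >= 0 &&
                   hex_digit((unsigned char)src[i + 2]) >= 0) {
            c = hex_digit((unsigned char)src[i + 1]) * 16 +
                hex_digit((unsigned char)src[i + 2]);  /* %XX escape */
            i += 3;
        } else {
            i++;                                       /* ordinary character */
        }
        if (o + 1 >= out_size)
            return -1;
        out[o++] = (char)c;
    }
    out[o] = '\0';
    return 0;
}

/*
 * Finds `name` in a query string such as "WindowsNT=Fun&x=1" and stores
 * its decoded value in `out`. Returns 0 on success, -1 if the name is
 * not present or the value does not fit.
 */
int query_value(const char *query, const char *name, char *out, size_t out_size)
{
    const char *p = query;
    size_t name_len;

    if (query == NULL || name == NULL || out == NULL)
        return -1;
    name_len = strlen(name);

    while (*p != '\0') {
        const char *end = strchr(p, '&');
        size_t pair_len = (end != NULL) ? (size_t)(end - p) : strlen(p);

        if (pair_len > name_len && strncmp(p, name, name_len) == 0 &&
            p[name_len] == '=')
            return url_decode(p + name_len + 1, pair_len - name_len - 1,
                              out, out_size);
        if (end == NULL)
            break;
        p = end + 1;
    }
    return -1;
}
```

With the URL shown above, query_value(getenv("QUERY_STRING"), "WindowsNT", buffer, sizeof buffer) would store the string Fun in the buffer.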
The IP address of the client that called the CGI program is stored in the REMOTE_ADDR environment variable. For security reasons, the value of this variable should never be used for user authentication purposes. It's not very hard to trick your Web server into believing a client is connecting from a different IP address.
If the Web server performs a DNS lookup of the client's IP address and finds the alias of that address, the REMOTE_HOST variable contains that alias name. Some Web servers allow DNS lookups to be turned on or off. If you plan to use this variable to find the IP address alias of clients, be sure the DNS lookup option is turned on. The Web server can find the IP address alias of most, but not all, clients. If the Web server cannot find the IP address alias of a client, the REMOTE_HOST variable is not assigned the client's DNS alias value; it just contains the client's IP address. This value should never be used for user-authentication purposes.
If the Web server supports RFC 931 identification, REMOTE_IDENT contains the user name retrieved from the client's authentication server. Typically, a Web server obtains this value by contacting the client that initiated the HTTP request and by speaking with the client's authentication server. Unfortunately, because the value comes from the client's side of the connection, it cannot be trusted when transmitting sensitive data. Visit http://www.pmg.lcs.mit.edu/cgi-bin/rfc/view?number=931 for additional information about RFC 931 and the Authentication Server Protocol.
Some Web servers support user authentication. If a user is authenticated, the CGI script can determine the user name of the person browsing the Web site by looking at the value of the REMOTE_USER environment variable. This CGI variable is available only if the user has been authenticated via an authentication mechanism.
A client can call a CGI script in a number of ways. The method used by the client to call the CGI script is in the REQUEST_METHOD variable. This variable can have a value like HEAD, POST, GET, or PUT. CGI scripts use the value of this variable to determine where to obtain data passed to the CGI script.
All files on a Web server are usually referenced relative to their document root directory. SCRIPT_NAME contains the virtual pathname of the script called relative to the document root directory. For example, if the document root directory is c:\www\http\ns-home\root, all CGI scripts are stored in c:\www\http\ns-home\root\cgi-bin\, and the CGI script HelloNTWorld.EXE is called, the SCRIPT_NAME variable contains the value /cgi-bin/HelloNTWorld.EXE. The advantage of this variable is that it allows the CGI script to refer to itself. This is handy if somewhere in the output, the script's URL needs to be made into a hypertext link.
The domain name of the Web server that invoked the CGI script is stored in SERVER_NAME. This domain name can either be an IP address or DNS alias.
Typically, Web servers listen for HTTP requests on port 80. However, a Web server can listen on any port that's not in use by another application. A CGI program can determine at what port the Web server is handling HTTP requests by looking at the value of the SERVER_PORT environment variable. When you create self-referencing hypertext links at runtime from the contents of SERVER_NAME, be sure to include the port number of the Web server by appending a colon and the value of SERVER_PORT to the server name.
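A self-referencing URL can be assembled from SERVER_NAME, SERVER_PORT, and SCRIPT_NAME. The following C sketch shows one way to do this. The function name build_self_url is an assumption of this example, and the sketch assumes the server is reached over plain HTTP. As a design choice, it leaves the port out of the URL when the port is 80, because that is the default for HTTP; always appending the port would also produce a valid URL.

```c
#include <stdio.h>
#include <string.h>

/*
 * Builds "http://server[:port]/script" in `out`.
 * Returns 0 on success, -1 if a required value is missing or the
 * result does not fit in the buffer.
 */
int build_self_url(char *out, size_t out_size, const char *server_name,
                   const char *server_port, const char *script_name)
{
    int written;

    if (out == NULL || server_name == NULL || script_name == NULL)
        return -1;

    if (server_port == NULL || strcmp(server_port, "80") == 0) {
        /* Port 80 is the HTTP default, so it can be left out. */
        written = snprintf(out, out_size, "http://%s%s",
                           server_name, script_name);
    } else {
        written = snprintf(out, out_size, "http://%s:%s%s",
                           server_name, server_port, script_name);
    }

    /* snprintf reports the length it needed; reject truncated results. */
    return (written >= 0 && (size_t)written < out_size) ? 0 : -1;
}
```

A CGI program would pass in getenv("SERVER_NAME"), getenv("SERVER_PORT"), and getenv("SCRIPT_NAME"), then place the result in the HREF attribute of a hypertext link.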
Web servers speak the Hypertext Transfer Protocol (HTTP). The version of HTTP that the Web server is using can be determined by examining the SERVER_PROTOCOL environment variable. This variable contains the name and revision of the protocol being used, in the format protocol/revision. For example, if the server speaks HTTP 1.0, this variable has the value HTTP/1.0.
The name of the Web server that invoked the CGI script is stored in the SERVER_SOFTWARE environment variable. This variable is in the format name/version. If a CGI script is designed to use special capabilities of a Web server, the CGI script can determine what Web server is being used by examining this variable before those special capabilities are used.
This introduction to PERL shows you how CGI PERL scripts can be set up on Windows NT Web servers. Numerous CGI PERL scripts can be found at Internet CGI PERL script archives. By using these scripts and customizing them to suit your needs, you can easily improve a Web site. A comprehensive tutorial of PERL is beyond the scope of this book; therefore, only the basics of writing CGI PERL scripts are discussed.
PERL stands for Practical Extraction and Report Language. With the growth of the WWW, PERL is increasingly being used to write CGI programs. Most of the best features of C, SED, AWK, and sh are incorporated into PERL; therefore, you can develop PERL scripts quickly because you don't have to reinvent the wheel for fundamental tasks like string manipulation. PERL's expression syntax corresponds quite closely to the expression syntax for C programs, which makes PERL easy to learn for those who are already familiar with C. One of the best things about PERL is its portability. PERL is an interpreted language available for several hardware platforms, including PCs, Macs, and different types of UNIX systems. Unlike most languages and utilities, PERL does not impose limits on data size. As long as you have enough system resources, PERL happily reads the contents of a multi-megabyte file into a string. Thanks to the optimizing algorithms built into PERL, scripts written in PERL are robust and fast.
Before you continue with the discussion on CGI PERL scripts, get PERL for Windows NT and install it on your Web server. PERL for Windows NT is provided free of charge at http://info.hip.com/ntperl/ on the Internet. After you get PERL for NT, create a directory, copy the PERL distribution file to this directory, then uncompress the distribution file. When you do so, choose the option to use stored directory names in the archive. If you don't, all files are extracted to the PERL directory you created, and you'll find yourself in a mess! After the archive is uncompressed, run install.bat to install PERL on your server.
Warning: Do not copy the file PERL.EXE to a CGI directory of your Web server. This enables a user with malicious intentions to execute Windows NT commands by hacking PERL! Instead of copying PERL.EXE to a CGI directory, create a file-extension mapping (for example, files that end with .pl are mapped to C:\PERL\bin\PERL.EXE). The procedure for creating a file-extension mapping is different from one Web server to the next. Consult your Web-server documentation to learn how to create a file-extension mapping.
URL: When decompressing the .zip file, be sure to use a 32-bit unzipping program that supports long filenames. Otherwise, the distribution files may not be properly installed. WinZip is a fine file-decompressing program that supports long filenames and a variety of file-compression formats. You can get WinZip from this address: http://www.winzip.com/WinZip/download.html
After you install PERL, reboot your server for the installation directory paths to become effective.
Before you create CGI applications, check your Web server's settings and determine the name of its CGI directory. The remainder of this chapter assumes that this directory is CGI-BIN. When you are comfortable with CGI and are using CGI PERL scripts, you might want to visit the following URLs for more information about PERL and for sample CGI PERL scripts.
URL: Yahoo!'s Web page on World Wide Web programming with PERL scripts is located at http://www.yahoo.com/Computers_and_Internet/Internet/World_Wide_Web/Programming/Perl_Scripts/
http://www.yahoo.com/Computers_and_Internet/Languages/Perl/
To keep up-to-date with the latest news on PERL for Windows NT, join the following
mailing lists:
PERL-Win32--PERL discussion list; to subscribe, send an e-mail to majordomo@mail.hip.com
and include subscribe PERL-Win32 in the message.
PERL-Win32_announce--PERL announcements; to subscribe, send an e-mail to
majordomo@mail.hip.com and include subscribe PERL-Win32_announce in the message.
The PERL discussion list is a relatively high-volume mailing list. However, it is
read by many Windows NT PERL programmers who can answer questions you might
have when starting out with PERL.
Chances are you have at least heard of C and possibly know how to program in C. Therefore, an introduction to the C programming language isn't necessary in this book. For more information, please refer to one of the many fine books that have been written about programming in C, such as The C Programming Language by Kernighan and Ritchie.
C is a general-purpose language that imposes very few restrictions on the programmer. It is also a portable language that can be moved from one computer to another, as long as only standard POSIX/ANSI C function calls are used. There are many CGI programs written in C on the Internet that can be used to enhance the capabilities of your Web site. For more information on C CGI programs, check out http://www.yahoo.com/Computers_and_Internet/Internet/World_Wide_Web/Programming/. By using Windows API calls from C programs, you can further exploit the capabilities of C and Windows NT. Although a command-line C compiler for Windows NT can be obtained from ftp://ftp.cygnus.com/pub/sac/gnu-win32/, I recommend that you invest in a C compiler with a GUI development environment (such as Microsoft Visual C++ or Borland C++).
All CGI scripts that produce HTML output have one thing in common: their output begins the same way. The first line is Content-type: text/html. This line of text is always followed by a blank line, which separates the header from the content that follows. Typically, ASCII character 10 (the newline character) is written twice immediately after this line of text: the first ends the header line, and the second creates the blank line.
For example, this is the first line of output for all CGI C programs with text output:
printf("Content-type: text/html%c%c",10,10) ;
This is the first line of output for all CGI PERL scripts with text output:
print "Content-type: text/html\n\n";
When you write your own programs, your Web server will sooner or later generate error messages when you call the CGI script. Although things might get somewhat frustrating for you, don't give up! Most likely, the error message you get will be due to a minor oversight on your part. If you still don't get anywhere by debugging your CGI script, it's time for you to start printing everything you can think of to standard output. Perhaps a variable you thought contained a value contains nothing but an empty string, or maybe an environment variable you thought would be available to your script isn't. It's also possible that you left out the most important thing of all: the first line, mentioned previously, of all CGI scripts!
Rather than try to debug CGI scripts by executing them on your Web server, execute them from the command prompt to find out what really happened. To do this, you must manually set several CGI environment variables to make the CGI program believe it's really being invoked by a Web server. Environment variables can be defined by using the SET command; its syntax looks like this:
SET VARIABLE_NAME=VARIABLE_VALUE
For example, you can create a batch file with the variable declarations in Listing 10.1 to test CGI programs when running them from the command prompt. You might need to change the value of QUERY_STRING if your CGI script uses arguments.
REM Simulate the CGI environment of a Web server at the command prompt.
REM Do not put spaces around the equal sign; SET would treat them as
REM part of the variable name or value.
SET SERVER_SOFTWARE=Netscape-Communications/1.12
SET SERVER_NAME=your.host.name
SET GATEWAY_INTERFACE=CGI/1.1
SET SERVER_PROTOCOL=HTTP/1.0
SET SERVER_PORT=80
SET REQUEST_METHOD=GET
SET SCRIPT_NAME=/cgi-bin/ScriptName.exe
SET QUERY_STRING=ArgumentsToCGIScript
SET REMOTE_HOST=000.000.000.000
SET REMOTE_ADDR=000.000.000.000
SET CONTENT_TYPE=application/x-www-form-urlencoded
SET HTTP_ACCEPT=image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, */*
SET HTTP_USER_AGENT=Mozilla/2.0b4 (WinNT; I)
It's customary for the first program written in a new language or programming interface to display the string Hello World! Although this is a very simple application of CGI, it will teach you the basics of CGI scripts as well as how CGI scripts are called by Web browsers. The Hello World! script is demonstrated in PERL as well as C to make you more familiar with both languages.
This Hello World! CGI script simply displays the current day, time, arguments passed in, and the browser being used by the client to access the CGI script. And of course, the string Hello World! is also displayed!
The script that displays Hello World! and the additional information is very simple to write in PERL. The code for this PERL script is given in Listing 10.2. The output of the PERL script appears in Figure 10.6.
# Sanjaya Hettihewa, http://www.NetInnovation.com/
# "Hello World" CGI Script in PERL

# Display content type being outputted by CGI script
print "Content-type: text/html\n\n";

# Label title of contents being outputted
print "<TITLE>Perl CGI Script Demonstration</TITLE>\n";

# Display text
print "<H1>Hello World!</H1>\n";
print "<H3>Welcome to the fun filled world of<BR>\n";
print "Windows NT CGI programming with Perl!</H3><BR><BR>\n";
print "The Web browser you are using is: ";

# Display value of the environmental variable HTTP_USER_AGENT
print $ENV{"HTTP_USER_AGENT"} , "<BR>\n" ;
print "Arguments passed in: ";

# Display value of the environmental variable QUERY_STRING
print $ENV{"QUERY_STRING"} , "<BR>\n" ;

# Obtain date and time from the system
($sec, $min, $hour, $mday, $mon, $year, $wday, $yday, $isdst) = localtime(time);

# display time
print "\nThe current time is: ";
print $hour, ":", $min, ":", $sec , "<BR>\n";

# display date ($mon is zero-based; $year is the number of years since 1900)
print "\nThe current date is: ";
print $mon + 1 , "/", $mday , "/", $year + 1900, "<BR>\n";
Pay particular attention to how the PERL CGI script is invoked. In this example, the URL used to invoke the CGI script is
http://wonderland.dial.umd.edu/cgi-bin/perl.exe?PERLScripts/HelloWorld/HelloWorld.pl+Argument
Figure 10.6. Output of the Hello World! CGI PERL script.
When calling a PERL script on a Windows NT Web server, the general syntax of the URL is
http://A/B?C+D
where A is the host name of the Web server; in this example, it is wonderland.dial.umd.edu. B is the relative path to PERL.EXE; in this example, it is cgi-bin/PERL.EXE. C is the location of the PERL script. This path is relative to the location of PERL.EXE. D contains any arguments passed into the PERL script. These arguments can be obtained by examining the contents of the CGI environment variable QUERY_STRING.
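Whichever language the script is written in, the arguments arrive as one string in QUERY_STRING, joined by plus signs as in the URL syntax above. The following C fragment is a minimal sketch of pulling them apart. The function name split_query_args and the MAX_ARGS limit are assumptions made for illustration; they do not appear in the chapter's listings.

```c
#include <stdio.h>
#include <string.h>

#define MAX_ARGS 16   /* arbitrary limit chosen for this sketch */

/* Hypothetical helper: split a writable copy of the query string on '+'.
   Pointers to the individual arguments are stored in args[]; the return
   value is the number of arguments found. Because strtok() is used, the
   string is modified in place and empty arguments ("a++b") are skipped. */
int split_query_args(char *query, char *args[], int maxArgs)
{
    int count = 0;
    char *token = strtok(query, "+");

    while (token != NULL && count < maxArgs) {
        args[count++] = token;
        token = strtok(NULL, "+");
    }
    return count;
}
```

Because strtok() writes into the string it scans, copy the result of getenv("QUERY_STRING") into a local buffer first and split the copy rather than the environment string itself.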
When PERL scripts are called with arguments, URLs can become quite long. Avoid this by creating aliases for PERL scripts on your Web server. (Consult your Web server's documentation for more information on creating aliases for URLs.) For example, if an alias called Hello was created for
http://wonderland.dial.umd.edu/cgi-bin/perl.exe?PERLScripts/HelloWorld/HelloWorld.pl
the URL to call the preceding PERL CGI script is reduced to
http://wonderland.dial.umd.edu/Hello+Argument
Whenever you have complex URLs for CGI scripts, create an alias for the CGI script. By hiding gory details such as long and complicated URL paths, your Web site will actually look friendlier to those browsing it. It will also save you time when you refer to said CGI scripts from your Web page because you will have to do less typing. If you're still not convinced, think about how much easier it is to remember
http://wonderland.dial.umd.edu/Hello+Argument
as opposed to
http://wonderland.dial.umd.edu/cgi-bin/perl.exe?PERLScripts/HelloWorld/HelloWorld.pl+Argument
The following code (see Listing 10.3) lists the C program that displays the same information as the preceding PERL example. The output of the C script appears in Figure 10.7.
/* Sanjaya Hettihewa, http://www.NetInnovation.com/
 * "Hello World" CGI Script in C
 */

/* Libraries containing special functions used in program */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

int main ( void )
{
  /* Obtain current time */
  time_t currentTime ;
  struct tm *timeObject ;
  char stringTime[128] ;

  currentTime = time ((time_t *) NULL ) ;
  timeObject = localtime (&currentTime) ;

  /* Display content type being outputted by CGI script */
  printf ("Content-type: text/html\n\n");

  /* Displaying simple text output */
  printf ("<TITLE>C CGI Script Demonstration</TITLE>\n");
  printf ("<H1>Hello World!</H1>\n");
  printf ("<H3>Welcome to the fun filled world of<BR>\n");
  printf ("Windows NT CGI programming with C!</H3><BR><BR>\n");

  /* Display value of the environmental variable HTTP_USER_AGENT */
  printf ("The Web browser you are using is: ");
  if ( getenv ( "HTTP_USER_AGENT" ) != NULL )
    printf ( "%s%s" , getenv ( "HTTP_USER_AGENT" ) ,"<BR>\n") ;

  /* Display value of the environmental variable QUERY_STRING */
  printf ("Arguments passed in: ");
  if ( getenv ( "QUERY_STRING" ) != NULL )
    printf ( "%s%s", getenv ( "QUERY_STRING" ) ,"<BR>\n") ;

  /* Display date and time using strftime() to format the date */
  strftime ( stringTime, 128, "%H:%M:%S", timeObject ) ;
  printf ("\nThe current time is: %s<BR>\n", stringTime );
  strftime ( stringTime, 128, "%m/%d/%y", timeObject ) ;
  printf ("\nThe current date is: %s\n", stringTime );

  return ( 0 ) ;
}
Figure 10.7. Output of the Hello World! CGI C script.
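The Hello World! program writes QUERY_STRING and HTTP_USER_AGENT straight into the page. Both values are supplied by the client, so a more cautious variant might replace the characters that are special in HTML before printing them. The function below is a sketch of that idea only; html_escape is a hypothetical name, and it is not part of the chapter's listing.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy src into dst, replacing the characters that
   have a special meaning in HTML with character entities.
   Returns 1 on success, or 0 if dst is too small to hold the result. */
int html_escape(const char *src, char *dst, size_t dstSize)
{
    size_t used = 0;

    if (dstSize == 0)
        return 0;                      /* no room even for the terminator */

    for (; *src != '\0'; src++) {
        char single[2];
        const char *piece;
        size_t len;

        switch (*src) {
        case '&': piece = "&amp;";  break;
        case '<': piece = "&lt;";   break;
        case '>': piece = "&gt;";   break;
        case '"': piece = "&quot;"; break;
        default:                       /* ordinary character: copy as-is */
            single[0] = *src;
            single[1] = '\0';
            piece = single;
            break;
        }

        len = strlen(piece);
        if (used + len + 1 > dstSize)  /* keep one byte for the '\0' */
            return 0;
        memcpy(dst + used, piece, len);
        used += len;
    }
    dst[used] = '\0';
    return 1;
}
```

In the listing, the value returned by getenv("QUERY_STRING") could be passed through html_escape into a local buffer, and the buffer printed instead of the raw string.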
CGI programs written in C are accessed differently from PERL CGI scripts. The Web server can execute a compiled C program directly, whereas a PERL script must be run through the PERL interpreter. After you compile the C program into an executable, place it in the CGI directory of your Web server or in a subdirectory of the CGI directory. In this example, the executable program is placed in the cgi-bin directory, and the URL used to invoke the CGI script is
http://wonderland.dial.umd.edu/cgi-bin/hello.exe?Argument
When a CGI program on a Windows NT Web server is called, the general syntax of the URL is
http://A/B?C
A is the host name of the Web server; in this example, it is wonderland.dial.umd.edu. B is the relative path to the executable program from the Web server's document root directory; in this example, it is cgi-bin/hello.exe. C contains any arguments passed to the C program. You can obtain these arguments from examining the contents of the CGI environment variable QUERY_STRING.
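Browsers generally encode characters that are not allowed in a URL (a space, for instance) as a percent sign followed by two hexadecimal digits, so the text found in QUERY_STRING may need decoding before it is used. The following is a minimal sketch in C. The function names are hypothetical, and the sketch deliberately leaves plus signs alone, because this chapter uses them to separate arguments.

```c
#include <ctype.h>

/* Return the numeric value of one hexadecimal digit, or -1 if c is not
   a hexadecimal digit. */
static int hex_value(char c)
{
    if (c >= '0' && c <= '9')
        return c - '0';
    c = (char) tolower((unsigned char) c);
    if (c >= 'a' && c <= 'f')
        return c - 'a' + 10;
    return -1;
}

/* Hypothetical helper: decode %XX escapes in place and return text.
   A '%' that is not followed by two hexadecimal digits is copied
   unchanged. The decoded string is never longer than the original. */
char *decode_percent_escapes(char *text)
{
    char *src = text;
    char *dst = text;

    while (*src != '\0') {
        if (src[0] == '%' && hex_value(src[1]) >= 0 && hex_value(src[2]) >= 0) {
            *dst++ = (char) (hex_value(src[1]) * 16 + hex_value(src[2]));
            src += 3;
        } else {
            *dst++ = *src++;
        }
    }
    *dst = '\0';
    return text;
}
```

As with any function that rewrites its argument, apply it to a local copy of the string returned by getenv("QUERY_STRING").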
A thorough introduction to programming languages and how they can be used for CGI programming is beyond the scope of this book. Most CGI applications developed in this chapter use the C programming language. To give you a feel for how different programming languages can be used to write CGI applications, however, you'll see how environment variables can be accessed by using PERL and C.
The following C program (see Listing 10.4) displays all CGI variables set by the Web server before it invokes the CGI script. Note that not all of the environment variables are necessarily defined; which ones are set depends on how the script is called.
/* C Program to display CGI environment variable values defined by the
   Web server before the CGI program is invoked */

#include <stdio.h>
#include <stdlib.h>

#define NUM_ENVIRONMENT_VARIABLES 19

int main ( void )
{
  /* Define the data structure that stores all the CGI variable names */
  char* environmentVariables[] = {
    "SERVER_SOFTWARE", "SERVER_NAME", "GATEWAY_INTERFACE",
    "SERVER_PROTOCOL", "SERVER_PORT", "REQUEST_METHOD",
    "PATH_INFO", "PATH_TRANSLATED", "SCRIPT_NAME",
    "QUERY_STRING", "REMOTE_HOST", "REMOTE_ADDR",
    "AUTH_TYPE", "REMOTE_USER", "REMOTE_IDENT",
    "CONTENT_TYPE", "CONTENT_LENGTH", "HTTP_ACCEPT",
    "HTTP_USER_AGENT" } ;
  int count ;

  printf("Content-type: text/html%c%c",10,10) ;
  printf("%s%s" , "<PRE>\n",
         "<TITLE>CGI Environmental Variables Demonstration</TITLE>\n") ;

  /* Loop through all CGI variables that were defined earlier */
  for (count = 0; count < NUM_ENVIRONMENT_VARIABLES; count++)
    /* Check if a certain CGI variable has been defined by the Web server
       and print its value if the CGI variable has been defined */
    if ( getenv ( environmentVariables[count] ) != NULL )
      printf ( "%17s = %s\n" , environmentVariables[count] ,
               getenv ( environmentVariables[count] ) ) ;

  printf("</PRE>\n") ;
  return ( 0 ) ;
}
The output of this CGI script is shown in Figure 10.8. As you can see, the CGI
script's URL is followed by additional arguments. Notice how the Web server has passed
the arguments following the URL to the CGI script by using an environment variable.
Figure 10.8. Output of CGI program written in C when called with an argument after the CGI program's URL.
Similarly, CGI variables can be accessed very easily from a PERL script. The CGI PERL script that displays values of various CGI variables is given in Listing 10.5.
# Print the first line of all CGI scripts
print "Content-type: text/html\n\n";
print "<TITLE>PERL CGI Variable Demonstration</TITLE>\n";
printf( "<PRE>\n" );

# Loop through all environment variables and display the values of
# all CGI variables that have been defined by the Web server.
foreach $EnvVar ( qw( SERVER_SOFTWARE SERVER_NAME GATEWAY_INTERFACE
                      SERVER_PROTOCOL SERVER_PORT REQUEST_METHOD
                      PATH_INFO PATH_TRANSLATED SCRIPT_NAME
                      QUERY_STRING REMOTE_HOST REMOTE_ADDR
                      AUTH_TYPE REMOTE_USER REMOTE_IDENT
                      CONTENT_TYPE CONTENT_LENGTH HTTP_ACCEPT
                      HTTP_USER_AGENT ) )
{
  if ( defined $ENV{$EnvVar} )
  {
    printf( "%17s = %s\n", $EnvVar, $ENV{$EnvVar} );
  }
}
printf( "</PRE>\n" );
exit( 0 );
The output of this PERL script appears in Figure 10.9. Note again how the PERL script is being called: the URL consists of the URL of PERL.EXE followed by the location of the PERL script relative to PERL.EXE. The URL in Figure 10.9 is given only to demonstrate how an argument can be passed to PERL.EXE when the PERL interpreter is called directly. You should never place PERL.EXE in a CGI directory of your Web server, because anyone who can reach the interpreter through a URL can pass it arbitrary PERL code and thereby execute Windows NT commands on your Web server. Instead, create a CGI directory mapping for PERL as described in your Web server's documentation.
Figure 10.9. Output of CGI script written in PERL to display CGI variables.
With the expansion of the World Wide Web, more and more Web browsers are being released. Although there are many Web browsers available for Windows NT, their capabilities differ greatly. If the appearance of your Web site is important to you, consider setting up a CGI script to provide a customized Web page, depending on the browser being used. Clearly, maintaining a customized version of every page is not practical for a large Web site. However, by setting up a very simple CGI script, you can find out which Web browser is being used by the user browsing your Web site. If the browser being used is Netscape Navigator or Microsoft's Internet Explorer, you can provide a richly formatted Web page with HTML enhancements; otherwise, you can provide a basic page with the same content.
Most Web pages use special HTML tags (such as Netscape enhancements to HTML) to make them look very attractive when viewed with a browser that supports those tags. However, these pages tend to look less attractive when viewed with browsers that do not support the enhancements. The percentage of people with Web browsers that do not support the enhancements can be as much as 30 percent in some cases. For these Web browsers, it's possible to set up a CGI script that displays content formatted using standard HTML. Such a CGI script can display customized content, as shown in Figure 10.10, based on the CGI variable HTTP_USER_AGENT.
The CGI script in Listing 10.6 is very simple. It first determines the Web browser being used by looking at the environment variable HTTP_USER_AGENT. If the browser is Netscape, the script displays a page with Netscape enhancements to HTML; otherwise, it displays a page that contains only standard HTML 2.0. By modifying the script, you can add more customized pages for other browsers. Such a script can be used for important pages like the main home page of your organization. By utilizing CGI to provide dynamic content, you give a good impression to those browsing your Web site, because a user with an advanced Web browser sees richly formatted Web pages. In practice, it is rarely feasible to maintain more than two custom versions of a page. Because Internet Explorer and Netscape Navigator interpret HTML tags in more or less the same way, you might want to create one page with Netscape and Internet Explorer extensions and another that uses only standard HTML 2.0.
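Adding pages for further browsers, as suggested above, could be handled with a small lookup table rather than a chain of if statements. The sketch below is one possible arrangement and not the approach of Listing 10.6: the structure, the function name select_page, and the file names are hypothetical. It relies on the fact that Internet Explorer's identification string contains "Mozilla" as well as "MSIE", so the more specific substring has to be tested first.

```c
#include <stddef.h>
#include <string.h>

/* One table entry: a substring to look for in HTTP_USER_AGENT and the
   page to send when it is found. */
struct BrowserPage {
    const char *subString;
    const char *pageName;
};

/* Hypothetical table; the file names are placeholders. Entries are tried
   in order, so "MSIE" must come before the more general "Mozilla". */
static const struct BrowserPage browserPages[] = {
    { "MSIE",    "msie.htm"    },
    { "Mozilla", "special.htm" }
};

/* Return the page for the first matching entry, or defaultPage when the
   user agent is unknown or HTTP_USER_AGENT is not defined (NULL). */
const char *select_page(const char *userAgent, const char *defaultPage)
{
    size_t i;

    if (userAgent == NULL)
        return defaultPage;

    for (i = 0; i < sizeof browserPages / sizeof browserPages[0]; i++)
        if (strstr(userAgent, browserPages[i].subString) != NULL)
            return browserPages[i].pageName;

    return defaultPage;
}
```

A call such as select_page(getenv("HTTP_USER_AGENT"), "regular.htm") yields the name of the file to display, and supporting another browser means adding one line to the table.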
Figure 10.10. Using a CGI program to provide customized content based on the Web browser being used.
Note: The following program is optimized for ease of reading rather than for speed of processing. Because it was written to demonstrate how a CGI program can be built to provide customized content, it focuses on teaching CGI fundamentals and not on optimizing the code. You can make it more efficient by reading chunks of the file at a time rather than reading and outputting the file character by character.
/* © 1996 Sanjaya Hettihewa (http://www.NetInnovation.com/)
 * All Rights Reserved.
 * January 1, 1996. Updated November 19, 1996
 * Program to output a customized Web page based on Web browser being used.
 */

/* Special function libraries being used by this program */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Please note the use of double backslashes in the path names below.
   This is because a single backslash is used to quote the next character */

/* If you provide content specially formatted for a different browser,
   please change the following */
#define SPECIAL_BROWSER_SUB_STRING "Mozilla"

/* Please change the following to the full path name of the HTML file
   that's specially formatted */
#define SPECIAL_BROWSER_PAGE "H:\\www\\https\\ns-home\\root\\documents\\WSDGNT\\special.htm"

/* Please change the following to the full path name of the HTML file
   that's formatted using standard HTML */
#define OTHER_BROWSER_PAGE "H:\\www\\https\\ns-home\\root\\documents\\WSDGNT\\regular.htm"

/* Please change the following to the e-mail address of your Web site
   administrator */
#define WEBMASTER "mailto:Webmaster@wonderland.dial.umd.edu"

static int DisplayPage ( char *pageName ) ;

int main ( void )
{
  /* The "First Line" of all CGI scripts... */
  printf("Content-type: text/html%c%c",10,10) ;

  /* Find out what Web browser is being used */
  if ( getenv ( "HTTP_USER_AGENT" ) == NULL ) {
    printf("FATAL ERROR: HTTP_USER_AGENT CGI variable undefined!\n") ;
    return ( 0 ) ;
  }

  /* Display appropriate page based on browser being used by client */
  if (strstr (getenv ("HTTP_USER_AGENT" ), SPECIAL_BROWSER_SUB_STRING)!=NULL)
    DisplayPage ( SPECIAL_BROWSER_PAGE ) ;
  else
    DisplayPage ( OTHER_BROWSER_PAGE ) ;

  return ( 0 ) ;
}

/* Contents of file passed into this function will be displayed to
   standard output. The Web server will transmit what's displayed to
   standard output by this CGI script to the client that called the
   CGI script */
static int DisplayPage ( char *pageName )
{
  FILE *inFile ;
  int character ;

  /* Check to ensure a valid file name is given */
  if ((inFile = fopen(pageName, "r")) == NULL) {
    printf ( "FATAL ERROR: Content file can't be opened! %s<BR>", pageName);
    printf ( "Please contact the <A HREF=\"%s\">Webmaster.</A><BR>", WEBMASTER );
    return ( 0 ) ;
  }

  /* Displaying contents of file to standard output, one character at a
     time. Please note that this can be done more efficiently by reading
     chunks of the file at a time */
  while ( (character = fgetc ( inFile )) != EOF )
    putchar ( character ) ;

  fclose(inFile);
  return ( 1 ) ;
}
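The listing copies the file one character at a time, and both the note before the listing and the comment inside it mention reading in chunks instead. A minimal sketch of that variation follows. The name copy_stream and the 4096-byte buffer size are assumptions chosen for illustration and are not part of Listing 10.6.

```c
#include <stdio.h>

#define CHUNK_SIZE 4096   /* arbitrary buffer size chosen for this sketch */

/* Hypothetical helper: copy everything from in to out in blocks of up to
   CHUNK_SIZE bytes. Returns 1 on success, 0 on a read or write error. */
int copy_stream(FILE *in, FILE *out)
{
    char buffer[CHUNK_SIZE];
    size_t bytesRead;

    /* fread() returns the number of bytes actually read; a short or zero
       count means end of file or an error. */
    while ((bytesRead = fread(buffer, 1, sizeof buffer, in)) > 0) {
        if (fwrite(buffer, 1, bytesRead, out) != bytesRead)
            return 0;
    }
    return ferror(in) ? 0 : 1;
}
```

Inside DisplayPage, the character-by-character loop could then be replaced with a single call such as copy_stream(inFile, stdout).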
Because the purpose of this program is to provide customized content based on the browser being used to browse your site, you must create two separate Web pages. The first Web page is sent to any non-Netscape browser, assuming it does not parse various HTML-enhanced tags as Netscape does. This Web page can be very simple. For the purpose of this demonstration, assume that you need to display a number of options inside a table. Because browsers that do not support the <TABLE> tag might interpret this tag differently, you have no control over how your Web site looks when viewed with a different browser. To remedy this situation, use the preceding CGI program. For Web browsers that do not support enhanced HTML tags, you can create a non-table version of the same page using only standard HTML. By doing this, the appearance of your Web site can be controlled no matter what browser is being used to access your Web pages.
The following is the standard HTML Web page designed for non-Netscape browsers. The standard HTML Web page (see Listing 10.7) is referred to in the CGI program with the following statement:
#define OTHER_BROWSER_PAGE "H:\\www\\https\\ns-home\\root\\documents\\WSDGNT\\regular.htm"
The C program stores the full pathname of this file in OTHER_BROWSER_PAGE and displays the file's contents for non-Netscape browsers. The contents of the standard HTML page are given in Listing 10.7.
<TITLE>Standard HTML page</TITLE>
<BODY>
Welcome to the standard HTML page for
technically challenged Web browsers.
<P>
Option One<BR>
Option Two<BR>
Option Three<BR>
</BODY>
The Netscape-enhanced HTML Web page is designed for those browsing your Web site with Netscape. It displays the same three options as the standard HTML page, but inside a table with additional Netscape enhancements. The Netscape-enhanced Web page (see Listing 10.8) is referred to in the CGI program with the following statement:
#define SPECIAL_BROWSER_PAGE "H:\\www\\https\\ns-home\\root\\documents\\WSDGNT\\special.htm"
In the C program, the full pathname of this file is stored in SPECIAL_BROWSER_PAGE, and the CGI program displays the file's contents whenever Netscape Navigator is used. Change this definition to match the location where you store the Netscape-enhanced Web page. The contents of the Netscape-enhanced Web page are given in Listing 10.8.
<TITLE>Netscape Enhanced page</TITLE>
<BODY>
<CENTER>
<TABLE BORDER=15 CELLPADDING=10 CELLSPACING=10 >
<TR>
<TD >
Welcome to the
<FONT SIZE=4> Ne</FONT><FONT SIZE=5>ts</FONT><FONT SIZE=6>ca</FONT><FONT SIZE=7>pe </FONT>
<FONT SIZE=6> En</FONT><FONT SIZE=5>ha</FONT><FONT SIZE=4>nc</FONT><FONT SIZE=3>ed </FONT>
Web page!
</TD>
<TD >Option One<BR></TD >
<TD >Option Two<BR></TD >
<TD >Option Three<BR></TD >
</TR>
</TABLE>
</CENTER>
</BODY>
Before compiling the C program, be sure to change SPECIAL_BROWSER_PAGE, OTHER_BROWSER_PAGE, and WEBMASTER to match your own Web site. After you compile the program and place it in your Web server's CGI directory, it displays the appropriate page for whichever browser is used to call it. The output of this CGI program appears in Figures 10.11 and 10.12. For the purpose of this example, the Netscape and Mosaic Web browsers were used. Note how the enhanced HTML page is displayed when the script is accessed with Netscape, and the standard HTML page is displayed when the script is accessed with Mosaic.
Figure 10.11. Output of a CGI program when invoked using Netscape.
Figure 10.12. Output of a CGI program when invoked with a Web browser that does not support tables.
One of the best things about the World Wide Web is that you can use it to distribute information to millions of people. CGI allows you to interact with this large audience. This chapter introduced you to using CGI to enhance the capabilities of your Web site and showed you how to write CGI scripts in PERL and C. It also covered various aspects of setting up CGI scripts, such as security, so that the CGI scripts you develop and set up will not be a threat to the security of your Web server.
To become more familiar with the topics covered in this chapter, spend some time with either C or PERL. As you write CGI programs and experiment with the effects of making changes to them, you will discover how CGI scripts work and gain more experience in debugging and developing them. Afterward, you will be able to create CGI applications to perform many specialized tasks. By utilizing CGI and unleashing its potential to make the contents of your Web site easier to navigate, you will have an outstanding Web site with many repeat visitors.
© Copyright, Macmillan Computer Publishing. All rights reserved.