Chapter 12
Perl/CGI Libraries and Databases
CONTENTS
In this chapter, the concept of Perl libraries is explored and
explained, with reference given to several online resources available
to you.
When designing any script, but especially when designing for CGI,
you should follow some traditional advice when it comes to programming.
Never forget that you are writing these scripts on a computer,
and can take full advantage of what that means.
- Always read the documentation. At the very least read the
READ MEs. When scripting for different applications, you are always
dealing with slight variations on a theme. You need to know what
little details might trip you and cause you to stumble.
- Test your script intensely. When you test a script, you are
trying to make it crash if you can, so in the end it will be resilient
to users' usage, however they may use it. Run other applications
with it, input all manner of data, and have various system loads
for starters.
- Look around on the Internet for software tools that have been
developed by others. You might even find a script that solves
your problem, or gets you pretty close to solving it. These scripts
have already gone through some testing (if they hadn't they wouldn't
be available for download) so you don't have to reinvent the wheel.
Some of the places you can look are the cgi-lib.pl library located
at
http://www.bio.cam.ac.uk/web/form.hmtl
or the CGI.pm library located at
http://www-genome.wi.mit.edu/WWW/tools/scripting/CGIperl
- Get in touch with e-mail list and Usenet newsgroups that deal
with your problem. Don't just look for the Perl NT group, but
for others that deal with the Web browser you decide to use, the
computer hardware you use, and so forth. There are also Frequently
Asked Questions, or FAQs, listings that seem to cover just about
every configuration going these days, so look for these, too.
Before going into how Perl libraries work, and how you can add
them into your CGI scripts, let's discuss some of the stages that
data must go through to be sent between client and server on the
Internet.
To pass data over the Internet using the CGI, it must be URL-encoded.
The data is first transferred as name/value pairs. Then these
pairs are divided by the "&" symbol. Each pair is
identified with each other by the "=" symbol, with the
default value entered as the value, unless the default value is
undefined, and then the value will be empty. The name/value pair
will still be sent. When spaces occur in the encoding, they are
represented by the "+" symbol. Any reserved characters
necessary must be encoded using their hexadecimal equivalent;
a "%" symbol and a two-digit (hexadecimal) number. Those
characters with special meaning must be encoded before being sent
to the client.
Understanding these elements of URL encoding is a key to successful
client/server relations.
There are several standard Perl library routines available online.
The following scripts are examples of the kinds of useful scripts
found in these libraries. These scripts come in the form of subroutines
that can be placed at the end or beginning of your full script,
and then invoked with a subroutine call where necessary in your
script.
These scripts range from the simple MIME header script to a larger
date creation script. Each of these scripts can be inserted "as
is" in your scripts, or adapted to fit your script's needs.
Some of these scripts are quite short, and it might seem less
trouble to create a standard one- or two-line subroutine. When
deciding whether or not to use the "three or greater"
rule consider this: If you are going to use a function three or
more times in a script, that function should have its own subroutine.
This script has been used in some of the other scripts in this
book to designate the MIME header of the response going back to
the client. The only specification that must be made is the value
of $title, the title of the HTML document being returned.
# a subroutine that returns an HTML MIME header
sub html_header {
local($title) = @_;
print "Content-type: text/html\n\n";
print "<HTML><HEAD>\n";
print "<TITLE>$title</TITLE>\n";
print "</HEAD>\n<BODY>\n";
}
To determine what kind of request method, GET or POST, is being
used to call your CGI scripts, you can use this short subroutine:
# Find the request method
sub MethGet {
return ($ENV{'REQUEST_METHOD'} eq "GET");
}
This script returns a true value if the request method being used
is GET, meaning that a false value would be POST.
To put a date stamp on a page that is in a different format than
the available environmental variable, this subroutine can be adapted
to whichever date format you wish. This version contains a copy
line that states the creator of the page, and offers a link to
the home page of that creator.
# a date stamp signature
sub date_stamp{
local($sec, $min, $hour, $mday, $mon, $year, $wday, $yday, $isdst) = localtime;
local(@days) = ('Sunday','Monday','Tuesday','Wednesday','Thursday','Friday','Saturday');
local(@months) = ('January','February','March','April','May','June','July',
'August','September','October','November','December');
print"<P>\nThis page created by <A HREF=\"http://www.my_server.com/index.html\">";
print "<B>my_domain.com</B></A>";
print " on ";
printf("%Ø2d:%Ø2d:%Ø2d on %s %d, %d", $hour, $min, $sec, $months[$mon], $mday, 1900+$year);
print "</BODY></HTML>\n";
}
NOTE |
When the year 2000 rolls around, the 1900 value needs to be changed
|
This subroutine tacks on a return link to your default home page
at the bottom of an HTML document generated by your server through
the CGI. It also includes an image.
# return to the default home page
sub home {
local($gif,$text) = @_;
# if nothing supplied, use default gif and text
if ($#_ < Ø) {
$gif = "default_return.gif";
$text = "Return to our home page";
}
print "<HR>";
print "<A HREF=\"http://my_server.com/index.html\">";
print "<IMG src = \"http://my_server.com/$gif\">";
print "$text </A>";
print "<HR>\n";
}
To create an HTML display of the variables sent to your Perl scripts
through the CGI, add this subroutine. This is handy if you want
to return the data that a user has input into your form for a
verification check.
# display the variables
sub printvar {
local(%in) = @_;
local($old, $out, $outout);
$old = $*;
$* = 1;
$output .= "<DL COMPACT>";
foreach $key (sort keys(%in)) {
($out = $_) =~ s/\n/<BR>/g;
$output .= "<DT><B>$key</B><DD><I>$out</I><BR>";
}
}
$output .= "</DL>";
$* = $old;
return $output;
}
This subroutine will create a list of the variables in the HTML
document.
This is similar to the previous list, except that it returns the
list of variables as one line-not as a list.
# display the variables in compact form
sub printvarcompact {
local(%in) = @_;
local($old, $out, $outout);
$old = $*;
$* = 1;
foreach $key (sort keys(%in)) {
($out = $_) =~ s/\n/<BR>/g;
$output .= "<B>$key</B> is <I>$out</I><BR>";
}
}
$output .= "</DL>";
$* = $old;
return $output;
}
To store the data sent to your server from an HTML form using
either the GET or POST method, this subroutine will store it in
the associative array @in.
# storing user request data
sub readparse {
local(*in) = @_ if @_;
local ($I, $loc, $key, $val);
# read in text
if ($ENV{REQUEST_METHOD'} eq "GET") {
$in = $ENV{'QUERY_STRING'};
} elsif ($ENV{'REQUEST_METHOD'} eq "POST"); {
read(STDIN,$in,$ENV{'CONTENT_LENGTH'});
}
@in = split(/&/,$in);
foreach $in(Ø..$#in); {
# convert pluses into spaces
$in[$i] =~ s/\+/ /g;
# split into key and values on the first '='
($key,$val) = split(/=/.$in[$i],2);
# convert %XX from hex to alphanumeric
$key =~ s/%(..)/pack("c",hex($1))/ge;
$val =~ s/%(..)/pack("c",hex($1))/ge;
# associate the key with value
$in{$key} .="\0" if (defined($in{$key}));
# 0 is used as the separator of multiple values
$in{$key} .= $val;
}
return 1;
}
The database model that most of you will be familiar with is the
relational database. In this model the data is stored in a table
format, with the columns used as fields and the rows used as records.
The row/records relate to the column/fields, making a relational
database.
To put data into a two-dimensional database like this, Perl has
the split operator, which can be used on flat file records to
prepare them for storage in a relational database. Perl also can
search several databases by linking, or joining, common fields
in each database.
Before you jump into designing Perl search scripts, it is important
to discuss the databases themselves. A well-organized database
speeds up searches, and is easily adaptable to your future needs.
Creating and maintaining a good database takes a great deal of
time, but this is time well-spent, so figure out how to work it
into your schedule. The time you spend now creating a clear, straightforward
database will repay you every time it is used.
The standard Perl library, which comes with every installation
of Perl for Windows NT, contains the Perl script, or routine,
look.pl. This script can be used to do binary searches on large
text files, such as long HTML documents, or relational databases.
A binary search does not go through each entry sequentially, looking
for a match, but divides the file in half, and half again until
a match is found. This speeds a search tenfold.
When the large text file in question is sorted with a key field,
then look.pl will begin a binary search. This example provides
an HTML form from which the user enters a last name for the search
to match. The database structure for the search is based on the
simple format shown in Table 12.1.
Table 12.1 Simple Personnel Database Format
Last Name Uppercase | User ID
| Full Name |
FAWKES | xfawkes | Xavier Fawkes
|
JOHNSON | ejohn | Ernie Johnson
|
OLIVIA | molivia | Magdalene Olivia
|
SMITH | lsmith | Leora Smith
|
WALTER | twalter | Tony Walter
|
This database file can be stored as name_file.dat, or by whatever
file extension your database application uses.
The search through this file, using the example scripts and look.pl,
produces the matching name, e-mail address, and the person's full
name. Watch out for the embedded space between the first and last
names in the third row. You must make sure that Perl splits the
entries between the rows, and not between these embedded spaces.
This is accounted for in the example script.
The example data file is also already sorted alphabetically by
last name. With this simple search there is no provision for Perl
to organize the various entries before making the search, although
this feature can be added easily. The problem with adding this
procedure is that it drastically increases the processing time
Perl needs. A properly organized and maintained database does
not need this procedure added.
The HTML form that asks for user input looks like that shown in
Figure 12.1. The script asks for the last name of the person for
whom the search is being conducted, as well as for a user selection
of an upper limit to the number of matches that will be returned.
Figure 12.1 : A simple name search form.
<HTML>
<HEAD>
<TITLE>Simple Name Search</TITLE>
</HEAD>
<BODY>
<H2>Name Search</H2>
<OL>
<LI>Please type in the last name of the person you are looking for. Three letters will do.<BR>
<LI>Please select the appropriate match limit.
</OL>
<P>
<FORM method="POST" action+"http://my_server.com/cgi-bin/name_file.pl">
<B>Last Name</B>
<INPUT name="last_name">
<P>
<B>Match Limit</B>
<SELECT name="match-limit">
<OPTION> No Limit
<OPTION> 1ØØØ Matches
<OPTION> 5ØØ Matches
<OPTION> 25Ø Matches
<OPTION selected> 1ØØ Matches
<OPTION> 5Ø Matches
<OPTION> 25 Matches
<OPTION> 1Ø Matches
</SELECT><P>
Submission Choices: <INPUT type="submit" value="Begin Search">
Reset Search: <INPUT type="reset" value="Reset Match">
</FORM>
</HTML>
This form places a request to name_file.pl, which is the Perl
script that will search the specified data file, name_file.dat.
#! /usr/bin/perl
# name_file.pl
require 'look.pl';
require 'cgi-lib.pl' # another standard Perl script
&html_header("Search Results"); # our subroutine
# from earlier in this chapter
&parse_request; # from cgi-lib.pl
$title = "Name Search Results";
$server = "\@my_server.com";
if ($query{'debug'} =~ /no debug/) {}
else {
&debug_info;
}
if ($query{'last_name'} eq "") {
print "<H2>Please enter the last name.</H2><P>";
print "<A HREF=\"http//my_server.com/namser.html\">";
print "Please enter the name again.</A>";
exit 1;
}
$path = "database/path"; # where your database
# files are keep
$names = "$path/name_file.dat";
&search_result();
print"<HR>";
$counter = Ø;
open(NAME_LIST, $name_list)|| die "Unable to open $name_list data file.";
&look(*NAME_LIST, $query{'last_name'},Ø,1);
# this is the subroutine from look.pl that
# performs the binary search. Without this
# subroutine this search would go through the
# entire file one entry at a time...very slow!
while (<NAME_LIST>) {
last unless /^$query{'last_name'}/i;
@line = split(/\s\s+/);
if ($counter > $query{'match-limit'}) {
$counter--;
print "<I>Match limit of $counter reached...search ending.</I>";
last;
}
print "<H2>Your Search Results</H2><P><HR><P>";
print "<PRE>";
printf(" %-2Øs %-15s %-3Øs", $line[Ø], $line[1]$server, $line[2]);
print "</PRE>";
}
close(NAME_LIST) || die "Unable to close $name_list data file.";
print "<HR>";
print "Your search found $counter match(s).<P>";
print "<A HREF=\"http://my_server.com/ namser.html\">";
print "Another search?</A>";
exit Ø;
sub search_result{ # the search result display
$fhdr="<B>Last Name</B>";
$chdr="<B>E-mail Address</B>";
$shdr="<B>Full Name</B>"
print "<PRE>";
printf(" %-25s %2Øs %3Øs ",$fhdr, $chdr, $shdr);
print "</PRE>;
}
sub html_header { # our html header
local($title) = @_;
print "Content-type: text/html\n\n";
print "<HTML><HEAD>\n";
print "<TITLE>$title</TITLE>\n";
print "</HEAD>\n<BODY>\n";
}
sub show_debug {
while (($key,$value) = each(%query)) {
print "The value of $key is $value <BR>";
}
}
exit Ø;
This produces a result like that shown in Figure 12.2.
Figure 12.2 : Simple search results.
This binary search is based on only one parameter given to the
CGI script by the user. How do you ask for more than one parameter?
That would be a binary search involving multiple user parameters.
To search a data file using more than one parameter from the user
is very similar to making a search based on only one. Using the
look.pl script again, this kind of search requires a larger HTML
form. This is an HTML document that asks the user for parameters
to search a database of Web links, and then return the matches
to the user.
<HTML>
<HEAD>
<TITLE>
Search Our Links
</TITLE>
</HEAD>
<BODY>
<H2>
Search Our Links!
</H2>
<P>
<HR>
<P>
We have a huge data base of links. Please enter the following information to find
links that will interest you.
<P>
<OL>
<LI>Link Subject - What is the Site about?
<LI>Date Limit - How old/new is the Site?
<LI>Number of Matches - How long a list of Sites?
</OL>
<P>
<FORM method ="POST" action="http://my_server.com/cgi-bin/data_search.pl">
<B>Link Subject </B><INPUT name="subject" size=2Ø maxlength=3Ø>
<P>
<B>Date Limit</B>
<SELECT name="date">
<OPTION>Last Week
<OPTION>Last Two Weeks
<OPTION>Last Month
<OPTION>Last Six Months
<OPTION>Last Year
</SELECT>
<P>
<B>Number of Matches</B>
<SELECT name="matches">
<OPTION>2Ø
<OPTION>1Ø
<OPTION>5
<OPTION>1
</SELECT>
<P>
Begin Search?: <INPUT type="submit" value="Begin Search">
Reset Search: <INPUT type="reset" value="Reset Match">
</FORM>
<P>
<HR>
</BODY>
</HTML>
In this form, the user can supply the subject of the links for
which he or she is searching, and the Perl script dbsearch.pl
will examine your database file of links and return them to the
user. The previous HTML document should appear to the user like
that shown in Figure 12.3.
Figure 12.3 : An HTML formfor a multiparameter binary
search.
The script that this form calls, dbsearch.pl, takes the user input
and places it into dbsearch.pl.
#! /usr/bin/perl
# dbsearch.pl
sub match {
require look.pl;
require cgi-lib.pl;
}
&html_header;
# read and split the name/value pairs
read(STDIN, $buffer, $ENV{'CONTENT_LENGTH'});
@pairs = split(/&/,$buffer);
foreach (@pairs) {
($key, $value);
$value = $deweb{$value};
$form{$key} = $value;
# Error message if no subject entered
if ($form{'subject'} eq "") {
print "<H2>You have to submit a subject name.</H2><P>";
print <A HREF=\"http://my_server.com/bisearch.htm\">";
print "Make another search.</A>";
}
# this section determines how far back to
# run the search
if ($form{'date'} eq "Last Week"){
$matchdate = &main'jtod($julnum-7);
} elsif ($form{'date'} eq "Last Two Weeks"){
$matchdate = &main'jtod($julnum - 14);
} elsif ($form{'date'} eq "Last Month"){
$matchdate = &main'jtod($julnum - 3Ø);
} elsif {
($form{'date'} eq "Last Six Months"){
$matchdate = &main'jtod($julnum - 18Ø);
} elsif {
($form{'date'} eq "Last Year"){
$matchdate = &main'jtod($julnum - 36Ø);
}
($nday, $nmon, $nyear) = split(/-/,$matchdate);
if ($nmon < 1Ø) {
$nmon = join('','Ø',$nmon);
}
if ($nday < 1Ø) {
$nday = join('','Ø',$nday);
}
$matchdate = join('',$nyear, $nmon, $nday);
$forms = "usr/bin/links.dat"; # the database
# searched
$logpath = "web/logs/"; # a log path
$logname = "dbsearch.log"; # a log to keep access
# data in
$logfile = ">>$logpath$logname";
&aux_var; # for max_hit counter
&column_names(); # match column field names
print "<HR>";
$counter = Ø; # set match counter to Ø
open(LINKS, $form) || dies "Unable to open the input file.";
&look(*LINKS, $form{'subject'},1,1); # the actual
# search
open(LOGFILE,$logfile) || die "Unable to open $logname";
print LOGFILE "$date | $ENV{REMOTE_HOST} | $ENV{REMOTE_ADDR} | $ENV{HTTP_USER_AGENT} |
$form{'subject'} | $userform \n";
# records each search in the log
close LOGFILE || die "Unable to close $logname\n";
while (<SUBJECT>) {
last unless /^$form{'subject'}.*/i;
@line = split(/\s\s+/);
if (($line[1] =~ /$userform/i || $userform eq "ALL") && ($line[3] >= $matchdate)) {
$subject = "<A HREF=ftp://my_server.com/$line[4]>$line[Ø]</A>";
@date = split(//,$line[3]);
$date = "$date[4]$date[5]-$date[6]$date[7]-$date[Ø]$date[1]$date[2]$date[3]";
$counter++;
}
# this stops the search once the maximum limit
# has been reached
if ($counter > $ulimit[1]) {
$counter--;
last;
}
print "<PRE>";
printf(" %s %-1Øs %-6s %3Øs","$date,$line[1],$line[2],$subject);
print "</PRE>";
}
close(SUBJECT) || die "Unable to close the file.";
print "<HR>";
$hnew = $counter;
print " Your link search found $hnew match(s).<P>";
print "<A HREF=\"http://my-server.com/link_search.html\">Search Again</A>";
exit 1;
}
sub column_names {
$fhdr="<B>Link Type</B>";
$chdr="<B>Link Name</B>";
$shdr="<B>Link URL</B>";
$dhdr="<B>Date Verified Active</B>";
print="<PRE>";
printf("%-1Øs %-1Øs %-1Øs %s",$dhdr,$fhdr,,,$chdr);
print="</PRE>";
}
sub j_number {
local ($sec,$min,$hour,$mday,$mon,$year,$wday,$yday,$isdst) = localtime;
$smon = $mon + 1;
$syear = $year + 19ØØ; # this has to be updated
# by the year 2ØØØ
$timedate = join('-',$mday,$smon,$syear);
&main'dtoj($timedate);
}
sub html_header {
local($title) = @_;
print "Content-type: text/html\n\n";
print "<HTML><HEAD>\n";
print "<TITLE>$title</TITLE>\n";
print "</HEAD>\n<BODY>\n";
}
sub aux_var {
@ulimit = split(/=/,$pairs[4]);
$ulimit[1] =~ y/=/ /;
$ulimit[1] =~ y/a-z/A-Z/;
$ulimit[1] =~ y/=/ /;
$ulimit[1] =~ y/a-z/A-Z/;
if ($ulimit[1] =~ /^no/i) {
$ulimit[1] = 99999;
}
}
eval '&match';
exit Ø;
This script produces an HTML document with the output shown in
Figure 12.4.
Figure 12.4 : The results of a multiparameter binary
search.
More and more the Web is being used as a way to provide access
to large, professional-size databases. These are often called
relational database management systems, or RDBMS. The "serious"
database systems, such as Sybase or Oracle, become even more powerful
when you can provide Internet access to them via the Web. You
might want to open this access to the general public, or restrict
it to members only. Regardless of your choice, Perl has proven
itself as an ideal language for dealing with RDBMS.
Binary searches are very good for dealing with flat files like
those shown in the previous examples, but they are limited when
dealing with the larger databases. Luckily, these larger databases
come with their own search engines. Perl scripts can be written
to create Web interfaces with these search engines but, to do
this successfully, you must understand how the database behaves
on the command line. Each database package has its own command
line behavior, so be sure to check the documentation carefully.
To deal with these large kinds of databases effectively you should
learn about Structured Query Language, or SQL, which is a standard
in computer languages specifically for dealing with RDBMS. Perl
is designed to integrate with SQL. One important aspect of this
procedure that may be overlooked when dealing with databases is
the database itself. Setting up a solid and well-organized database
will save you and your users valuable time. Perl can be used
to establish a gateway between your Web site and your relational
database.
The details of SQL are too involved to be dealt with here, but
for more information on SQL try these two sites:
http://www.jcc.com/sql_stnd.html
http://www.inquiry.com/techtips/thesqlpro/
Using SQL often involves understanding C, a language used with
most industrial-sized relational databases.
The uses of large, relational databases are many, and extending
their utility to your Web pages has an obvious appeal. Perl can
be used to create a search gateway between your Web site and database.
It can also be used to create a gateway for an SQL-standard database.
Databases also can place large demands on your resources, inhibiting
access to your Web site itself. If it is possible to use a separate
computer as your database server, that will solve the drain on
your resources.
To use your database to its fullest extent, make sure that you
answer the following questions:
- What are the environmental variables used by your RDBMS?
- How does your RDBMS use SQL on the command line?
- What is your database server, and what are its peculiarities?
- Is there a limit to the number of users to your database stated
on your license?
Each database has its own answers, so don't be afraid to check
the accompanying documentation. If you have a database administrator,
become his or her friend.
Some final things to remember when dealing with Perl and databases
are as follows:
- Use binary searches on flat files for speedy searches.
- Understand the limit of a flat file search so you can switch
to a full database search engine.
- Both the printf and format operators work well to format output
from a database.
- Using subroutines properly can greatly reduce scripting time
and increase a script's efficiency.
- When using Perl with a database's search engine, always take
its command line performance to heart.
Adding access to database information makes any Web site more
valuable to the user, and is often worth the extra time and headache.
Careful planning and implementation will minimize these pitfalls,
and provide your Web site with a truly useful asset.