Chapter 14
Perl and Tracking
There has been little mention so far in this book about the use
of logs and other methods of tracking. Once you have a complete
Web site, it is necessary to find out how users travel through
it. One of the ways to do this is to track usage with logs. To
do this accurately, you can place a Perl script at the top of
the Web page to be tracked. To demonstrate this use of Perl, in
this chapter a tracking element is added to the Goo Goo Records
Web site.
There are all kinds of logs, or lists of actions, kept inside
a computer. Logs tend to be divided up by purpose: a system log
records actions, called events, performed by the NT system, while
an application log records events caused by applications. These
logs can be used to keep track of anything that happens that might
be of interest to a Web Master or network administrator. For example,
every time the computer is asked to start up an application, a
note of that event is made in the application log. This log, like
most others, can be viewed using Event Viewer.
Within a Web site, logs can be used to monitor who is visiting
your site, and even whether visitors are trying to get into places
they're not supposed to.
One of the early problems with tracking and logging on the Web
was unrealistically high hit counts for Web sites. These inflated
numbers were, and still are, caused by simplistic uses of counters
to record hits on a particular page. It is quite common for a Web
page to contain two or three hypertext links, an image link,
and a next-page link. If the user accesses all of these links,
a hit count of four or five may result, giving a skewed picture
of site usage. This is only true if the links on the page are
used; their presence alone will not skew the hit count.
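To see the difference a smarter count makes, a log can be filtered so that only page requests are tallied, leaving out the images and other inline resources. The following routine is a rough sketch, not part of the Goo Goo site; the log file name and the rule for what counts as a page (any URL ending in .htm or .html, or the root document) are assumptions for illustration:

```perl
sub count_page_hits {
    local($logfile) = @_;
    local($hits) = 0;
    local($line, $url);
    open(CNT, $logfile) || die "can't open $logfile: $!";
    while ($line = <CNT>) {
        # Count only GET requests whose URL is a page, not an image
        if ($line =~ /GET\s+(\S+)/) {
            $url = $1;
            $hits++ if ($url =~ /\.html?$/i || $url eq "/");
        }
    }
    close(CNT);
    $hits;
}
```

Called on a day's log file, this returns the number of document requests rather than the raw request total, so a page with four inline images counts as one hit, not five.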
There are several solutions available to avoid this problem. One
is to add a short Perl script to the top of the larger Perl script
that delivers the Web pages that are being monitored for user
traffic. There are two ways to do this:
- Have a form call a CGI script, and that script will load the
page, and record the hit to that page.
- Have one of the links in the HTML documents call a URL, like
this one from the Goo Goo Records site: http://www.googoo.com/cgi-bin/page.pl?next.htm.
This second method calls a Perl script, page.pl, which reads
the query string information for the HTML document, in this
case next.htm. The script then delivers that HTML document.
The second method is a little more flexible because you need only
one Perl script to deliver any page: the page to deliver changes
with the query string information. One drawback is that all of your
links will be to a Perl script, making the response time longer. Also,
you would have to do all of your logging from the Perl script
because the Web server log would only record that every user called
the script x number of times, without recording what the
destination was. This may be desirable, though, because using
this method allows you to make the Web site's log files as minimal
or as detailed as you like. This method is explored in detail
later in this chapter, as it is the same method used by the Goo
Goo Records' Web Master on their Web site. This is the script
that performs the logging task:
#!/usr/bin/perl
###################################################
#
# This is the page delivery script.
#
# This script takes the query string information as the filename and
# delivers the file to the browser. A link to deliver the page new.html would
# look like this:
#
# <A HREF="http://www.googoo.com/cgi-bin/page.pl?new.html">new</A>
#
# Path information is also valid, and necessary to get lower in the directory
# structure:
#
# <A HREF="http://www.googoo.com/cgi-bin/page.pl?/newstuff/new/new.html">new</A>
#
# This will allow more flexible logging of any page that is delivered with this
# script. With a little work, you can even get this script to process server
# side includes, counters, and all that jazz.
#
# The trouble here is that the server logs will now only show the user hitting
# page.pl, no matter which page they request. This is fine if you are creating
# your own logs, but can be frustrating if you are not. This script generates
# a log similar to the one generated by the EMWAC server.
#####################################################
if ($ENV{'REQUEST_METHOD'} eq 'GET') {
    $file=$ENV{'QUERY_STRING'};
    # Decode any URL-encoded characters (%xx) in the query string
    $file=~s/%([a-fA-F0-9][a-fA-F0-9])/pack("C",hex($1))/eg;
    print "Content-type: text/html\n\n";
    $file="c:\\googoo\\$file";
    if (-e $file) {
        open(LOG,">>c:\\logs\\access");
        $t=localtime;
        print LOG "$t $ENV{'SERVER_NAME'} $ENV{'REMOTE_HOST'} ";
        print LOG "$ENV{'REQUEST_METHOD'} $file $ENV{'SERVER_PROTOCOL'}\n";
        close(LOG);
        open(HTML,$file);
        while ($line=<HTML>) {
            print $line;
        }
        close(HTML);
    }
    else {
        print <<'EOF';
<HTML>
<HEAD>
<TITLE>Error! File not found</TITLE>
</HEAD>
<H1>Error! File not found</H1>
<HR><P>
The file you requested was not found. Please contact <ADDRESS><A
HREF="mailto:webmaster@googoo.com">webmaster@googoo.com</A></ADDRESS>
</HTML>
EOF
    }
}
else {
    print "Content-type: text/html\n\n";
    print "<HTML>\n";
    print "<TITLE>Error - Script Error</TITLE>\n";
    print "<H1>Error: Script Error</H1>\n";
    print "<P><HR><P>\n";
    print "There was an error with the server script. Please\n";
    print "contact Goo Goo Records at <ADDRESS><A ";
    print "HREF=\"mailto:support\@googoo.com\">support\@googoo.com</A></ADDRESS>\n";
    print "</HTML>\n";
    exit;
}
Another method of tracking is to read information from a log file
and build your tracking data from it.
The file that contains the important information about the Goo
Goo Records site is known as the log file. Since they are using
the EMWAC HTTP service with their Web site, a log file is created
each day and kept in the log file directory. The directory path
for the log file directory on the Goo Goo Records server is C:\WINNT35\system32\LogFiles.
Each log file is given a file name relating to the date it was
created, following the general format of HSyymmdd.LOG. For example,
a log file created for July 6, 1996 would have the log filename
HS960706.LOG. An example of a log file's contents would resemble
this excerpt from the log file HS960509.LOG, taken from a server
in Finland:
Thu May 09 20:09:17 1996 wait.pspt.fi 194.100.26.175 GET /ACEINDEX.HTM HTTP/1.0
Thu May 09 20:09:18 1996 wait.pspt.fi 194.100.26.175 GET /gif/AMKVLOGO.GIF HTTP/1.0
Thu May 09 20:09:19 1996 wait.pspt.fi 194.100.26.175 GET /gif/RNBW.GIF HTTP/1.0
Thu May 09 20:09:19 1996 wait.pspt.fi 194.100.26.175 GET /gif/RNBWBAR.GIF HTTP/1.0
Thu May 09 22:35:09 1996 wait.pspt.fi 194.215.82.227 GET /gif/WLOGO.GIF HTTP/1.0
Thu May 09 22:35:11 1996 wait.pspt.fi 194.215.82.227 GET /gif/BLUEBUL.GIF HTTP/1.0
Thu May 09 22:35:11 1996 wait.pspt.fi 194.215.82.227 GET /cgi-bin/counter.exe?-smittari+-w5+./DEFAULT.HTM HTTP/1.0
Thu May 09 22:35:13 1996 wait.pspt.fi 194.215.82.227 GET /gif/EHI.JPG HTTP/1.0
Thu May 09 22:35:17 1996 wait.pspt.fi 194.215.82.227 GET /gif/NAPPI1.gif HTTP/1.0
Thu May 09 22:35:17 1996 wait.pspt.fi 194.215.82.227 GET /gif/NAPPI2.gif HTTP/1.0
Thu May 09 22:35:19 1996 wait.pspt.fi 194.215.82.227 GET /AVIVF.HTM HTTP/1.0
Thu May 09 22:35:23 1996 wait.pspt.fi 194.215.82.227 GET /gif/virtlogo.gif HTTP/1.0
Thu May 09 22:35:23 1996 wait.pspt.fi 194.215.82.227 GET /gif/NAPPI1.gif HTTP/1.0
Thu May 09 22:35:29 1996 wait.pspt.fi 194.215.82.227 GET /gif/KOULU.GIF HTTP/1.0
Thu May 09 22:35:32 1996 wait.pspt.fi 194.215.82.227 GET /gif/NAPPI2.gif HTTP/1.0
Thu May 09 22:35:45 1996 wait.pspt.fi 194.215.82.227 GET /gif/VF21.GIF HTTP/1.0
Thu May 09 22:36:02 1996 wait.pspt.fi 194.215.82.227 GET /gif/NAPPI3.gif HTTP/1.0
Thu May 09 22:36:14 1996 wait.pspt.fi 194.215.82.227 GET /gif/LETTER.GIF HTTP/1.0
Thu May 09 22:37:46 1996 wait.pspt.fi 194.215.82.227 GET /AVIONGEL.HTM HTTP/1.0
Thu May 09 22:37:52 1996 wait.pspt.fi 194.215.82.227 GET /gif/PIRUNLG.GIF HTTP/1.0
Thu May 09 22:44:43 1996 wait.pspt.fi 194.215.82.227 GET /AVIPELI1.HTM HTTP/1.0
Thu May 09 22:44:45 1996 wait.pspt.fi 194.215.82.227 GET /gif/STRESSLG.GIF HTTP/1.0
Fri May 10 04:29:29 1996 wait.pspt.fi 192.83.26.48 GET /gif/NAPPI3.gif HTTP/1.0
Fri May 10 04:29:30 1996 wait.pspt.fi 192.83.26.48 GET /gif/LETTER.GIF HTTP/1.0
Fri May 10 04:29:31 1996 wait.pspt.fi 192.83.26.48 GET /gif/engflag.jpg HTTP/1.0
Fri May 10 04:30:21 1996 wait.pspt.fi 192.83.26.48 GET /AVIVF.HTM HTTP/1.0
Fri May 10 04:30:26 1996 wait.pspt.fi 192.83.26.48 GET /gif/virtlogo.gif HTTP/1.0
Fri May 10 04:30:27 1996 wait.pspt.fi 192.83.26.48 GET /gif/VF21.GIF HTTP/1.0
Fri May 10 04:30:30 1996 wait.pspt.fi 192.83.26.48 GET /gif/KOULU.GIF HTTP/1.0
Fri May 10 04:31:11 1996 wait.pspt.fi 192.83.26.48 GET /AVIPELI2.HTM HTTP/1.0
Fri May 10 04:31:13 1996 wait.pspt.fi 192.83.26.48 GET /gif/LAITE.GIF HTTP/1.0
Fri May 10 04:31:14 1996 wait.pspt.fi 192.83.26.48 GET /gif/KOKOONP.JPG HTTP/1.0
Fri May 10 04:31:32 1996 wait.pspt.fi 192.83.26.48 GET /AVIPELI3.HTM HTTP/1.0
Fri May 10 04:31:33 1996 wait.pspt.fi 192.83.26.48 GET /gif/TIKI1.GIF HTTP/1.0
Fri May 10 04:31:33 1996 wait.pspt.fi 192.83.26.48 GET /gif/TPIRU1.GIF HTTP/1.0
Fri May 10 04:31:33 1996 wait.pspt.fi 192.83.26.48 GET /gif/TSTRE1.GIF HTTP/1.0
Fri May 10 04:31:46 1996 wait.pspt.fi 192.83.26.48 GET /AVIPELI4.HTM HTTP/1.0
Fri May 10 04:32:03 1996 wait.pspt.fi 192.83.26.48 GET /ACEINDEX.HTM HTTP/1.0
Fri May 10 04:32:19 1996 wait.pspt.fi 192.83.26.48 GET /ACEVF.HTM HTTP/1.0
Fri May 10 04:32:21 1996 wait.pspt.fi 192.83.26.48 GET /gif/ROBOCOP1.GIF HTTP/1.0
Fri May 10 04:33:01 1996 wait.pspt.fi 192.83.26.48 GET /ACEINDEX.HTM HTTP/1.0
Fri May 10 07:54:44 1996 wait.pspt.fi 193.166.48.136 GET /gif/NAPPI1.gif HTTP/1.0
Fri May 10 07:54:45 1996 wait.pspt.fi 193.166.48.136 GET /gif/NAPPI2.gif HTTP/1.0
Fri May 10 07:54:45 1996 wait.pspt.fi 193.166.48.136 GET /gif/NAPPI3.gif HTTP/1.0
Fri May 10 07:54:45 1996 wait.pspt.fi 193.166.48.136 GET /cgi-bin/counter.exe?-smittari+-w5+./DEFAULT.HTM HTTP/1.0
Fri May 10 07:54:45 1996 wait.pspt.fi 193.166.48.136 GET /gif/LETTER.GIF HTTP/1.0
Fri May 10 10:08:25 1996 wait.pspt.fi 192.89.123.26 GET /gif/VFLOGO.GIF HTTP/1.0
Fri May 10 10:08:25 1996 wait.pspt.fi 192.89.123.26 GET /gif/AMKVLOGO.GIF HTTP/1.0
Fri May 10 10:08:37 1996 wait.pspt.fi 192.89.123.26 GET /AVIVF.HTM HTTP/1.0
Fri May 10 10:08:44 1996 wait.pspt.fi 192.89.123.26 GET /gif/VF21.GIF HTTP/1.0
Fri May 10 10:08:44 1996 wait.pspt.fi 192.89.123.26 GET /gif/KOULU.GIF HTTP/1.0
Fri May 10 10:11:59 1996 wait.pspt.fi 192.89.123.26 GET /AVITULOS.HTM HTTP/1.0
Fri May 10 10:12:05 1996 wait.pspt.fi 192.89.123.26 GET /gif/VIFA5PAP.GIF HTTP/1.0
Fri May 10 10:12:44 1996 wait.pspt.fi 192.89.123.26 GET /gif/NAPPI2.gif HTTP/1.0
Fri May 10 10:12:47 1996 wait.pspt.fi 192.89.123.26 GET /gif/NAPPI3.gif HTTP/1.0
Fri May 10 10:13:49 1996 wait.pspt.fi 192.89.123.26 GET /AVIONGEL.HTM HTTP/1.0
Fri May 10 10:13:59 1996 wait.pspt.fi 192.89.123.26 GET /gif/PIRUNLG.GIF HTTP/1.0
In this log file you can see the requests for the various documents,
images, and scripts, and the method by which each request is made,
either GET or POST. The log file begins with the first request made
that day and finishes with the last. This example is a very short
one, edited down from the original, so you can imagine
that log files on very active Web servers can easily become triple
this length. Purging log files is a very important practice to
integrate into your Web maintenance routine.
When you go to purge your log files, remember that you are going
to erase information you may need in the future. If you are generating
reports from these logs, make sure you only delete logs for which
reports have already been made. It is very common for these reports
to run on a one- or two-week lag behind the current date, so
the last one or two weeks' log files must be kept to successfully
generate these reports.
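A purge along these lines can itself be automated with a short Perl routine. This is only a sketch: it assumes the EMWAC naming convention of HSyymmdd.LOG described earlier in this chapter, and it judges age by each file's modification time rather than by the date in its name:

```perl
sub purge_logs {
    local($logdir, $keep_days) = @_;
    local(@deleted, @files, $f);
    opendir(DIR, $logdir) || die "can't open $logdir: $!";
    @files = readdir(DIR);
    closedir(DIR);
    foreach $f (@files) {
        next unless $f =~ /^HS\d{6}\.LOG$/i;   # only EMWAC daily logs
        if (-M "$logdir/$f" > $keep_days) {    # -M gives age in days
            unlink("$logdir/$f") && push(@deleted, $f);
        }
    }
    @deleted;
}
```

A call such as purge_logs("c:\\WINNT35\\system32\\LogFiles", 14) would keep the two weeks of logs the reports need and delete anything older, returning the list of files it removed.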
The creation of a new log file for each day makes this process
of purging much easier than on HTTP servers that place all
log entries into one file, like the "access_log" file
used with the NCSA HTTP server. Instead of having to go into the
file and delete specific entries, creating an editing hassle,
the EMWAC server gives you the advantage of deleting the entire
log file for the days no longer needed for generating reports.
Each log file is kept open until the next day's log records its
first action, or transaction. Once this transaction occurs, the
previous day's log file is closed. The data transactions recorded
in the EMWAC log files are as follows:
- The time and date of the request
- The IP address or domain name of the server
- The IP address or domain name of the client
- The HTTP command
- The URL requested
- The version of the HTTP protocol used (when no version shows
up in the log file, this means the default version of 0.9
HTTP was used)
All of this information can be used to provide detailed reports
on Web site traffic.
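A report generator needs to split each log line into those fields first. Here is a minimal sketch of such a parser; the field layout is taken from the log excerpt above, and the protocol field is left optional to allow for HTTP 0.9 requests, which record no version:

```perl
sub parse_emwac_line {
    local($line) = @_;
    # Fields: date/time (five words), server, client, HTTP command,
    # URL, and an optional protocol version
    if ($line =~ /^(\w\w\w \w\w\w \d\d \d\d:\d\d:\d\d \d\d\d\d) (\S+) (\S+) (\S+) (\S+)(?: (\S+))?/) {
        return ($1, $2, $3, $4, $5, $6);
    }
    return ();   # line did not match the expected format
}
```

With each line broken into its parts, tallies by date, client, or URL become simple hash operations.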
One way to find out the accurate number of hits a site is receiving
is to use the daily log file. By understanding the format of the
HTTP header that makes the request of the site's home page, we
can use a simple script to count actual hits.
The Goo Goo Records Web Master first used the following script,
built around Perl's grep function, to figure out how many users
accessed the site. You might recall that grep uses regular
expressions to look for matches, returning a list of every element
that matches the designated pattern (or, in a scalar context, a
count of the matches).
#!/usr/bin/perl
print "Content-type: text/html\n\n";
# Read the day's log file into memory, then count the requests for
# the home page under each of the names it can be requested by.
open(LOG, "c:\\WINNT35\\system32\\LogFiles\\HS960706.LOG");
@log = <LOG>;
close(LOG);
$num = grep(m!GET / HTTP!, @log);
$num += grep(m!GET /index\.sht!, @log);
$num += grep(m!GET /index\.htm!, @log);
print "$num\n";
The Web Master abandoned this method of tabulating user hits early
on for several reasons. The first is that while this method may
be more accurate, it is very time consuming, because it has to
read through and count every match in the long daily log files.
The second is that each page to be monitored had to have its own
modified version of the script, because the script makes a specific
call to the page named in it. Another bad side effect of this
script is that it forces you to make your index Web page a Server
Side Includes page for the whole thing to work, which greatly
reduces the speed at which your home page is delivered. The final
reason is that the site started using the EMWAC HTTP service, which
doesn't support Server Side Includes (notice the ".sht" file
extension used in the script, which is the shortened NT version
of ".shtml"), making the scripts useless.
Fortunately for the Web Master, there are several other ways to
count hits on a Web page.
There are very few people left who use the Web and have not
encountered HTTP status codes. There may be nothing quite as
frustrating as requesting an HTML document and receiving instead
the message, "Forbidden, access not granted" or a similar
one-line response. These responses reflect some of the many HTTP
status codes, one of which is issued with each request made of a
Web server.
Table 14.1 outlines the different types of HTTP status codes,
and what they mean.
Table 14.1 HTTP status codes

Code  Code Type           Meaning
200   Successful request  OK. The request was satisfied.
201   Successful request  OK, following a POST command.
202   Successful request  OK. The request was accepted for processing, but processing is not complete.
203   Successful request  OK, partial information. The returned information is only partial.
204   Successful request  OK, no response. The request was received, but no information exists to send back.
301   Redirection         Moved. The information requested is in a new location, and the change is permanent.
302   Redirection         Found. The information requested temporarily has a different URL.
303   Redirection         Method. The information is undergoing change; a suggestion for the client to try another location.
304   Redirection         Not Modified. The document has not changed since the date given in the conditional GET request; the client already has the information, and the browser just needs to display it again.
400   Error with client   Bad Request. A syntax problem with the client's request, or the request could not be satisfied.
401   Error with client   Unauthorized. The client does not have authorization to access the information requested.
402   Error with client   Payment Required. Used when payment methods are employed by the server.
403   Error with client   Forbidden. No access for the client to the information, even with proper authorization.
404   Error with client   Not Found. The server could not find a file to satisfy the client's request.
405   Error with client   Method Not Allowed. The method used in the request line is not allowed for the information in the request URL.
406   Error with client   None Acceptable. The information requested has been found, but not within the conditions stated in the request's Accept and Accept-Encoding headers.
407   Error with client   Proxy Authentication Required. The client must first authenticate itself with the proxy before the request can proceed.
409   Error with client   Conflict. There is a conflict with the information requested in its current state, preventing access.
410   Error with client   Gone. The information requested by the client is no longer available, with no forwarding URL.
411   Error with client   Authorization Refused. The credentials in the client request are not sufficient to allow access to the requested information.
500   Error with server   Internal Server Error. An unexpected condition has prevented the server from satisfying the client's request.
501   Error with server   Not Implemented. The client's request requires facilities not currently supported by the server.
502   Error with server   Bad Gateway. The upstream gateway access needed to complete the request was denied or failed.
Understanding these status codes is critical if you want to keep
track of what is happening on your server with your Web sites.
While these status codes are not recorded by the EMWAC HTTP service
log files that the Goo Goo Records site uses, they are in the
log files of other servers.
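For a server whose log does record status codes, for instance one using the NCSA common log format, a short Perl routine can tally how often each code appears. This is a sketch; it assumes the common log format's convention of placing the three-digit status code just after the quoted request line:

```perl
sub tally_status {
    local($logfile) = @_;
    local(%count, $line);
    open(ST, $logfile) || die "can't open $logfile: $!";
    while ($line = <ST>) {
        # The status code follows the closing quote of the request
        $count{$1}++ if $line =~ /" (\d\d\d) /;
    }
    close(ST);
    %count;
}
```

The returned associative array maps each status code to its count, so a sudden rise in 404s or 500s is easy to spot in a nightly report.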
In each of the Perl scripts that we have used so far, a standard
bit of code is used to parse off the form data. We then used that
parsed data to make decisions on which the Perl script is to act.
Form data, however, is not the only data we can glean from a user
through the server. In any Perl script called by an HTML document,
we can use the special environment variables to make decisions.
Environment variables are accessed through the special %ENV
associative array, and can be used directly. For example, if you
wanted to track how many users from the googoo.com domain have
used your Perl scripts, you could add the following snippet of
code to each Perl script:
if ($ENV{'REMOTE_HOST'}=~/googoo\.com/i) {
    open(TRACK,"c:\\logs\\scripts.trk");
    $line=<TRACK>;
    close(TRACK);
    $line++;
    open(TRACK,">c:\\logs\\scripts.trk");
    print TRACK $line;
    close(TRACK);
}
This snippet increments the number contained in scripts.trk each
time the script is accessed, but only if the client is connecting
from within googoo.com. This could be useful for Web sites that
deliver certain pages only to internal users, or to track which
users are inside, and which are outside, your company.
In addition to the environment variables already present on
the NT server, and any that may have been added, the EMWAC
HTTP service uses the environment variables listed in Table
14.2.
Table 14.2 Environment variables

Variable           Description
CONTENT_LENGTH     The length of the content as received from the client.
CONTENT_TYPE       The content type of attached data, as with "POST" requests.
GATEWAY_INTERFACE  The CGI specification revision for the server, in the format CGI/revision.
HTTP_ACCEPT        The list of MIME types the client will accept, as given in its Accept headers.
PATH_INFO          The extra path information from the client's request.
QUERY_STRING       Everything that follows the "?" in the URL when the script is accessed using "GET."
REMOTE_ADDR        The client's IP address.
REMOTE_HOST        The client's host name, when it can be resolved; otherwise its IP address.
REQUEST_METHOD     The method of the client's request, such as "GET" or "POST."
SCRIPT_NAME        The path name of the script being executed.
SERVER_NAME        The server's host name, DNS alias, or IP address, as it would appear in a self-referencing URL.
SERVER_PORT        The port number to which the client's request was sent.
SERVER_PROTOCOL    The name/version of the protocol used in the client's request.
SERVER_SOFTWARE    The name/version of the information server software that answered the client's request.
These variables are a subset of the standard environment variables
designated by the CGI specification for HTTP service.
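A quick way to see exactly which of these variables your own server provides is a CGI script that simply echoes the entire %ENV array back to the browser. A minimal sketch:

```perl
sub env_report {
    local($out, $var);
    # Build the complete CGI response, header included
    $out = "Content-type: text/html\n\n";
    $out .= "<HTML><HEAD><TITLE>Environment</TITLE></HEAD><BODY><PRE>\n";
    foreach $var (sort keys %ENV) {
        $out .= "$var = $ENV{$var}\n";
    }
    $out .= "</PRE></BODY></HTML>\n";
    $out;
}
print env_report();
</pre>
```

Installed in cgi-bin and called from a browser, this lists every variable the HTTP service passes to your scripts, which makes it easy to verify what a particular server actually supplies before you write code that depends on it.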
It is incorrect to assume that everyone using the Web does so
through Netscape Navigator. While it is far and away the most
widely used Web client, or browser, it is not the only one.
Microsoft's Internet Explorer is one of several other browsers
growing in use. Both Navigator and Explorer use HTML tags that
are not part of the HTML standard and that cannot be understood
by the other's software. This means that your site may look
different to different users, depending on their browsers. To
avoid the problem of users finding your Web site out of sync with
their browser, keeping tabs on which browsers are accessing your
site is invaluable.
The Goo Goo Records site has added an element so they can determine
which browsers are accessing their site, and at what percentage.
Eventually they plan to have special pages for each different
browser, making use of each browser's strengths.
The following script snippet records which browsers are used to
access the Web site:
open(TRACK2,">>c:\\logs\\browsers.trk");
print TRACK2 "$ENV{'HTTP_USER_AGENT'}\n";
close(TRACK2);
This Perl snippet appends the browser identification string, taken
from the HTTP_USER_AGENT environment variable, to the file
browsers.trk.
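Once browsers.trk has collected some entries, a short routine can produce the percentages the Goo Goo Records site is after. This is a sketch; it assumes one browser identification string per line and groups on the product name before the first slash:

```perl
sub browser_report {
    local($trkfile) = @_;
    local(%count, $total, $report, $line, $name, $pct);
    open(TRK, $trkfile) || die "can't open $trkfile: $!";
    while ($line = <TRK>) {
        chop($line);
        # Group on the product name before the first "/", e.g. "Mozilla"
        ($name) = split(/\//, $line);
        $count{$name}++;
        $total++;
    }
    close(TRK);
    foreach $name (sort keys %count) {
        $pct = int(100 * $count{$name} / $total + 0.5);
        $report .= sprintf("%-15s %4d hits %4d%%\n", $name, $count{$name}, $pct);
    }
    $report;
}
```

The percentages in the resulting report show at a glance whether it is worth building browser-specific pages, and for which browsers.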
The IP address, and the domain name registered with the InterNIC,
are how computers find each other on the Internet. This series of
four numbers separated by periods gives each computer on the
Internet, which includes the Web, its own identity. Domain names
are character equivalents that are assigned to these numbers.
For more details concerning IP addresses and domain names, check
out the InterNIC site at
http://www.internic.net/
When a computer contacts your server, it leaves its IP address
as a calling card, which is recorded in the log file. The environment
variable REMOTE_HOST also stores this address, or sometimes the
domain name, as its value.
Having a record of your users' IP addresses can be used to determine
where your users are from, and to understand which networks your
site is most popular on. This information can also be used to find
out the address and identity of any problem users. To find the
information that comes with an IP address, consult the InterNIC
directory, whose URL was given earlier.
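As a sketch of this kind of report, the following routine tallies hits per client address straight from an EMWAC-style log file; the field positions match the log excerpt shown earlier:

```perl
sub hits_by_host {
    local($logfile) = @_;
    local(%hits, $line, @field);
    open(HB, $logfile) || die "can't open $logfile: $!";
    while ($line = <HB>) {
        # Tokens 0-4 are the date and time, 5 is the server,
        # and 6 is the client's address or host name
        @field = split(' ', $line);
        $hits{$field[6]}++ if @field > 6;
    }
    close(HB);
    %hits;
}
```

Sorting the resulting array by count, highest first, gives a quick list of your heaviest users, whether you want to thank them or investigate them.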
There is an environment variable that records as its value the
URL of the page the user came from. The name of this variable is
HTTP_REFERER. It can be used to track a user through a site, or
to find out which outside resources on the Web link to your site.
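Here is a minimal sketch of logging this value. It assumes the referring URL arrives in the standard CGI variable HTTP_REFERER, and that the requested page is carried in the query string, as with the page.pl delivery script earlier in this chapter:

```perl
sub log_referer {
    local($trkfile) = @_;
    open(REF, ">>$trkfile") || die "can't open $trkfile: $!";
    # Record where the user came from and which page was requested
    print REF "$ENV{'HTTP_REFERER'} -> $ENV{'QUERY_STRING'}\n";
    close(REF);
}
```

Run from inside a delivery script, each line of the resulting file shows one step of a user's path, either between your own pages or from an outside site into yours.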
The term hits is used very loosely on the Web, and means
everything from the actual action of picking any HTML link, to
the base unit used to measure Web site traffic. For the purposes
of this book a hit is simply any time a user calls up a resource
on the Web, whether it be an HTML document, image, or downloadable
program. When that resource is accessed successfully, a hit can
be considered to be counted against it, or on it. Moving about
within one HTML document would not be a hit, but moving from one
HTML document to another within the same site would count as one
hit.
To record a hit, one of the methods discussed in this chapter
can be used. The hit may be registered directly, with a short
Perl snippet in the script that delivers the HTML document, or
the hit information can be read by a Perl script from one of the
HTTP service's log files.
Another way to keep track of Web site traffic is to create your
own page-counting scripts that do not rely on logs for statistical
information. One way to do this is to use Perl to create either
a plain text file or the more useful database management file,
or DBM file.
DBM files are used on the Internet so that different platforms
and different operating systems can access the same information.
With Windows NT, DBM files are accessed through an application
programming interface, or API; it is through the API that the
client communicates with the database. Manipulating DBM files with
Perl is a straightforward affair, as this next section demonstrates.
The main functions used to manipulate DBM files in Perl are
dbmopen(), dbmclose(), reset(), each(), values(), and keys().
Some of these functions were dealt with earlier in the book; the
others are explained here.
To begin with, the dbmopen function is used to create a link between
a DBM file and an associative array. The format looks like this:
dbmopen(%new_array, "db_new_file", $read_write_mode);
Perl will create two new files if the file name specified
in the statement does not exist. The new files would have the
names "db_new_file.dir" and "db_new_file.pag."
To prevent these files from being created, set the read/write mode
to undef.
The parameters in the above statement work like this: %new_array
is an associative array, and behaves like one; "db_new_file" is
the DBM file being opened, without either the ".dir" or ".pag"
extension (a full path and file name for the DBM file should be
used here); and $read_write_mode sets the file permissions for
the DBM file.
To sever the connection between the DBM file and the associative
array, use the dbmclose function in this format:
dbmclose(%new_array);
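Where the DBM functions are supported, a page counter takes only a few lines. This sketch uses Perl's dbmopen and dbmclose as just described; the file and page names are only examples:

```perl
sub dbm_count_hit {
    local($dbfile, $page) = @_;
    local($n);
    # dbmopen ties the associative array %counts to the DBM file,
    # so assignments to the array are written through to disk
    dbmopen(%counts, $dbfile, 0666) || die "can't open DBM file $dbfile: $!";
    $counts{$page}++;          # one more hit for this page
    $n = $counts{$page};
    dbmclose(%counts);
    $n;
}
```

Because the counts live in the DBM file rather than in the script, they survive between requests, and one file can hold a separate tally for every page on the site.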
There is just one small problem with this method of tracking a
Web site on an NT server: currently, the DBM functions in Perl
for Windows NT are unsupported. The method is included in this
book for two important reasons: first, as an example of how
tracking can be done without using logs; second, because the DBM
functions in Perl for NT may be supported soon, so you'll be
ready for them.
To help deal with Perl scripts that use routines and functions
unsupported under Windows NT, an NT Perl checklist is necessary.
This section is a little out of place in this chapter, but it
seemed like a good idea to add it here. One big headache that
confronts Perl programmers, especially those working in non-UNIX
environments, is finding a Perl script that satisfies their needs,
only to have it fail when they try to run it. After numerous futile
attempts at execution, you discover that the script uses Perl
functions not supported in your version, or port, of Perl.
In Windows NT, the list of unsupported functions is long enough
to cause problems such as the inability to use DBM files explained
in this chapter. The following script searches any Perl script for
currently unsupported NT functions. Think of it as an acid test
for new scripts you want to add to your Perl library.
#!/usr/bin/perl
# nttest.pl
# This is where the list of unsupported functions goes...
@functions=("getnetbyname",
"getnetbyaddr","getnetent","getprotoent",
"getservent","sethostent","setnetent","setprotoent","setservent","endhostent",
"endnetent","endprotoent","endservent","socketpair","msgctl","msgget","msgrcv",
"msgsnd","semctl","semget","semop","shmctl","shmget","shmread","shmwrite",
"ioctl","select","chmod","chroot","fcntl","flock","link",
"lstat","readlink","symlink","sysread","syswrite","umask","utime","crypt",
"getlogin","getpgrp","getppid","getpriority","getpwnam","getgrnam","getpwuid",
"getgrgid","getpwent","getgrent","setpwent","setgrent","endpwent","endgrent",
"setpgrp","fork","kill","pipe","setpriority","times","wait","waitpid","alarm",
"dbmclose","dbmopen","dump","syscall");
# Note: only the four-argument form of select() is unsupported, but this
# simple test flags any use of select.
$filename=$ARGV[0];
if (!$filename) {
    print "\nUsage: nttest.pl <scriptname>\n\n";
    exit;
}
$linecount=0;
$errors=0;
open(SCRIPT, $filename) || die "Cannot open $filename\n";
while ($line=<SCRIPT>) {
    $linecount++;
    foreach $func (@functions) {
        # \b keeps "wait" from matching inside "waitpid", and \Q
        # protects any special characters in the function name
        if ($line=~/\b\Q$func\E[\s(]/i) {
            print "Line $linecount: Function $func() is unsupported by Perl for Windows.\n";
            $errors++;
        }
    }
}
close(SCRIPT);
if (!$errors) {
    print "This script contains no unsupported functions, and should work with Perl for Windows.\n\n";
}
else {
    print "This script contains unsupported functions, and may not work under Perl for Windows.\n\n";
}
With this script you can save hours of debugging a Perl script
that will never run on Windows NT. Now, if someone could write
a Perl script that then fixed these unsupported features so the
script did work in NT, that would be really something.
Please remember that all is not lost. Many of these unsupported
functions are not essential to a script, and Perl is amazing at
doing the same task in different ways. With a little ingenuity
and reworking, such scripts may function fine in Windows NT.
In this chapter we covered how to keep track of a Web site's
traffic, using the example of the Goo Goo Records site. Their
site uses logs to generate reports that track who is using the
site, which browsers they are using, and where they go in the
site.