Perl CGI Programming: No experience required.
(Publisher: Sybex, Inc.)
Author(s): Erik Strom
ISBN: 0782121578
Publication Date: 11/01/97



Breaking Down the New Code Sample

The new Perl script, logs1.pl, mainly deals with a new log file format. However, there are a couple of new Perl goodies that should be explained.

Notice first of all the code at the beginning of the program:

   # Put the log file name into a local variable (full path).

       $LogFile = "c:/sambar/logs/access.log";

   # Open the file; die if that's not possible.

       open (LOG, $LogFile) || die "Can't open $LogFile: $!\n";

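As you'll remember from previous skills, storing the full path to the Sambar log file in a variable makes the path easier to work with and to change. The file is opened for reading with the file handle LOG, or the script dies trying: if open fails, die prints the message, which includes the system's error text from the Perl variable $!, and ends the script.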
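
For comparison, here is a minimal sketch of the same open-or-die step written with the three-argument form of open and a lexical file handle, which later versions of Perl support. The step is wrapped in a small subroutine, open_log, so the failure case can be tried safely; open_log is a hypothetical helper, not part of logs1.pl, and the missing file name in the usage lines is made up.

```perl
use strict;
use warnings;

# Hypothetical helper: open a log file for reading or die with the reason.
# After a failed open, $! holds the operating system's error message
# (for example, "No such file or directory").
sub open_log {
    my ($log_file) = @_;
    open(my $log, '<', $log_file) || die "Can't open $log_file: $!\n";
    return $log;
}

# Try a path that presumably does not exist; eval traps the die
# so the script can report the error instead of ending.
my $missing = 'c:/sambar/logs/no-such-file.log';
my $fh = eval { open_log($missing) };
print "Open failed: $@" unless $fh;
```

Wrapping the call in eval is only for demonstration; in logs1.pl itself, letting die end the script is the right behavior, since there is nothing useful to do without the log.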

Next, we set up a loop that reads the log file line by line:

   # Read, extract and print each line from the log file.

       while (<LOG>)
           {
           $LogLine = $_;           # Store the line locally.

The while statement's condition is the LOG file handle enclosed in angle brackets (the less-than and greater-than signs). Each pass through the loop, <LOG> reads the next line of the file into the Perl variable $_; when nothing is left in the file, the loop ends. The first line in the loop stores the line in the local variable $LogLine.


TIP:  The incoming log line could be left in $_ without putting it in a local variable. It seems a little clearer, especially in subsequent references in the program, to put the value in a descriptive variable. However, as usual in Perl, it’s a matter of taste. The implementation is ultimately up to you.
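
To watch the loop work without a real Sambar log, the sketch below reads from an in-memory file handle (opening a reference to a string, which Perl supports from version 5.8 on) instead of LOG. The sample lines are made up. It shows both styles the tip mentions: copying $_ into a named variable, as logs1.pl does, and reading each line directly into a named variable.

```perl
use strict;
use warnings;

# Made-up text standing in for access.log.
my $sample = "first line\nsecond line\nthird line\n";

# Style 1: <$log> puts each line in $_, which is then copied.
open(my $log, '<', \$sample) || die "Can't open in-memory log: $!\n";
my @copied;
while (<$log>) {
    my $LogLine = $_;        # Store the line locally.
    push @copied, $LogLine;
}
close($log);

# Style 2: read each line straight into a named variable.
# Perl implicitly wraps this assignment in defined(), so a final
# line such as "0" with no newline would not end the loop early.
open($log, '<', \$sample) || die "Can't open in-memory log: $!\n";
my $count = 0;
while (my $LogLine = <$log>) {
    $count++;
}
close($log);

print "Read ", scalar(@copied), " lines, then $count lines\n";
```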

In the next couple of code lines, we do some simple formatting to $LogLine.

   # Strip out the characters we don’t need.

           $LogLine =~ s/\[|\]|\"//g;
           chop ($LogLine);

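Each log line contains three characters that we don't need: the left bracket ([), the right bracket (]), and the double quote ("). The substitution in the first line strips them out. If you look closely at the regular expression in the substitution, it begins to make sense: you can pick out the three characters to be stripped. The brackets have special meaning in a regular expression, so they're escaped with the backslash (\); the double quote has no special meaning there, so its backslash is harmless but not required. The three alternatives are separated by the vertical bar (|), which in a regular expression means "or": match this character, or that one, or the other. Finally, the g at the end makes the substitution global, so every occurrence in the line is removed, not just the first.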
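
Here is a minimal sketch that applies the same substitution to a made-up log entry. The bracketed date and quoted request follow the Common Log Format style that the Sambar log resembles, but every value is invented for illustration. The sketch also shows an equivalent form that lists the three characters in a single character class instead of using alternation; inside a class, the brackets still need a backslash, but the double quote does not.

```perl
use strict;
use warnings;

# A made-up Sambar-style log entry; all values are invented.
my $raw = '127.0.0.1 - admin [01/Nov/1997:10:15:32 -0800] '
        . '"GET session\adminlogin  HTTP/1.0" 200 0 1234' . "\n";

# The substitution from logs1.pl: remove every [, ], and ".
my $LogLine = $raw;
$LogLine =~ s/\[|\]|\"//g;

# Equivalent form: one character class matching any of the three.
my $Alt = $raw;
$Alt =~ s/[\[\]"]//g;

print $LogLine;
print "Both forms agree\n" if $LogLine eq $Alt;
```

The alternation form reads well when the pieces are longer than one character; for single characters, the class is the more common Perl idiom.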

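The next line calls a function that you haven't seen before: chop. It's one of those handy little utilities that doesn't do much, but you find yourself using it a lot. The function does nothing but chop the last character off a string. It's usually used to get rid of the newline (\n) character at the end of a line, and that's how we've used chop here. Be aware that chop removes the last character whether or not it is a newline. Perl's related function chomp removes a trailing newline only if one is there, which makes it the safer choice when the last line of a file might not end in a newline.

The last line of interest in logs1.pl is the one that calls split.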
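
Before moving on to that line, a short sketch with made-up strings shows where chop and chomp differ: on a line that ends in a newline they give the same result, but on a line without one, chop removes a real character.

```perl
use strict;
use warnings;

my @results;
for my $text ("GET /index.html\n", "GET /index.html") {
    my $chopped = $text;
    chop($chopped);     # always removes the last character
    my $chomped = $text;
    chomp($chomped);    # removes a trailing newline only if present
    push @results, [$chopped, $chomped];
    print "chop: '$chopped'  chomp: '$chomped'\n";
}
```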

   # Extract the components using split().

       ($ClientIP, $Dummy, $UserName, $DateTime, $TimeZone, $Operation,
          $Target, $HTTPVers, $SrvrStatus, $NTStatus,
          $BytesXfer) = split (/[ ]+/, $LogLine);

Notice that some new variables were needed to match the fields of a Sambar log file entry. The pattern specified for split has changed, too. It looks a little strange, but like the substitution done on $LogLine, it begins to make sense if you examine it closely. We want to break out the fields in the log entry that are separated by spaces. However, there are two spaces between the operation target and the HTTP version, as in the example that was used to start this section:

   "GET session\adminlogin  HTTP/1.0"

The regular expression [ ] matches a single space, so two spaces in a row constitute two matches with an empty field between them, and that throws off the count of variables. Remember that split returns the fields it finds as a list, and the assignment hands them to the variables on its left side in order; one extra empty field would push every later value into the wrong variable. Putting a plus sign after the brackets ([ ]+) matches one or more spaces, so any run of spaces is treated as a single separator and the proper values go to the proper variables.
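
To see why the plus sign matters, the sketch below splits a made-up log entry (already stripped of brackets and quotes, with its newline removed) in two ways and counts the fields. The values are invented; only the spacing, with two spaces before HTTP/1.0, matters here.

```perl
use strict;
use warnings;

# A made-up log entry, already stripped of [, ], and " and chopped.
my $LogLine = '127.0.0.1 - admin 01/Nov/1997:10:15:32 -0800 '
            . 'GET session\adminlogin  HTTP/1.0 200 0 1234';

# Splitting on exactly one space: the two spaces before HTTP/1.0
# leave an empty field between them.
my @single = split(/[ ]/, $LogLine);

# Splitting on one or more spaces treats the run as one separator.
my @multi = split(/[ ]+/, $LogLine);
my ($ClientIP, $Dummy, $UserName, $DateTime, $TimeZone, $Operation,
    $Target, $HTTPVers, $SrvrStatus, $NTStatus,
    $BytesXfer) = split(/[ ]+/, $LogLine);

print scalar(@single), " fields with /[ ]/, ",
      scalar(@multi),  " fields with /[ ]+/\n";
print "Target: $Target, HTTP version: $HTTPVers\n";
```

With /[ ]/ the line yields 12 fields instead of 11, and the empty eighth field would land in $HTTPVers, shifting HTTP/1.0 into $SrvrStatus and so on down the list.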

Monitoring Activity from a Web Page

Your experience with Web server log files now includes negotiating the formats for Microsoft’s Internet Information Server and the Sambar server, which produces logs in a format very similar to the UNIX Common Log Format. You may be wondering: What can you do with all of that information?

Well, this is another of those situations in which you are limited only by your imagination. You have the information; you have the tools to manipulate it; and you have the tools to display it in just about any way you like.

A good place to start is a Web page. You already know quite a bit about using Perl scripts to create HTML documents that can be displayed on your Web site through CGI. Why not use some of that knowledge to create a statistical Web page that uses the information in your server log files?

Who Gets In and How Often?

With a little manipulation, you can determine the sources of “hits” on your Web site and how often they connect.

You'll use the IIS log files for this example. Recall from the previous section that, by default, IIS creates a new log file for each day. The result is a directory full of log files, one per day, and you will have to step through each of them to gather the information for the days you're interested in.

Here’s what you’ll do with the log information for each day:

  Identify the various IP addresses from which your Web site has been contacted.
  Count the number of distinct IPs.
  Count the number of hits from each IP.
  Format the resulting data in an HTML document and display it in a browser.
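
The four steps above can be sketched with a Perl hash keyed on IP address. This is an outline under stated assumptions, not the script the chapter goes on to build: it assumes the client IP is the first comma-separated field of each IIS log line, works on a few made-up lines instead of a real file, and the names count_hits and hits_to_html are hypothetical.

```perl
use strict;
use warnings;

# Made-up IIS-style lines; the client IP is assumed to be the first
# comma-separated field.
my @lines = (
    "10.0.0.5, -, 11/1/97, 9:01:02, W3SVC, WEB1, 10.0.0.1, 30, 200, 1500, 200, 0, GET, /index.htm, -,\n",
    "10.0.0.7, -, 11/1/97, 9:03:44, W3SVC, WEB1, 10.0.0.1, 25, 210, 900, 200, 0, GET, /about.htm, -,\n",
    "10.0.0.5, -, 11/1/97, 9:05:10, W3SVC, WEB1, 10.0.0.1, 40, 190, 700, 200, 0, GET, /logo.gif, -,\n",
);

# Hypothetical helper for steps 1-3: count the hits from each client IP.
sub count_hits {
    my %hits;
    for my $line (@_) {
        my ($ip) = split(/,\s*/, $line);   # keep only the first field
        $hits{$ip}++;
    }
    return %hits;
}

# Hypothetical helper for step 4: format the counts as an HTML table,
# busiest client first, ties in IP order.
sub hits_to_html {
    my %hits = @_;
    my $html = "<TABLE BORDER=1>\n<TR><TH>Client IP</TH><TH>Hits</TH></TR>\n";
    for my $ip (sort { $hits{$b} <=> $hits{$a} || $a cmp $b } keys %hits) {
        $html .= "<TR><TD>$ip</TD><TD>$hits{$ip}</TD></TR>\n";
    }
    $html .= "</TABLE>\n";
    return $html;
}

my %hits = count_hits(@lines);
print "Distinct IPs: ", scalar(keys %hits), "\n";
print hits_to_html(%hits);
```

The hash does double duty: its keys are the distinct IPs (step 2 is just the number of keys), and its values are the per-IP hit counts. When the table is sent to a browser through CGI rather than to the screen, the script would first print a "Content-type: text/html" header followed by a blank line.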

Let’s start with one file, then build up to the entire directory. The output will go to the screen for now.
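
When you do build up to the whole directory, one approach is to collect the file names with Perl's glob function and run the same per-line counting over each file in turn. This is a minimal sketch under assumptions: the directory path and the *.log file-name pattern are hypothetical placeholders rather than the IIS defaults, the helper name count_dir_hits is made up, and the client IP is again assumed to be the first comma-separated field of each line.

```perl
use strict;
use warnings;

# Hypothetical helper: count hits per client IP across every *.log file
# in $dir. Note that glob splits its pattern on whitespace, so a path
# containing spaces would need File::Glob's bsd_glob instead.
sub count_dir_hits {
    my ($dir) = @_;
    my %hits;
    my @files = glob("$dir/*.log");
    for my $file (sort @files) {
        open(my $log, '<', $file) || die "Can't open $file: $!\n";
        while (my $line = <$log>) {
            chomp($line);
            next if $line eq '';                  # skip blank lines
            my ($ip) = split(/,\s*/, $line);      # first field only
            $hits{$ip}++;
        }
        close($log);
    }
    return %hits;
}

# Hypothetical directory; replace it with your server's log directory.
my %hits = count_dir_hits('c:/logs/iis');
print "$_: $hits{$_}\n" for sort keys %hits;
```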



