Chapter 3
Programming with Perl
To program with Perl, we can take variables, assign them values
from within the program or user input, and then manipulate them.
Before we can get right into Perl programming, we have to get
another central concept under our belt: regular expressions, which
are dealt with in detail in this chapter.
Regular expressions are a concept from the UNIX world. A regular
expression is like a pattern, or template, that is matched against a string.
Strings are sequences of characters, like "hello" or
"please pass the butter." When a regular expression
tries to make a match, it either succeeds or fails. The regular
expression is not a literal translation of the string, but a representation
of it.
Think of a regular expression as being like a verbal expression
in slang. When the guys are hanging out and one of them calls
to another, "Yo, Homie! You look fat today!" he is not
referring to a weight problem his friend may be having. "Fat"
is a slang term, and means "looking good" or "your
appearance is exceptional."
The key to using regular expressions is two-fold. The first thing
you must understand is the pattern you are trying to match. The
second is to understand the different patterns available to you
to make a pattern match.
Regular expressions are used in many different operating systems,
and by many different programs and processes. If you are already
familiar with regular expressions in another context, then you're
in luck. While the syntax of regular expressions varies between
operating systems, the concepts remain the same.
Before getting into the nitty-gritty of regular expressions, it
might help if we looked into some related programming issues in
Perl. The issues include control structures, associative arrays,
and data I/O using <STDIN>.
It is important to be able to tell the Perl interpreter when you
want things done in your script. To do this you use control structures,
like a statement block, or different kinds of loops.
The Statement Block
The simplest control structure in Perl is the statement block,
which is made up of a series of statements that are enclosed in
curly braces, and might look something like this:
{
    $one = "1";
    @two = (1,2,3);
    $three = $two[0];
}
where the statements inside the statement block are indented one
level past the curly braces.
When Perl encounters a statement block it executes each statement
consecutively, starting with the first, and working its way to
the last. Perl will treat the entire block as a single statement
in the script as a whole.
Statement blocks are often used as part of the syntax of statement
loops.
The If/Unless Statement Loop
In an if/unless loop a designated expression is examined for truth,
and if it is true, then one series of events is started. If it
is false, then another path is taken in the script. A simple format
for an if/unless statement loop is
if (the_expression) {
statement_if_expression_true;
} else {
statement_if_expression_false;
}
where Perl will evaluate the_expression, called the control expression,
to see if it is true or false. If it is true then it goes to the
statement in the first block following the if command. If the
control expression is false, Perl goes to the statement following
the else command.
Applying this we can get a script that asks for some user input,
then evaluates it with different possible outcomes depending on
the input received, like this:
print "What is the temperature?";
$temp = <STDIN>;
chop ($temp);
if ($temp < 70) {
    print "Brrr, you better get a sweater!\n";
} else {
    print "Is it hot enough for ya?\n";
}
If you want to return a statement only if the result of the expression
test is false, you can use the unless command:
print "What is the temperature?";
$temp = <STDIN>;
chop ($temp);
unless ($temp < 70) {
    print "Is it hot enough for ya?\n";
}
so that the message is printed unless the control expression is
true, that is, only when the temperature is 70 or higher.
Another option in an if/else loop is the elsif command. You may
want to have several choices for the script's execution, so you
include the elsif command to handle the other options, like this:
print "What is the temperature?";
$temp = <STDIN>;
chop ($temp);
if ($temp < 70) {
    print "Brrr, you better get a sweater!\n";
} elsif ($temp < 80) {      # reached only when $temp is 70 or more
    print "A little cool, but comfortable.\n";
} elsif ($temp < 90) {      # reached only when $temp is 80 or more
    print "Nice and cozy.\n";
} else {
    print "Is it hot enough for ya?\n";
}
where you are not limited in your options with elsif, and can
have as many of these branch control structures as you need.
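The same chain of tests can be wrapped in a subroutine so it can be tried without typing at a prompt. This is only a minimal sketch: the name describe_temp, the use of my, and the strict and warnings pragmas are assumptions for illustration and are not part of the listing above.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the message for a given temperature.
# Each elsif is reached only when every test above it was false,
# so one upper bound per branch is enough.
sub describe_temp {
    my ($temp) = @_;
    if ($temp < 70) {
        return "Brrr, you better get a sweater!";
    } elsif ($temp < 80) {
        return "A little cool, but comfortable.";
    } elsif ($temp < 90) {
        return "Nice and cozy.";
    } else {
        return "Is it hot enough for ya?";
    }
}

# Try one value from each branch.
print describe_temp($_), "\n" for (65, 75, 85, 95);
```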
The While/Until Loop Statement
There may come a time (probably sooner rather than later) where
you'll need to have a block of statements repeatedly read until
a certain condition is met. This is done with the while/until
loop statement.
A typical use for this loop is to count something down, like:
print "How high for your countdown?";   # the user sets the
                                        # upper limit of the countdown
$count = <STDIN>;
chop ($count);
while ($count > 0) {
    print "T minus $count, and counting...\n";
    $count--;
}
where the while loop is executed until the value of $count
reaches 0. You may have noticed the use of the autodecrement
operator on $count to lower the value each cycle.
The while loop also has a counterpart, the until command, which
repeats its block for as long as the condition is false. Used
in the same way, until looks like this:
print "How long for your countdown?";
$count = <STDIN>;
chop ($count);
until ($count <= 0) {
    print "T minus $count, and counting...\n";
    $count--;
}
print "Lift off!\n";
The loop keeps counting down until the condition $count <= 0
becomes true, and only then is the final print statement executed.
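The two loop forms can be compared side by side. The sketch below is an illustration under assumptions: the array names are hypothetical and the countdown starts from a fixed 3 rather than from user input. It collects the values each loop visits, so it is easy to see that while with a test and until with the opposite test do the same work.

```perl
use strict;
use warnings;

# while: the block runs as long as the test is true.
my @with_while;
my $count = 3;
while ($count > 0) {
    push @with_while, $count;
    $count--;
}

# until: the block runs as long as the test is false.
my @with_until;
$count = 3;
until ($count <= 0) {
    push @with_until, $count;
    $count--;
}

print "while: @with_while\n";   # while: 3 2 1
print "until: @with_until\n";   # until: 3 2 1
```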
Another way to repeat, or iterate, a statement block is with the
for and foreach commands.
The For/Foreach Loop Statements
When you need a script to evaluate an expression and then re-evaluate
it in a countdown fashion, you can use the for command like this:
for ($count = 15; $count >= 1; $count--) {
print "$count \n";
}
The countdown is printed from 15 to 1, with each number appearing
on a new line. $count is given the value 15, tested against the
condition of being >= 1, printed, then autodecremented and looped.
When $count reaches 0 the test fails and the loop ends, so 1 is
the final value printed.
There may be an instance where you need to create a loop with
a variable in which its value will change in the loop, but you
need it restored after the loop dies. You need the variable to
be local to the loop. You can do this with the foreach command.
With foreach, a list of values is created and then it places them
into a scalar variable one at a time, and then computes that statement
block designated by the foreach command. An example of this might
be:
@letters = ("A","B","C","D");
foreach $new (reverse @letters){
print $new;
}
where the output will be DCBA. A special Perl variable can
be used here to simplify the code. The $_ scalar variable is the
default variable with many commands, like foreach, into which
values can be placed. The same script above would look like this:
@letters = ("A","B","C","D");
foreach $_ (reverse @letters) {
print $_;
}
and this can be shortened even more, because Perl will see the
$_ variable, even if it is left out, like this:
@letters = ("A","B","C","D");
foreach (reverse @letters) {
print;
}
The foreach statement is also handy if you want to change the
values of an entire array. It works like this:
@numbers = (2,4,6,8);
foreach $two (@numbers) {
$two *= 2;
}
which gives @numbers the new element values of (4,8,12,16). Just
by changing the scalar $two you can change the entire array @numbers.
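Both properties of foreach described above, that the loop variable stands for the real array element and that its old value comes back when the loop ends, can be seen in one short run. This is a sketch under assumptions: the variable name $item is hypothetical, and our is used only so that a package variable like the ones in the listings passes the strict pragma.

```perl
use strict;
use warnings;

our $item = "before";       # a package variable with an existing value
my @numbers = (2, 4, 6, 8);

foreach $item (@numbers) {
    $item *= 2;             # $item stands for the element itself,
}                           # so this changes @numbers

print "@numbers\n";         # 4 8 12 16
print "$item\n";            # before (restored once the loop ends)
```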
In Perl, associative arrays take the place of other recursive
data types, like trees, that are used in other computer languages.
These kinds of arrays are very similar to the list kind of arrays
discussed in Chapter 2. The main difference between them is that
a list array has index values for its elements, which start at
0 and increment by whole numbers to the end of the array, whereas
the associative array uses arbitrary scalars, also called keys.
What this means is that you are not limited to referring to an
element in an array by its integer-based index value, but you
can use whatever value you choose to associate with the array
element. It is quite normal, and quite desirable, to associate
strings with particular array elements in this way.
A good model for getting a better understanding of how associative
arrays work might be a Rolodex. New names and phone numbers are
written on separate cards and filed in the Rolodex's alphabetical
sections. When you want to find out the number of a new friend,
you go to the letter your friend's name is filed under to find
the number.
With associative arrays, the new names and phone numbers are scalar
values, while the letters on the Rolodex cards themselves are
the keys. To find the values, you look for the key to that value
in the associative array, like looking for the phone number using
the letters.
Associative arrays use a variable in this format:
%variable_name
Unlike array list variables, associative arrays are usually referred
to with their keys. Let's look at one:
%month = (
    'Jan', 'January',
    'Feb', 'February',
    'Mar', 'March',
    'Apr', 'April',
    'May', 'May',
    'Jun', 'June',
    'Jul', 'July',
    'Aug', 'August',
    'Sept', 'September',
    'Oct', 'October',
    'Nov', 'November',
    'Dec', 'December',
);
where single and double-quotes have the same powers they did in
array lists.
If you want to subscript an associative array, the process is
similar to subscripting an array list, but curly braces replace
the square brackets, so:
# a simple array
@wolf = (4,5,6);
$wolf[3] = "moon"; # making @wolf
# now (4,5,6,"moon")
$wolf[5] = "howl"; # producing
# (4,5,6,"moon",undef,"howl")
becomes this as an adapted associative array:
%wolf = (4,"four",5,"five",6,"six");
$wolf{4} = "moon"; # making %wolf
# now (4,"moon",5,"five",6,"six")
$wolf{5} = "howl"; # producing %wolf values
# (4,"moon",5,"howl",6,"six")
Note that the list form of an associative array always has an even
number of elements, because every key needs a value. If a value is
missing, it is given the undef value.
When you call up an associative array's values there is no literal
equivalent as there is with array lists. Instead, Perl creates
a string of key/value pairs in whatever order is easiest at the
time, depending on the phases of the moon. (Just kidding! Perl
creates the literal list of pair values based on what is fastest
at that moment based on where you are in the Perl script, so each
time you determine a literal list, the order will be slightly
different.)
Associative arrays also have their own operators.
The Keys Operator
To get a list of all the current keys of an associative array,
put the array name in parentheses behind the operator keys, like
this:
keys(%wolf);    # which would create
                # the key list (4,5,6), or
                # something similar since the order
                # does not have to remain the same
where the use of parentheses is an option, and can be used as
you see fit.
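When a predictable order matters, a common approach is to sort the keys before using them. The following is a minimal sketch, not part of the original listings: it assumes the numeric keys of the %wolf example and uses the sort function with the numeric comparison operator.

```perl
use strict;
use warnings;

my %wolf = (4, "moon", 5, "howl", 6, "six");

# Sorting the keys gives the same order on every run.
my @pairs;
foreach my $key (sort { $a <=> $b } keys %wolf) {
    push @pairs, "$key=$wolf{$key}";
}

print join(", ", @pairs), "\n";   # 4=moon, 5=howl, 6=six
```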
The Values Operator
As you might imagine, the values operator works like the keys
operator, except that it returns a list of values of an associative
array:
values(%wolf); # creates
# ("moon","howl","six")
in no particular order.
The Each Operator
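To inspect all the elements of an associative array one pair at a
time, you can use the each operator like the others:
each(%wolf);
which would return one key/value pair from the array, for example
(4,"moon")
Each further call returns the next pair, in no particular order,
until every pair has been returned.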
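In practice each is usually called from a while loop, which stops when there are no pairs left. This sketch is an illustration only; the %seen hash is a hypothetical helper that records what the loop visited.

```perl
use strict;
use warnings;

my %wolf = (4, "moon", 5, "howl", 6, "six");

# each returns one (key, value) pair per call, and an empty list
# once every pair has been handed out, which ends the loop.
my %seen;
while (my ($key, $value) = each %wolf) {
    $seen{$key} = $value;
    print "$key => $value\n";   # the order is not guaranteed
}
```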
The Delete Operator
This is the operator which allows you to remove key-value pairs
by designating the key of the pair you want removed:
%wolf = (4,"four",5,"five",6,"six");
delete $wolf{5};
# leaves %wolf with the element list
# (4,"four",6,"six")
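The effect of delete can be checked directly. This is a small sketch under assumptions: the variable names are hypothetical, and the exists function, which reports whether a key is present, is not introduced in the text above.

```perl
use strict;
use warnings;

my %wolf = (4, "four", 5, "five", 6, "six");

my $removed = delete $wolf{5};                  # delete hands back the removed value
my $present = exists $wolf{5} ? "yes" : "no";   # is key 5 still there?
my $left    = join(",", sort keys %wolf);

print "removed: $removed\n";         # removed: five
print "key 5 present: $present\n";   # key 5 present: no
print "keys left: $left\n";          # keys left: 4,6
```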
We already know that you can use <STDIN> to take a line
of user input and store it as a scalar variable value. We also
know that you can put a collection of user input from <STDIN>
into an array, where each line entered is kept as a separate element
of the array. But we haven't yet really focused
on how to manipulate these values.
Perhaps you want to go through each line of text and change some
of them. You might do this by creating a simple loop, like this
one:
while ($_ = <STDIN>) {
}
where the command while creates a repetition loop. Perl will keep
putting the lines in <STDIN> into the scalar variable $_
until the file runs out of lines and the loop dies.
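To see such a loop do some work, the sketch below changes every line it reads by converting it to uppercase. It rests on several assumptions: an in-memory string stands in for keyboard input so the example runs on its own, the filehandle is held in a my variable, and chomp (which removes only a trailing newline) is used where the listings in this chapter use chop.

```perl
use strict;
use warnings;

# Stand-in for keyboard input (an assumption; the loop above reads STDIN).
my $text = "first line\nsecond line\n";
open(my $in, '<', \$text) or die "cannot open in-memory file: $!";

my @changed;
while (my $line = <$in>) {
    chomp($line);                # drop the trailing newline
    push @changed, uc($line);    # change the line, here by uppercasing it
}
close($in);

print "$_\n" for @changed;
```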
When using Perl output, both the familiar print and related printf
operators can be used to write to STDOUT. The print operator
can do more than just produce text to display; this operator acts
like a list operator and sends its strings to STDOUT
without adding any characters.
The printf operator gives you more command over your strings than
print does. With printf you can designate a format control string
with the first argument, which will determine how the rest of
the arguments will be printed. An example might be:
printf "%10s %20s", $wolf, $moon;
where $wolf will be printed to a 10-character field and $moon
will be printed to a 20-character field.
There are other modifiers that can be used with the printf operator
that can designate the string to be printed as a decimal integer,
floating-point value, or with spaces or tabs.
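A few of these format codes can be tried with sprintf, which builds the same string that printf would print. The values here are made up for illustration: %-10s left-justifies a string in a 10-character field, %5d right-justifies a decimal integer in 5 characters, and %8.2f prints a floating-point value in 8 characters with 2 decimal places.

```perl
use strict;
use warnings;

# sprintf returns the formatted string instead of printing it.
my $row = sprintf("%-10s|%5d|%8.2f", "wolf", 42, 3.14159);

print "$row\n";   # wolf      |   42|    3.14
```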
To demonstrate a simple use of a regular expression, I want to
introduce you to a simple command called grep, short for global
regular expression print. Grep is a very powerful command that
has come from the UNIX world. With grep you can take a regular
expression and search a file line by line, trying to match the
string indicated in the regular expression. Using grep at the
command line might look like this:
grep crypt bonus.pl > bonus_change.pl
where grep will examine every line of code in bonus.pl to see
if it contains the string "crypt," and then output those
lines into the file bonus_change.pl by redirecting STDOUT.
To denote the string "crypt" as a regular expression
in Perl, it is enclosed between slashes like
/crypt/
and would look like this in a script:
if (/crypt/) {
print "$_";
}
where the regular expression crypt is tested against the special
variable $_, the default scalar variable where everything has
been stored at one time or another.
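Perl also has a built-in function named grep that applies a test like this to every element of a list. The sketch below is an illustration, not the UNIX command: the sample lines are made up, and each one is placed in $_ in turn while the pattern is tested against it.

```perl
use strict;
use warnings;

# Made-up lines of code to search (single quotes prevent interpolation).
my @lines = (
    'print "hello";',
    '$secret = crypt($password, $salt);',
    '# nothing to see here',
    'if (crypt($guess, $salt) eq $secret) {',
);

# grep returns the elements for which the pattern matched $_.
my @matches = grep { /crypt/ } @lines;

print "$_\n" for @matches;
```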
There are a number of special variables in Perl, like $_, that
have their own designated features designed to make Perl easier
to use, and they are touched on throughout the book where most
appropriate.
Let's start with a simple example of a guestbook using WinPerl.
First you'll want to create a Perl file called guest.pl.
NOTE
You can write Perl scripts in any text processing application, like Notepad, Write, or Microsoft Word, if you so desire. Whatever you use to create your scripts, remember to save your script as text only, and with the .pl file extension.
In keeping with good programming practice, the first line of the
script will be a comment line stating the name of the file. This
is very handy when you, or someone else, may have to work with
the file later on. The script looks like this:
#!/usr/bin/perl
# guest.pl
print "What is your Name? ";
$name=<STDIN>;                  # get the response from the user
open (GUESTBOOK, ">>guest.pl"); # open a file with filehandle GUESTBOOK
print GUESTBOOK "$name";        # append the name to the guestbook file
chop($name);                    # remove the newline
print "Thank you, $name! Your name has been added to the Guestbook.\n";
close(GUESTBOOK);
So, what's going on here? We first print the prompt for the user.
The print command will default to STDOUT, in this
case, the screen.
We then use the <> construct to retrieve a line from a filehandle,
<STDIN>, which is the keyboard input up to a newline. We
next open a file called guest.pl, which is assigned a filehandle
of GUESTBOOK. The filehandle is used to reference the opened file
for reading or writing and it is usually all capitals. This filehandle,
like all variables or arrays in Perl, can be just about anything
you like, as long as it isn't a reserved word. You might notice
that we put >> in front of the filename. This means we are
opening the file for appending. The options for file opening are
in Table 3.1.
Table 3.1 File Opening Options
Operator | Action
> | Write (creates the file, or empties it if it already exists)
>> | Append
+> | Read and write (empties the file first)
Nothing | Read
Once the file is open, and we have a filehandle, we can put the
name into the guestbook file with a print command. Notice that
we do this by putting the filehandle directly after the print
statement. In fact, the normal print "string" command
is actually a short version of
print STDOUT "string";
We print the next line to the screen. We have put the variable
$name inside the string, and it will be printed as we entered
it. This is called variable interpolation; the variable is replaced
in the string before it is printed.
The chop() command called before the print removes the last character
in the string $name. This is used to remove the newline that is
appended to the string when it is entered from the keyboard. If
we didn't do this, we would end up with an extra newline printed
right after the name, and the exclamation point on a line by itself.
Not such a good idea, grammatically speaking. We left the newline
on when we printed it to the file because we wanted each name
on its own line.
Finally, we close the file and end the program. We don't actually
need to close the file, as Perl will close it automatically if
it is reopened or the program ends, but it is good programming
practice to put it there.
Okay, so now we have a database full of the names of our guests.
What next?
Well, the next logical step would be to check the file to see
if you have already visited so the database isn't filled up with
repeat customers. The procedure shown in Listing 3.1 shows you
how.
Listing 3.1 Checking the Database for Repeat Visitors
Checking the file
print "What is your Name?";
$name=<STDIN>;                  # get the response from the user
open (GUESTBOOK, "guest.pl");   # open a file with filehandle GUESTBOOK
while ($line=<GUESTBOOK>) {
    if ($line eq $name) {
        print "Your name is already in the guestbook!\n";
        close(GUESTBOOK);
        exit;
    }
}
close (GUESTBOOK);              # close file for read
open (GUESTBOOK, ">>guest.pl"); # open same file for append
print GUESTBOOK "$name";        # append the name to the guestbook file
chop($name);                    # remove the newline
print "Thank you, $name! Your name has been added to the Guestbook.\n";
close(GUESTBOOK);
Let's have a look at the new part. The while loop will continue
until the condition is false. The condition in this case is assigning
each line of GUESTBOOK, terminated with a newline, to the variable
$line.
This is the procedure we did when we got the name of the guest,
except we are now obtaining it from a file rather than <STDIN>.
This condition will continue to be true until there are no more
lines in the file. Notice that we opened the file guest.pl with
a read, so we can get the lines.
The if condition compares each $line with $name. We use the operator
eq here because the variables are strings. If they were numbers,
we would use a different set of comparison operators. Because
Perl makes no distinction between string variables and numeric
variables, we must be cautious as to which comparison operators
we use. Perl will compare "5" and "10" differently
depending on whether you use a string or numeric operator. "5"
will be less than "10" in a numeric sense, but greater
than "10" in a string context. Table 3.2 lists the possible
operators in both string and numeric context.
Table 3.2 Numeric versus String Operators
Numeric Operator | String Operator | Definition
== | eq | Equal to
!= | ne | Not equal to
> | gt | Greater than
>= | ge | Greater than or equal to
< | lt | Less than
<= | le | Less than or equal to
<=> | cmp | Compare, with numeric return (-1, 0, or 1)
A quick note on the compare operator: It will return -1 if the
left operand is less than the right, 0 if the two are equal, and
+1 if the left operand is greater.
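The difference between the two families of operators shows up clearly when the compare operators are handed to sort. This is a minimal sketch with made-up values; the sort function and its $a and $b variables are not introduced in the text above.

```perl
use strict;
use warnings;

my $numeric = (5 <=> 10);       # -1: 5 is numerically less than 10
my $string  = ("5" cmp "10");   #  1: the character "5" comes after "1"

my @values    = (10, 9, 100, 5);
my @by_number = sort { $a <=> $b } @values;   # 5 9 10 100
my @by_string = sort { $a cmp $b } @values;   # 10 100 5 9

print "$numeric $string\n";
print "@by_number\n";
print "@by_string\n";
```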
The exit command within the if body will stop the program after
informing the user that his or her name is already in the guestbook.
We could have accomplished the same thing in one line with the
die command:
if ($line eq $name) {
die "Your name is already in the guestbook!\n";
}
This is useful for command line programs, but does not work very
well with CGI scripts, so it isn't used very often.
If the name isn't found in the GUESTBOOK file, we exit the while
loop, and continue on with the program. We have to close the file
and reopen it in append mode so that we can add the name to the
end of the file. Once again, strictly speaking, we don't have
to close the file before reopening it, but it is good practice
so that we develop a sense of what is happening, and when, in
our scripts. When dealing with larger programs it can get a little
confusing as to what is happening at a particular point in the
code.
Although we used ($line eq $name) in the above example, this is
not necessarily the best way to test equivalency. In this case,
the two strings must be exactly equal, which means "john"
and "John" are not equal. Also, if there happens to
be a newline in one string and not the other, then Perl will treat
them as different. To get around some of these nuisances, we use
something called a regular expression.
Regular expressions in Perl come in very handy, as they are much
less cumbersome to use and a lot more flexible than string searching
and comparison in some other languages (which will remain nameless).
Let's look at an equivalent expression to the one used above:
if ($line =~ /john/) {
# do some stuff...
}
There are a few things going on here. First, we are using a new
operator.
The =~ operator is the regular expression operator. We will use
it a lot to do all types of searching and replacing. Here
we use it to compare $line with the expression between the /'s.
If the variable $line contains the string "john,"
then the condition is true. This means that the strings will match
if $line is equal to "john," "johnathan,"
"joe johnston," or "john and the beanstalk."
If we want to be sure that the first name begins with "john,"
we can change the expression slightly to yield
if ($line =~/^john/) {
The ^ character tells Perl that the variable $line must start
with "john" in order to match. This will match "john,"
"johnny," and "john smith" but not "big
john" or "joe johnston."
Ok, so what about case? The current expression doesn't do uppercase,
so "john" will match, but "John" will not.
Another simple change:
if ($line =~/^john/i) {
The "i" after the last / tells Perl to ignore the case
of the regular expression so that we now will match "John,"
"Johnathan," and "JOHN."
Of course, we don't want to match $line to just John, we want
to match it to the user input in the variable $name. Well, remember
variable interpolation, where a variable name in a string gets
substituted for its value before the string gets printed? The
same thing applies in regular expressions:
if ($line =~/^$name/i) {
Now the condition is true if $line is a string beginning with
$name in any combination of upper- and lowercase. Amazing!
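One caution about interpolating user input into a pattern: any regular expression metacharacters in $name keep their special meaning. Perl's \Q and \E escapes quote them so the input is matched literally. The sketch below illustrates the difference with made-up data; \Q and \E are not used in the listings in this chapter, and the variable names are hypothetical.

```perl
use strict;
use warnings;

my $name = "j.r";                 # input containing a metacharacter
my $line = "jar:Smith:blue\n";

# Interpolated as-is, the dot matches any character, so "jar" matches.
my $loose = ($line =~ /^$name/i) ? 1 : 0;

# Between \Q and \E the dot is literal, so only "j.r" would match.
my $exact = ($line =~ /^\Q$name\E/i) ? 1 : 0;

print "loose=$loose exact=$exact\n";   # loose=1 exact=0
```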
A note on regular expressions: Some of the characters in a regular
expression are significant (like the ^) and will not become part
of the expression itself.
If you want to match these characters, you must "escape"
them. For instance, if you wanted to match the variable $line
to "caret^," you would precede the ^ by a backslash in
the regular expression
if ($line =~/caret\^/) {
For a list of special characters in regular expressions, see Appendix
A.
You are also not limited to using the / as a delimiter for a regular
expression. If you precede the regular expression by m, the next
character becomes the delimiter, so this
if ($line =~/john/) {
is equivalent to
if ($line =~m#john#) {
and
if ($line =~m[john]) {
The delimiter can be just about any character, but notice that
in the case of character pairs, like the square brackets ([]),
the end delimiter is the opposite mate to the beginning delimiter.
Okay, now we have a database full of names, and we can check them
against input data, and ignore the case. What next? The next
logical step seems to be to glean some more useful information
from this program. Let's ask the user for their last name and
favorite color as well, as shown in Listing 3.2.
Listing 3.2 Asking for Additional User Information
Asking for details
print "What is your first name? ";
$name=<STDIN>; # get the response
# from the user
chop($name); # remove the newline
print "What is your last name? ";
$lastname=<STDIN>;
chop($lastname);
print "What is your favorite color? ";
$color=<STDIN>;
chop($color);
$newline=$name.':'.$lastname.':'.$color."\n"; # make line
# delimited with colons
open (GUESTBOOK, ">>guest.pl"); # Open file for append
print GUESTBOOK "$newline";
# Append the field line to
# the guestbook file
print "Thank you, $name! Your name has been added to the Guestbook.\n";
close(GUESTBOOK);
There are a few things different now. We are asking for three
separate pieces of data, and assigning each to a variable. Notice
that we are removing the newline character as soon as we get the
data. The next thing we do is format a string of data with the
three fields separated by a colon, and with a newline character
tacked on the end. We'll end up with something like this:
John:Smith:magenta
This will make it easy to add or retrieve the data we want at
a later time. The "." operator in Perl is the concatenation
operator. To form the string we want, we are appending a colon
to the end of $name, adding $lastname to the end of the resulting
string, appending another colon, adding $color, then finally appending
a newline. Confused yet? Don't worry: as with all things in Perl,
there is an easier way. That line is equivalent to
$newline=join(':',$name, $lastname, $color);
$newline.="\n";
The join() function joins the variables or strings listed into
one string, separating the fields with the specified delimiter
(a : in this case). The .= operator on the following line appends
the newline character to the end of the $newline variable. This
is equivalent to: $newline=$newline."\n";
Now that we have this information, we'll want to check it. We
do this using the split() command (Listing 3.3), the opposite
of the join() command. Surprise, surprise.
Listing 3.3 The Split Command
print "What is your first name? ";
$name=<STDIN>; # get the response
# from the user
chop($name); # remove the newline
print "What is your last name? ";
$lastname=<STDIN>;
chop($lastname);
print "What is your favorite color? ";
$color=<STDIN>;
chop($color);
$newline=$name.':'.$lastname.':'.$color."\n"; # make
# line delimited with colons
open (GUESTBOOK, "guest.pl");
while ($line=<GUESTBOOK>) {
    ($gbname, $gblastname, $gbcolor)=split(':', $line);
    if ($gbname=~/^$name/i) {
        print "You are already in the guestbook, $name!\n";
        close (GUESTBOOK);
        exit;
    }
}
close (GUESTBOOK);
open (GUESTBOOK, ">>guest.pl");
# open file for appending
print GUESTBOOK "$newline";
# append the field line
# to the guestbook file
print "Thank you, $name! Your name has been added to the Guestbook.\n";
close(GUESTBOOK);
Here we assign $gbname, $gblastname, and $gbcolor to the first
three items retrieved by the split command. We do this by putting
parentheses around the variable names to form a list. We could
have just as easily assigned all the items to an array like
this:
@data=split(':', $line);
and referenced the first three elements in the array as
$data[0], $data[1], and $data[2].
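A quick round trip shows that split and join undo each other. This is a sketch with a made-up record; it assumes the newline is removed before splitting (here with chomp, which is not used in the listings), because otherwise the newline stays attached to the last field.

```perl
use strict;
use warnings;

my $line = "John:Smith:magenta\n";
chomp($line);                             # drop the newline before splitting

my @data = split(/:/, $line);             # ("John", "Smith", "magenta")
my ($first, $last, $color) = @data;       # list assignment, as in the listing

my $rebuilt = join(':', @data) . "\n";    # back to the original record

print "$first $last likes $color\n";      # John Smith likes magenta
```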
So now that we have some data to play with, let's do some more
tests just for practice. Our program is getting a little long,
so in Listing 3.4 we'll only deal with the part that has changed.
Listing 3.4 Testing the Data
while ($line=<GUESTBOOK>) {
    ($gbname, $gblastname, $gbcolor)=split(':', $line);
    if (($gbname=~/^$name/i) && ($gblastname=~/^$lastname/i)) {
        print "You are already in the guestbook, $name!\n";
        close (GUESTBOOK);
        if ($gbcolor!~/$color/i) {
            print "You have a different favorite color!\n";
            print "Your old favorite color is: $gbcolor\n";
            print "Your new favorite color is: $color\n";
            print "Would you like to change it? ";
            $input=<STDIN>;
            if ($input=~/^y/i) {
                open(GUESTBOOK, "guest.pl");
                undef $/;
                $body=<GUESTBOOK>;
                $/="\n";
                close(GUESTBOOK);
                $body=~s/$line/$newline/;
                open(GUESTBOOK, ">guest.pl");
                print GUESTBOOK $body;
                close(GUESTBOOK);
                exit;
            }
            else {
                exit;
            }
        }
        exit;
    }
}
What's happening here? The first thing you may notice is that
we are doing an extra test at the first if statement. Since people
may tend to have similar first names, we are now testing that
the first and last name match.
The && means that the first and second expressions
must be true in order for the if statement to be true. Alternatively,
|| means the first or the second expression must be true.
We next check to see if the color is the same. If it is, we just
exit. If it isn't, we alert the user, and ask them if they want
to change their color choice. We get a line from STDIN as usual,
and check to see if it starts with y or Y. If it doesn't, we exit.
If it does, that's when the fun starts.
We open guest.pl to read, as normal, but then we undef (undefine)
a system variable $/. This variable is the one used to determine
where lines end when you read them in from a file. It is normally
set to "\n", so you get one line per line in the file.
By undefining it, we will now read the entire file (newlines and
all) into the variable $body. Once we have the whole thing, we
can replace the line with the old color ($line) with the line
with the new color ($newline). This is done by using the =~ operator
again, but notice that there is an s in front of the first /.
This means we are doing a substitution. The expression between
the first two /'s will be replaced by the expression between the
second two /'s, if it exists. As with all regular expressions,
you can use any delimiter you like, so
$body =~ s#$line#$newline#;
would have been equivalent. Also, the i directive to ignore case
that comes after the last slash can apply here as well. Once we
have replaced the line we want, we open the guestbook again for
writing, and just write the whole file out with a print, and exit.
Remember to redefine $/ before you do any more file operations
to make sure that you don't mess up your future operations. But
back to our regular expressions.
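One way to avoid forgetting to restore $/ is to change it with local inside a block, so Perl puts the old value back automatically when the block ends. The sketch below is an alternative to the undef approach in Listing 3.4, not the author's method. It makes several assumptions: an in-memory string stands in for the guestbook file, the record contents are made up, and \Q and \E are added so the old line is matched literally.

```perl
use strict;
use warnings;

# In-memory stand-in for the guestbook file (an assumption for this sketch).
my $file = "John:Smith:magenta\nJane:Doe:green\n";
open(my $fh, '<', \$file) or die "cannot open in-memory file: $!";

my $body;
{
    local $/;          # $/ is undefined only inside this block
    $body = <$fh>;     # reads the whole file, newlines and all
}                      # $/ is "\n" again from here on
close($fh);

my $old = "John:Smith:magenta\n";
my $new = "John:Smith:teal\n";
$body =~ s/\Q$old\E/$new/;    # swap the old record for the new one

print $body;
```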
It's probably a good idea to go over each of the separate elements
covered in this script so there is no doubt as to what regular
expression operators are, and how they work.
Unlike the grep command, which looks at all the lines in the designated
file, this script only looks at one, the line which is in $_.
To include all the lines of a file we need to do this:
while (<>) {
if (/crypt/) {
print "$_";
}
}
This loop continues until all lines are checked.
Now say you are checking your own scripts for crypt, and you realize
that your typing was a little sloppy in places. Sometimes you
slipped and spelled crypt with two p's, as cryppt. You can amend
your searching script to
while (<>) {
if (/cryp*t/) {
print "$_";
}
}
The asterisk means "zero or more of the preceding character," so
the search returns crypt, any spelling of crypt with two or more
p's, and also cryt with no p at all.
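If matching a word with no p at all is not wanted, the plus sign can be used instead, since it means "one or more of the preceding character." The plus sign is not covered in the text above, so treat this as a supplementary sketch with made-up test words.

```perl
use strict;
use warnings;

my @words = ("crypt", "cryppt", "cryt");

my @star = grep { /cryp*t/ } @words;   # p* : zero or more p's
my @plus = grep { /cryp+t/ } @words;   # p+ : one or more p's

print "star: @star\n";   # star: crypt cryppt cryt
print "plus: @plus\n";   # plus: crypt cryppt
```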
Once you have matched what you are looking for you might want
to replace it with something. To do this, we can use the substitute
operator.
You might use this operator if you want to replace one string
with another string. The substitute operator has a short-form,
s, which looks like this in a statement:
s/crypt/tomb/;
The substitute operator will replace crypt with the replacement
string tomb.
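By default the substitute operator replaces only the first match it finds. Adding the g modifier after the last slash, which is not shown in the text above, replaces every match. A small sketch with a made-up string:

```perl
use strict;
use warnings;

my $text = "crypt one, crypt two";

# Copy $text first, then substitute in the copy.
(my $first_only = $text) =~ s/crypt/tomb/;    # first match only
(my $all        = $text) =~ s/crypt/tomb/g;   # g: every match

print "$first_only\n";   # tomb one, crypt two
print "$all\n";          # tomb one, tomb two
```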
Regular expressions, as you can now see, are patterns. These patterns
can be as big or as small as you need, each with its own peculiarities.
Let's look at some more.
There are various patterns that regular expressions work with:
single-character, grouping, and anchoring. Each of these has its
own little characteristics that make it work.
The Single-Character Pattern
The most common pattern-matching character is a single character
used to match itself. This would be using a letter as a regular
expression to match itself; in other words, regular expression
"a" looking for character "a" in a string.
The second most common pattern-matching character is a period
or dot, "." This character will match any single character
with the exception of the newline character, \n.
Moving into larger areas, character class pattern matching occurs
when a set of square brackets is used to enclose a list of
characters:
/[crypt]/
A character class matches exactly one character in the string being
tested, and a match occurs if that character is any one of those
listed between the brackets. The class above matches a single "c,"
"r," "y," "p," or "t"; it does not look for the word crypt.
Keep in mind that regular expressions are case-sensitive, so
/[crypt]/ will not match "C" or "T."
On the other hand, the order of the characters in the class does
not matter, and only one of them has to appear in the string for
a match to occur.
You can designate a range with this operator by inserting a dash
between the values. For example,
/[0-5]/
is the same as
/[012345]/
which can be very powerful if you consider that
/[a-zA-Z0-9]/
can search for all letters of the alphabet, both upper- and
lowercase, as well as all digits. Not bad for 15 little keystrokes.
If you want to use the character class in the opposite way (for
example, to match characters which are not listed in the class),
then place a caret (^) after the left bracket, like so:
/[^0-5]/
This expression matches any single character which is not in
the range from 0 to 5. There are some common character classes
in Perl which are listed in Table 3.3.
Table 3.3 Character Class Contractions
Construct | Equivalent Class | Negated Construct | Equivalent Negated Class
\d (digits) | [0-9] | \D (anything but digits) | [^0-9]
\w (words) | [a-zA-Z0-9_] | \W (anything but words) | [^a-zA-Z0-9_]
\s (space) | [ \r\t\n\f] | \S (anything but space) | [^ \r\t\n\f]
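The classes and contractions above can be tried out with a short
sketch like this one. The sample strings and the labels stored in
%report are our own inventions for the example.

```perl
use strict;
use warnings;

# Sample strings invented for this sketch.
my @samples = ("room 101", "hello", "12345", "   ");
my %report;

foreach (@samples) {
    my @found;
    push(@found, "digit")   if /\d/;       # same as /[0-9]/
    push(@found, "word")    if /\w/;       # a letter, digit, or underscore
    push(@found, "space")   if /\s/;       # blank, tab, newline, and so on
    push(@found, "not 0-5") if /[^0-5]/;   # any character outside 0 to 5
    $report{$_} = "@found";
    print "$_: @found\n";
}
```

The string "12345" is the only one that fails /[^0-5]/, because every
one of its characters falls inside the range 0 to 5.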
Before we get into any more of the guts of Perl, let's apply what
we've already exposed ourselves to. We should also start to make
note of the little differences between UNIX Perl and Perl for
Windows NT, or WinPerl.
One big difference is that while most Perl scripts you will find
contain the first line
#!/usr/local/bin/perl
or something similar, this is unnecessary with WinPerl. This line
in UNIX lets the operating system know where to find the Perl
interpreter. With Windows NT, you need to associate the .pl file
extension with perl.exe for your script to function. Associating
the .cgi extension is probably a good idea, too, since most
of these files are also written in Perl.
The Grouping Pattern
There are several grouping patterns to understand: sequence, multipliers,
parentheses, and alternation. By using grouping patterns you can
give your script the ability to put conditions on your regular
expression matching. For example, look for six of this, or look
for two or more of these.
The Sequence Grouping Pattern
We're already familiar with this: It's where a regular expression
matches a string exactly, like
/crypt/
where the regular expression looks for the same sequence of the
characters: c r y p t.
The Multipliers Grouping Pattern
We already met one of these with the asterisk. The asterisk
designates a "zero or more" match of the previous character. The
"+" symbol designates a match of one or more of the previous
character. To indicate a match of "zero or one" of the previous
character, you would use the question mark, "?". Each of these
multipliers is greedy: when more than one length would satisfy
the pattern, it matches the longest string it can.
If you want to stipulate how many characters these grouping patterns
are to match, you can use a general multiplier, whose format is
/a{2,4}/
where a is the regular expression we are trying to match, and
2 and 4 give the smallest and largest number of a's the multiplier
will take. A match will be found for strings "aa," "aaa," and
"aaaa," but not for the string "a." In a longer run such as
"aaaaa" the pattern still succeeds, but it takes only four of the
a's.
When the general multiplier has the second number absent, as with
/a{3,}/
it tells the match to look for three or more of the letter
a. If the comma is absent, as with
/a{3}/
it tells the match to find exactly three a's. To look for three
or fewer a's, a zero is used in the range field, like this:
/a{0,3}/
If you want to set the distance between two characters you might
try
/a.{3}x/
which will make the regular expression look for any letter a separated
by exactly three non-newline characters from the letter x.
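To watch the multipliers at work, the sketch below records how much
of a string each pattern actually takes. The test string and the
%got hash are made up for this example; $& is Perl's built-in
variable holding the text of the last successful match.

```perl
use strict;
use warnings;

# Test string invented for this sketch: a "c", five a's, then a "t".
$_ = "caaaaat";

# $& holds the text consumed by the most recent successful match.
my %got;
$got{'a+'}     = $& if /a+/;       # one or more: takes all five
$got{'a{2,4}'} = $& if /a{2,4}/;   # two to four: stops after four
$got{'a{3,}'}  = $& if /a{3,}/;    # three or more: takes all five
$got{'a{3}'}   = $& if /a{3}/;     # exactly three
$got{'ca?'}    = $& if /ca?/;      # "c" plus zero or one a

foreach my $pattern (sort keys %got) {
    print "/$pattern/ matched '$got{$pattern}'\n";
}
```

Every pattern succeeds here, even /a{3}/ against five a's, because a
match only has to be found somewhere in the string. The multiplier
limits how many characters are taken, not how many may be present.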
The Parentheses Grouping Pattern
You can use a pair of open and close parentheses to enclose any
part of a pattern whose match you need to have remembered. Whatever
the enclosed part of the pattern matches is the part that will be
kept in memory.
To use this remembered match, you use a backslash followed by an
integer, like this:
/moose(.)kiss\1/;
This regular expression will match any occurrence of the string
"moose," followed by any one non-newline character, followed by
the string "kiss," followed by that same character again. The
regular expression remembers which single non-newline character
it matched after "moose" and looks for the same one after "kiss."
For example,
mooseqkissq
is a match, but
mooseqkissw
is not. This differs from the regular expression
/moose.kiss./;
which will match any non-newline character in each of the two
positions, whether they are the same or not. The "1" after the
backslash relates to what's in the parentheses. If there is more
than one set of parentheses, you can use the number after the
backslash to indicate the one you want, counting from left to
right. An example might look like this:
/a(.)p(.)e\1s/;
The first character is "a," followed by the #1 non-newline
character, followed by "p," followed by the #2 non-newline
character, followed by "e," followed by whatever the
#1 non-newline character is, followed by "s." This will
match
aqpdeqs
where the different non-newline characters only have to match
their designation, and not each other. To add the ability to match
more than a single character with the referenced part, just add
an asterisk to the expression, as
/a(.*)p\1e/;
This expression would match "a," followed by any number
of non-newline characters, followed by "p," followed
by that same series of non-newline characters and then "e."
A match might be
aplanetpplanete
but not
aqqpqqqe
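The three remembered-match patterns in this section can be checked
against the sample strings with a short sketch. The letters A, B, and
C are simply labels we chose for the three patterns.

```perl
use strict;
use warnings;

my @tests = ("mooseqkissq", "mooseqkissw", "aqpdeqs",
             "aplanetpplanete", "aqqpqqqe");
my %result;

foreach (@tests) {
    $result{$_}  = "";
    $result{$_} .= "A" if /moose(.)kiss\1/;   # same character after both words
    $result{$_} .= "B" if /a(.)p(.)e\1s/;     # \1 repeats the first group only
    $result{$_} .= "C" if /a(.*)p\1e/;        # \1 repeats a whole series
    print "$_: ", ($result{$_} || "no match"), "\n";
}
```

Run as written, "mooseqkissq" is reported under A, "aqpdeqs" under B,
and "aplanetpplanete" under C, while the other two strings match
nothing.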
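You can also use the memory grouping pattern to replace portions
of a string. In the replacement half of a substitution the
remembered match is written $1 (the \1 form also works there, but
Perl prefers $1). A string like
$_ = "a yams p corn e squash";
s/p(.*)e/b$1c/;
creates the new string value of
a yams b corn c squash
where the "p" and "e" were replaced with "b"
and "c," but what was in between remains unchanged. Keep in mind
that the match starts at the leftmost "p" in the string and runs
to the last "e," so a word like "peas" ahead of the lone "p" would
be caught instead.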
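Here is a runnable sketch of a substitution that uses a remembered
match. The sample string is our own, chosen so that its only "p" and
"e" are the standalone letters, and $kept is a name invented for the
example.

```perl
use strict;
use warnings;

# Sample string invented for this sketch.
$_ = "a yams p corn e squash";

# (.*) remembers everything between the "p" and the "e";
# $1 puts that remembered text back into the replacement.
s/p(.*)e/b$1c/;
my $kept = $1;    # still holds what the parentheses matched

print "$_\n";             # a yams b corn c squash
print "kept: '$kept'\n";  # kept: ' corn '
```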
The Alternation Grouping Pattern
The general format for alternation is
a|p|e
where the regular expression is asked to match only one of the
designated alternatives, "a," "p," or "e."
You can apply alternation to more than one character, so
ape|gorilla|monkey
would be equally valid.
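A brief sketch of alternation in use follows. The word list is
invented for the example.

```perl
use strict;
use warnings;

# Word list invented for this sketch.
my @words = ("gorilla", "grape", "lemur", "monkeys");
my @hits;

foreach (@words) {
    # succeeds if any one of the three alternatives appears in the word
    push(@hits, $_) if /ape|gorilla|monkey/;
}

print "hits: @hits\n";   # hits: gorilla grape monkeys
```

Notice that "grape" is a hit because it contains "ape." Alternation by
itself does nothing to tie a match to whole words.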
The Anchoring Pattern
To anchor a pattern there are four special notations available.
You would want to anchor your regular expression search if you
don't want to turn up every instance of a string. For example,
when searching for the string "the," you don't want
to also get "then," "there," "their,"
or "them." To do this you might use the word boundary
anchor \b:
/the\b/;
so that only those words ending with "the" are matched.
But this doesn't stop a string like "absinthe" from
being matched, so you can add a word boundary anchor to the front
of the regular expression
/\bthe\b/;
so that only the exact matches of "the" are returned.
If, on the other hand, you wanted to match only those instances
where the string is followed by more letters of the same word, and
not the word itself, you would use the \B anchor
/the\B/;
to return the matches "thee," "these," "there," and "then," but
not "the" or "absinthe."
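The next anchor, ^, is used to match the start of a string. It
acts as an anchor only when it is in a place that makes sense,
such as the beginning of the pattern, as with
/^the/;
which matches only those strings which start with "the."
The final anchor, $, works in a similar way but on the end of
a string, so
/the$/;
will match any occurrence of "the" which appears at
the end of a string. If you put a backslash in front of either
one, as in \^ or \$, it stops being an anchor and matches a
literal caret or dollar sign instead.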
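The four anchors can be compared side by side with a sketch like this
one. The word list is invented, and the %anchors hash simply records
which patterns matched each word.

```perl
use strict;
use warnings;

# Word list invented for this sketch.
my @words = ("the", "then", "absinthe", "these");
my %anchors;

foreach (@words) {
    my @ok;
    push(@ok, 'the\b')   if /the\b/;     # "the" ends a word
    push(@ok, '\bthe\b') if /\bthe\b/;   # "the" is a whole word
    push(@ok, 'the\B')   if /the\B/;     # "the" runs on into more letters
    push(@ok, '^the')    if /^the/;      # string starts with "the"
    push(@ok, 'the$')    if /the$/;      # string ends with "the"
    $anchors{$_} = "@ok";
    print "$_: $anchors{$_}\n";
}
```

Only the word "the" satisfies /\bthe\b/, and "absinthe" passes /the\b/
and /the$/ but nothing else.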
Pattern Precedence
As with operators, both grouping and anchoring patterns have an
order of precedence to follow. Table 3.4 gives you a quick rundown.
Table 3.4 Pattern Precedence from Highest to Lowest
Name | Representation
Parentheses | ( )
Multipliers | + * ? {a,b}
Sequence and Anchoring | ape \b \B ^ $
Alternation | |
Remember, if you use parentheses to clarify a regular expression
because they have the highest precedence, you will also be employing
their memory of the match. These examples should explain the
differences in matches caused by the use of parentheses.
ape*
will match ap, ape, apee, apeee, etc.
whereas
(ape)*
will match "", ape, apeape, apeapeape, etc.
and
^a|b
will match "a" at the start of the line, or "b"
anywhere in the line. Yet
^(a|b)
will match either "a" or "b" at the start
of the line, and
a|pe|s
matches "a" or "pe" or "s." If you
apply parentheses
(a|pe)(pe|s)
you'll match ape, as, pepe, and pes. These parentheses can be
used to find related words like
(soft|hard)wood
where either instances of softwood or hardwood are returned as
matches.
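The effect of parentheses on precedence can be seen in a short
sketch. The word lists and variable names are made up for the
example, and $& is Perl's built-in variable holding the text of the
last successful match.

```perl
use strict;
use warnings;

# Word list invented for this sketch.
my (@loose, @tight);
foreach ("apple", "crab", "bat", "xyz") {
    push(@loose, $_) if /^a|b/;     # "a" at the start, OR "b" anywhere
    push(@tight, $_) if /^(a|b)/;   # "a" or "b", but only at the start
}
print "^a|b   : @loose\n";   # apple crab bat
print "^(a|b) : @tight\n";   # apple bat

# The multiplier applies to one character unless parentheses widen it.
my ($star_on_e, $star_on_group) = ("", "");
$_ = "apeee";
if (/ape*/)   { $star_on_e = $&; }       # the * repeats only the "e"
$_ = "apeapes";
if (/(ape)*/) { $star_on_group = $&; }   # the * repeats all of "ape"
print "ape*   took '$star_on_e'\n";      # apeee
print "(ape)* took '$star_on_group'\n";  # apeape
```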
A possible use for the matching operators might be a script that
looks at a user's answer to decide how to respond. You can use
the "=~" operator to do this. If you remember, this operator
binds the regular expression to the value on its left, so the
match is made against that value instead of $_.
Say you have already filled $_ with a value you need later in
the script. Then you could use =~ to test some other value without
disturbing $_. The =~ operator acts like this:
print "Will you be needing anything else?";
if (<STDIN> =~ /^[Yy]/) {
    # If the input begins with a 'Y' or 'y', the condition
    # is found true, so we proceed to the next line
    print "And what would that be?";
    <STDIN>;    # read the user's answer and throw it away
    print "I'm sorry, that's just not possible.";
}
where no matter what the user inputs, the response will be the
same.
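To show that =~ leaves $_ alone, here is a sketch that does not need
keyboard input. The file name, the canned answer in $answer, and the
"Goodbye." reply are all stand-ins invented for the example.

```perl
use strict;
use warnings;

$_ = "guestbook.dat";          # hypothetical value we want to keep in $_
my $answer = "Yes please";     # stands in for a line read from <STDIN>
my $reply;

# =~ matches the pattern against $answer, not against $_
if ($answer =~ /^[Yy]/) {
    $reply = "And what would that be?";
} else {
    $reply = "Goodbye.";
}

print "$reply\n";
print "\$_ still holds $_\n";
```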
There are some other ways to modify your regular expressions.
Perl uses the "i" modifier, placed after the closing slash, to
tell a regular expression to ignore case in matching. In the format
/string_characters/i
you could amend a line from our last example script from this:
if (<STDIN> =~ /^[Yy]/)
to this:
if (<STDIN> =~ /^y/i)
so that the case of the answer no longer affects the match.
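A small sketch of the ignore-case modifier follows. The list of
answers is invented for the example.

```perl
use strict;
use warnings;

# Answers invented for this sketch.
my @answers = ("Yes", "yep", "YES!", "no", "maybe");
my @affirmative;

foreach (@answers) {
    # the i after the last slash makes "y" match "Y" as well
    push(@affirmative, $_) if /^y/i;
}

print "affirmative: @affirmative\n";   # affirmative: Yes yep YES!
```

The answer "maybe" contains a "y," but the ^ anchor insists that it
come first, so it is not counted.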
If you need to use a regular expression to search through file
paths you would need to include slashes in the expression. In order
to do this, each slash has to be preceded by a backslash, so that
it is treated as an ordinary character in the pattern and not as
the end of the regular expression
/^\/usr\/bin\/perl/
(and the regular expression starts to look like a divoted golf
course!)
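The escaped-slash pattern can be tried with the sketch below. The
sample line is hypothetical. The second test uses Perl's m operator
with braces as delimiters, a feature not covered in this chapter,
which lets the slashes stand as they are.

```perl
use strict;
use warnings;

# Hypothetical text to search; not taken from a real script.
my $line = "/usr/bin/perl -w";

# Every slash escaped, as described above
my $escaped = ($line =~ /^\/usr\/bin\/perl/) ? 1 : 0;

# Same pattern with m{...} delimiters, so no backslashes are needed
my $delimited = ($line =~ m{^/usr/bin/perl}) ? 1 : 0;

print "escaped slashes: $escaped\n";     # 1
print "brace delimiters: $delimited\n";  # 1
```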
In this chapter we started out discussing various Perl control
structures like the statement block used to define a specific
script action, the if/unless statements, and the different kinds
of loops, like the for/foreach loop. Loops can be used to have
Perl repeat an action as many times as necessary for the script's
operation.
We also covered associative arrays, demonstrating how they differ
from arrays by having not just a single value in each element,
but a key/value pair. Associative arrays are worked on by their
own operators, like the keys, values, each, and delete operators.
The chapter finished by defining regular expressions as a pattern
matching tool used by Perl. Now that you have a general understanding
of what regular expressions are, how they are written between two
slashes, and how they match their patterns against the data your
script specifies, you can start solving some more interesting tasks
with Perl.
In the next chapter, we'll marry this guestbook script to a CGI
output for the user and look at how Perl interacts with HTML.