Perl CGI Programming: No experience required.
(Publisher: Sybex, Inc.)
Author(s): Erik Strom
ISBN: 0782121578
Publication Date: 11/01/97



The Power of Regular Expressions

Change the foreach loop in geturl.pl to read this way:

   # Split, decode each of the name-value pairs and print them on the page.

       foreach $NameValue (@NameValuePairs)
           {
           ($Name, $Value) = split (/=/, $NameValue);
           $Value =~ tr/+/ /;
           $Value =~ s/%([\dA-Fa-f][\dA-Fa-f])/ pack ("C", hex ($1))/eg;
           print "Name = $Name, value = $Value<BR>\n";
           }
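The loop above relies on @NameValuePairs having been filled in earlier in geturl.pl. Here is a minimal, self-contained sketch you can run from the command line. It makes three assumptions: a hard-coded, made-up query string stands in for real form data; the string is split on ampersands, the separator browsers use between form fields; and the two decoding lines are moved into a hypothetical helper named DecodeValue.

```perl
use strict;
use warnings;

# Hypothetical helper: decode one URL-encoded value with the same
# two lines used in geturl.pl.
sub DecodeValue
    {
    my ($Value) = @_;
    $Value =~ tr/+/ /;                                            # '+' back to a space
    $Value =~ s/%([\dA-Fa-f][\dA-Fa-f])/pack ("C", hex ($1))/eg;  # %nn back to a character
    return $Value;
    }

# Made-up query string standing in for what form1.html would submit.
my $QueryString = 'name=Jane+Doe&comment=50%25+off%21';

# Split the query string into name=value pairs on the ampersands.
my @NameValuePairs = split (/&/, $QueryString);

# Split, decode each of the name-value pairs and print them.
foreach my $NameValue (@NameValuePairs)
    {
    my ($Name, $Value) = split (/=/, $NameValue);
    print "Name = $Name, value = ", DecodeValue ($Value), "<BR>\n";
    }
```

Running it should print Name = name, value = Jane Doe<BR> followed by Name = comment, value = 50% off!<BR>. %25 decodes to the percent sign and %21 to the exclamation point.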

If you’ve never worked with regular expressions, the two new lines in geturl.pl probably are among the weirdest things you’ve ever seen. However, install the new program, fill out form1.html in your Web browser, and submit it, using all the bizarre characters you want. Your result will be what you typed in, as illustrated in Figure 5.9.


Figure 5.9:  All of the encoded characters are translated

Where did all the garbage go?

If you had any doubts about the power of Perl, this little trick should dispel them. If you have been intimidated by the perplexing and strange conventions of regular expressions, you now should feel inspired to learn them. You have accomplished in two lines of code what probably would have taken an entire program to do in C or C++. In Perl, you just have to learn the lingo.

They’re easy to avoid, these Perl regular expressions. They generally look like something only Martians would understand.

However, you now have seen what you can do with a simple, though quite turgid, regular expression. You simply have no choice but to learn more about them!

Let’s begin by examining the two new lines in geturl.pl.

Translations, Substitutions

The first new line is fairly simple, though utterly meaningless to the untrained eye:

   $Value =~ tr/+/ /;

A couple of new Perl concepts surface in this line:

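  The =~ operator, known as the binding operator, which applies the operation on its right to the string on its left. In the expression "$String =~ /PATTERN/", it performs a pattern match on $String, and the expression is true if $String contains a match for PATTERN.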
  tr, the transliteration operator, which replaces each character found between the first two forward slashes following it with the corresponding character between the second two slashes.
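Here is a small sketch that shows both ideas side by side; the sample string is invented for illustration. It uses =~ once for a pattern match, which only tests $String, and once for tr, which changes a copy of it.

```perl
use strict;
use warnings;

# Made-up sample: a form value with '+' standing in for a space.
my $String = 'Jane+Doe';

# =~ binds $String to a pattern match. \+ escapes the plus, which
# otherwise has a special meaning inside a pattern.
my $HasPlus    = ($String =~ /\+/) ? 1 : 0;   # true: '+' appears
my $HasPercent = ($String =~ /%/)  ? 1 : 0;   # false: no '%' in $String

# The same operator binds a string to tr. A copy is made first,
# so $String itself keeps its plus sign.
(my $Copy = $String) =~ tr/+/ /;

print "HasPlus = $HasPlus, HasPercent = $HasPercent, Copy = $Copy\n";
```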

You will use the =~ operator frequently, in fact whenever you want to change the characters in a particular string variable: without =~, tr works on Perl's default variable, $_. The specification for tr is

   tr /SEARCH_LIST/REPLACE_LIST/

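where SEARCH_LIST contains the characters you want to search for, and REPLACE_LIST contains their new values, matched up position by position: the first character in SEARCH_LIST becomes the first character in REPLACE_LIST, the second becomes the second, and so on.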
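The sketch below illustrates the position-by-position mapping on arbitrary sample text, using ranges in both lists to uppercase a string. It also relies on one extra fact about tr: it returns the number of characters it matched. With an empty REPLACE_LIST it leaves the string unchanged, so it can be used purely to count characters.

```perl
use strict;
use warnings;

my $Text = 'Hello, World';   # arbitrary sample text

# Position by position: a becomes A, b becomes B, ... z becomes Z.
# Characters not in SEARCH_LIST (H, W, the comma, the space) are left alone.
(my $Upper = $Text) =~ tr/a-z/A-Z/;

# tr returns how many characters it matched. With an empty REPLACE_LIST
# and no modifiers nothing changes, so this just counts each 'o' and 'l'.
my $Count = ($Text =~ tr/ol//);

print "$Upper\n$Count\n";   # prints HELLO, WORLD and then 5
```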


Note:  tr also accepts optional modifiers after the last slash: c, d, and s. We don't need them at this point, so we won't discuss them. (For the curious, they stand for complement, delete, and squeeze.)

The line from geturl.pl that utilizes tr:

   $Value =~ tr/+/ /;

has the + character as its SEARCH_LIST and a space as its REPLACE_LIST. Therefore, it goes through the $Value string and replaces every occurrence of the plus character (+) with a space. This is necessary for URL-encoded form data, in which every space is encoded as a plus sign.
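Here is a minimal sketch of this line on its own, using a made-up form value. It also shows why tr cannot damage a plus sign the user really typed: the browser sends that character as %2B, which tr ignores and which only becomes + again when the second new line decodes it.

```perl
use strict;
use warnings;

# Made-up form value as a browser would send it: spaces became '+'.
my $Value = 'No+experience+required';

# Replace every '+' with a space; tr returns how many characters it changed.
my $Replaced = ($Value =~ tr/+/ /);
print "$Value ($Replaced replaced)\n";   # No experience required (2 replaced)

# A plus sign the user actually typed arrives encoded as %2B, so tr
# finds no '+' to change; only the %nn decoding step restores it.
my $Typed = '1%2B1';
my $Untouched = ($Typed =~ tr/+/ /);
print "$Typed ($Untouched replaced)\n";  # 1%2B1 (0 replaced)
```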

The second new line in geturl.pl is trickier to understand.

   $Value =~ s/%([\dA-Fa-f][\dA-Fa-f])/ pack ("C", hex ($1))/eg;

It’s a little easier to understand if it is explained in the sequence of events that it kicks off.

First of all, what does this code do? This is the program line that turns URL-encoded characters back into printable characters. Remember the %nn convention, in which special characters are encoded with a percent sign followed by their hexadecimal ASCII values? This is where the encoded values revert to real characters. Let’s step through the program line:

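  s is the Perl substitution operator. Like tr, it is bound to $Value with the =~ operator, but instead of a list of characters it searches for a regular expression: it takes everything in $Value that matches the pattern between the first two forward slashes and replaces it with what is between the second two slashes.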
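  In this example, s has been told to look for % followed by two characters, each of which is either a digit, designated by \d, or one of the letters A through F (or a through f). Together these are the valid hexadecimal digits, and the square brackets in [\dA-Fa-f] form a character class that matches any one of them.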
  The expression pack ("C", hex ($1)), which is specified as the REPLACE_LIST for s, is best understood if it is taken apart from the inside out. $1 holds the text matched by the first set of parentheses in SEARCH_LIST: the two hexadecimal digits, minus the percent sign. hex is a Perl function that expects its argument to be a hexadecimal number and returns its numeric value; hex ("41"), for example, returns 65. pack is a function that takes its second argument and “packs” it into a binary value or structure based on the template that is its first argument. In our example, the template is C, which tells pack to turn the number in the second argument into a single character with that value.
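  The e at the end of the line tells s that REPLACE_LIST is an expression to be evaluated rather than a string. Without it, s would treat REPLACE_LIST as a double-quoted string: $1 would be interpolated, but pack and hex would never run, so every %nn string in $Value would be replaced with literal text such as pack ("C", hex (41)). e tells s to do the replacement with the result of the expression.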
  The g following e tells s to do a global substitution; in other words, replace every instance of SEARCH_LIST in $Value with what is calculated in REPLACE_LIST. If you left this modifier off, s would do the operation on the first occurrence and then quit.
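The sketch below takes the line apart the same way, using an invented sample string. It first checks the inner pieces, hex and pack, on their own. It then runs the substitution three ways: as written, without g, and without e. The leading space before pack in the original line is dropped here. With e the space makes no difference, but without e it would end up in the output.

```perl
use strict;
use warnings;

# Inside out: hex turns the digits into a number, pack turns the number into a character.
my $Number = hex ('41');            # 65
my $Char   = pack ('C', $Number);   # 'A'

my $Encoded = 'a%2Fb%3Fc';          # invented sample: %2F is '/', %3F is '?'

# As in geturl.pl, with e and g: every %nn becomes the character it encodes.
(my $Decoded = $Encoded) =~ s/%([\dA-Fa-f][\dA-Fa-f])/pack ("C", hex ($1))/eg;

# Without g: only the first %nn is replaced.
(my $FirstOnly = $Encoded) =~ s/%([\dA-Fa-f][\dA-Fa-f])/pack ("C", hex ($1))/e;

# Without e: the replacement is a double-quoted string, so $1 is
# interpolated but pack and hex are never called.
(my $NoEval = $Encoded) =~ s/%([\dA-Fa-f][\dA-Fa-f])/pack ("C", hex ($1))/g;

print "$Decoded\n$FirstOnly\n$NoEval\n";
```

The three lines printed are a/b?c, then a/b%3Fc, then apack ("C", hex (2F))bpack ("C", hex (3F))c.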



