Perl CGI Programming: No experience required.
(Publisher: Sybex, Inc.)
Author(s): Erik Strom
ISBN: 0782121578
Publication Date: 11/01/97



“Printable” Characters

Like most other Internet protocols, URLs were originally designed to ensure that they could be sent via e-mail. Most older mail systems were capable of recognizing only 7-bit characters, so the characters used in a URL must conform to that.

However, even some of these characters have a special meaning in a URL. For example, the ampersand (&) is used to separate the parameters in the query string. But you will encounter many occasions when you have to send ampersands and plus signs and equal signs and even 8-bit, non-ASCII characters in a URL. How can it be done?

The solution in the URL scheme of things is to encode these special characters in the form

   %nn

where the percent sign (%) indicates that the next two characters are the hexadecimal value of the actual, encoded character. A good example of this is the question mark (?) that begins the query string in our example:

   perl.bat?LastName=Jones&FirstName=John&Address=123+Any+Street …

Again, this character has a special meaning in the URL because it indicates that perl.bat should be run with the arguments that follow it. If a literal question mark is included in any of the arguments, it is encoded as

   %3F

because 3F is the hexadecimal code for a question mark in the ASCII table.
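If you want to check a code without reaching for an ASCII table, Perl can work it out for you. The following is only a minimal sketch of the %nn form; the helper name percent_encode_char is made up for illustration and is not part of Perl or of any CGI library.

```perl
use strict;
use warnings;

# Hypothetical helper (name is an assumption): return the %nn escape
# for a single character, using its ASCII code in two hex digits.
sub percent_encode_char {
    my ($char) = @_;
    return sprintf('%%%02X', ord($char));
}

# Show the escapes for a few characters that are special in a query string.
for my $char ('?', '&', '=', '+', '%', ' ') {
    printf "'%s' -> %s\n", $char, percent_encode_char($char);
}
```

Run as written, this should report %3F for the question mark and %26 for the ampersand.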

Table 5.2 shows the other printable ASCII characters that have a special meaning in a URL and therefore will be encoded by the browser.

Table 5.2: Printable Characters Encoded in URLs

Character    “Hex” Value

Tab          09
Space        20
"            22
<            3C
>            3E
[            5B
\            5C
]            5D
^            5E
`            60
{            7B
|            7C
}            7D
~            7E

Any control characters that wind up in a URL will be encoded, too.

Because you, as the CGI programmer, are sitting at the other end of this scheme, you don’t have to deal with encoding characters. The rule for you will be simple: Any time you encounter a percent sign in a query string, you may assume that the next two characters are the hexadecimal code of the character that is really intended to be there.


Tip:  If you’re worried about getting literal percent signs in a URL, don’t be. They will be encoded too.

You don’t have to be too concerned with the actual ASCII values of the characters, although every programmer usually has an ASCII table handy for reference. Perl has a number of handy tricks for turning hexadecimal values into characters, as you’ll soon see.

What you’ll have to do at your end is recognize an encoded character, strip off the percent sign, and send the remaining number to a Perl function that will translate it for you.
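Here is one way those steps might look in Perl. Treat it as a rough sketch rather than the finished routine: the name url_decode is hypothetical, and the code assumes that a plus sign stands for a space, as in the Address=123+Any+Street example above.

```perl
use strict;
use warnings;

# Hypothetical helper (name is an assumption): decode one value
# taken from a query string.
sub url_decode {
    my ($text) = @_;

    # Assumption: a plus sign in the query string stands for a space.
    # Do this first so that an encoded plus (%2B) survives as a literal '+'.
    $text =~ tr/+/ /;

    # Find each percent sign followed by two hex digits, strip the percent
    # sign, and turn the remaining hex number into the character it names.
    $text =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;

    return $text;
}

print url_decode('123+Any+Street%3F'), "\n";   # prints: 123 Any Street?
```

The built-in hex function converts the two hex digits to a number, and chr turns that number into a character.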

Hexadecimal Numbering: A Little Math Lesson

The hexadecimal, or base-16, number system is meat and potatoes to people who program for a living. This is primarily because it is a convenient way to represent the binary, or base-2, numbers that are meat and potatoes to computers.

Computers deal with data in bits, or 1s and 0s, that indicate an on or off state. So the binary numbering system is especially important, because binary numbers are the only kind that computers can process at the lowest level. However, this is a system in which there are only two allowable digits: 1 and 0. The decimal number 5 in this system would be 101 because it consists of 1 of 2^0 (1), 0 of 2^1 (0), and 1 of 2^2 (4): 1 + 0 + 4 = 5.
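You can check this arithmetic in Perl itself. This is only an illustrative sketch; the two helper names are invented for the example, and it assumes a Perl recent enough (5.6 or later) to understand the 0b prefix and the %b format.

```perl
use strict;
use warnings;

# Hypothetical helpers (names are assumptions) for moving between
# binary digit strings and ordinary decimal numbers.
sub binary_to_decimal {
    my ($bits) = @_;
    return oct('0b' . $bits);     # oct() accepts a 0b-prefixed binary string
}

sub decimal_to_binary {
    my ($number) = @_;
    return sprintf('%b', $number);
}

print binary_to_decimal('101'), "\n";        # prints: 5
print decimal_to_binary(5), "\n";            # prints: 101
print 1 * 2**2 + 0 * 2**1 + 1 * 2**0, "\n";  # prints: 5
```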

In the hexadecimal numbering system, “hex” for short, there are 16 allowable digits. The digits 0 through 9 mean exactly what they mean in decimal. The decimal numbers 10 through 15 are represented by the characters A through F. The hex number FF would be 255 in decimal because it consists of F of 16^0 (15) and F of 16^1 (16 x 15, or 240): 15 + 240 = 255.
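The same place-value sum can be spelled out in a few lines of Perl. The helper below is a hypothetical illustration only; in practice Perl's built-in hex function does the whole job in one call.

```perl
use strict;
use warnings;

# Hypothetical helper (name is an assumption): add up a hex number one
# digit at a time, multiplying the running total by 16 at each step.
sub hex_place_value_sum {
    my ($hex_string) = @_;
    my $total = 0;
    for my $digit (split //, $hex_string) {
        $total = $total * 16 + hex($digit);   # hex('F') is 15
    }
    return $total;
}

print hex_place_value_sum('FF'), "\n";   # prints: 255
print hex('FF'), "\n";                   # prints: 255
printf "%X\n", 255;                      # prints: FF
```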

Because of its binary, on-off architecture, everything on a computer at some point boils down to a power of two. When you hear technicians talking about a 32-bit microprocessor, which is what powers most PCs these days, they are referring to a processor that handles data in chunks of 32 bits. That’s a maximum number of one less than 2^32, or a binary number consisting of 32 1s:

       11111111111111111111111111111111

This number is not any less intimidating in decimal: 4,294,967,295.

The beauty of hex numbering is that each digit represents exactly four binary bits. You can’t say that about decimal, where 32 bits comes out to that awful 4-billion-something. Broken down to 4 bits per digit, the hex value of 2^32 - 1 is rather elegant:

       1111 1111 1111 1111 1111 1111 1111 1111
         F    F    F    F    F    F    F    F

Plus, FFFFFFFF certainly is easier to keep track of than a number consisting of 32 ones.
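To see the four-bits-per-digit relationship for yourself, you might try something like the following. It is a small sketch, and the name hex_to_nibbles is an assumption made for the example, not a standard Perl function.

```perl
use strict;
use warnings;

# Hypothetical helper (name is an assumption): expand each hex digit
# into the group of four binary bits it stands for.
sub hex_to_nibbles {
    my ($hex_string) = @_;
    return join ' ', map { sprintf('%04b', hex($_)) } split //, $hex_string;
}

my $max = 2**32 - 1;

print "$max\n";                          # prints: 4294967295
printf "%X\n", $max;                     # prints: FFFFFFFF
print hex_to_nibbles('FFFFFFFF'), "\n";  # eight groups of 1111
```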


