ASCII

Unicode is based on two character sets that predate it: ASCII and ISO Latin-1. ASCII is a 7-bit character set with 128 different characters. ASCII was designed for communication in United States English. It therefore contains the lowercase letters a-z, the capital letters A-Z, the digits 0-9, various punctuation marks, and a number of non-printing control characters, many of which are closely related to the types of terminals and printers that were in use when ASCII was invented. The characters in ASCII are numbered from 0 to 127. Character 0 is the non-printing null character. Character 127 is the delete character. Characters 48 through 57 are the digits 0 through 9. Characters 65 through 90 are the capital letters A through Z. Characters 97 through 122 are the lowercase letters a through z. The remaining ASCII characters are various punctuation marks and non-printing characters. Table 2-3 is a complete list.

Table 2-3 The ASCII character set

Code  Character             Code  Character  Code  Character  Code  Character

0     null                  32    space      64    @          96    `
1     soh                   33    !          65    A          97    a
2     stx                   34    "          66    B          98    b
3     etx                   35    #          67    C          99    c
4     eot                   36    $          68    D          100   d
5     enq                   37    %          69    E          101   e
6     ack                   38    &          70    F          102   f
7     bell                  39    '          71    G          103   g
8     backspace             40    (          72    H          104   h
9     tab (\t)              41    )          73    I          105   i
10    linefeed (\n)         42    *          74    J          106   j
11    vertical tab          43    +          75    K          107   k
12    formfeed (\f)         44    ,          76    L          108   l
13    carriage return (\r)  45    -          77    M          109   m
14    so                    46    .          78    N          110   n
15    si                    47    /          79    O          111   o
16    dle                   48    0          80    P          112   p
17    dc1                   49    1          81    Q          113   q
18    dc2                   50    2          82    R          114   r
19    dc3                   51    3          83    S          115   s
20    dc4                   52    4          84    T          116   t
21    nak                   53    5          85    U          117   u
22    syn                   54    6          86    V          118   v
23    etb                   55    7          87    W          119   w
24    can                   56    8          88    X          120   x
25    em                    57    9          89    Y          121   y
26    sub                   58    :          90    Z          122   z
27    escape                59    ;          91    [          123   {
28    is4                   60    <          92    \          124   |
29    is3                   61    =          93    ]          125   }
30    is2                   62    >          94    ^          126   ~
31    is1                   63    ?          95    _          127   delete
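
The numeric ranges in Table 2-3 translate directly into code. What follows is a minimal Java sketch, not part of any standard API: the class name AsciiRanges and all of its method names are hypothetical, chosen only for illustration. It tests a char against the ranges listed above and relies on a fact visible in the table, namely that each lowercase letter sits exactly 32 positions after its capital (97 - 65 = 32).

```java
// Hypothetical helper class for illustration; none of these names come from the text.
final class AsciiRanges {
    // ASCII covers codes 0 through 127 only.
    static boolean isAscii(char c) { return c <= 127; }

    // Characters 48 through 57 are the digits 0 through 9.
    static boolean isDigit(char c) { return c >= 48 && c <= 57; }

    // Characters 65 through 90 are the capital letters A through Z.
    static boolean isUpper(char c) { return c >= 65 && c <= 90; }

    // Characters 97 through 122 are the lowercase letters a through z.
    static boolean isLower(char c) { return c >= 97 && c <= 122; }

    // Codes 0 through 31, plus 127 (delete), are the non-printing characters.
    static boolean isControl(char c) { return c < 32 || c == 127; }

    // Capital and lowercase letters are 32 codes apart, so adding 32 converts A-Z to a-z.
    // Any other character is returned unchanged.
    static char toLower(char c) { return isUpper(c) ? (char) (c + 32) : c; }

    public static void main(String[] args) {
        System.out.println((int) 'A');   // prints 65
        System.out.println((char) 97);   // prints a
        System.out.println('7' - '0');   // prints 7, because 55 - 48 = 7
        System.out.println(toLower('Q')); // prints q
    }
}
```

In real programs the methods of java.lang.Character, such as Character.isDigit, are usually the better choice, since they also handle characters outside the ASCII range.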

ISO Latin-1

As I said, ASCII was designed to handle U.S. English. It can manage a reasonable approximation of other dialects of English, but it runs into problems with many other European languages, such as French and German. ASCII has no cedillas, umlauts, or any of the other characters that these languages use but English does not.

When an ASCII character is stored in an 8-bit byte, the first (high-order) bit is always 0. You can define another 128 characters by using the byte values whose first bit is 1. Indeed, this is the scheme used in most modern computers. The characters with numeric values between 128 and 255 encode the additional characters needed by most languages that are written in some approximation of the Latin alphabet. There are at least two common ways to extend ASCII into the upper 128 characters. The one around which Unicode and Java are built is the ISO 8859-1 Latin-1 character set, often just referred to as ISO Latin-1. Table 2-4 lists the upper 128 characters of the ISO Latin-1 character set. The lower 128 characters are exactly the same as they are for ASCII.
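
The role of the first bit can be checked in a few lines of Java. The sketch below is only an illustration under stated assumptions: the class name Latin1Demo and its method names are hypothetical, and it uses java.nio.charset.StandardCharsets, which requires Java 7 or later. It tests the high-order bit of a byte, converts Java's signed byte to the 0 to 255 value discussed above, and decodes a single byte as ISO Latin-1.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class for illustration; the names are not from the text.
final class Latin1Demo {
    // True when the high-order bit is 0, that is, the value lies in the ASCII range 0-127.
    static boolean isAsciiByte(byte b) { return (b & 0x80) == 0; }

    // Java bytes are signed (-128 to 127); masking with 0xFF yields the value 0-255.
    static int unsigned(byte b) { return b & 0xFF; }

    // Decodes one byte as ISO Latin-1 (ISO 8859-1) and returns the resulting char.
    static char decodeLatin1(byte b) {
        return new String(new byte[] { b }, StandardCharsets.ISO_8859_1).charAt(0);
    }

    public static void main(String[] args) {
        byte cedilla = (byte) 0xE7;  // c with cedilla in ISO Latin-1
        byte letterA = (byte) 65;    // capital A, same as in ASCII

        System.out.println(isAsciiByte(cedilla));          // prints false
        System.out.println(unsigned(cedilla));             // prints 231
        System.out.println((int) decodeLatin1(cedilla));   // prints 231
        System.out.println(isAsciiByte(letterA));          // prints true
        System.out.println(decodeLatin1(letterA));         // prints A
    }
}
```

The decoded char has the same numeric value as the byte in both cases, which reflects how the Latin-1 values carry over unchanged into Java's Unicode-based char type.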


Copyright (c) 1996-1999 EarthWeb Inc. All rights reserved. Reproduction in whole or in part in any form or medium without express written permission of EarthWeb is prohibited. Read EarthWeb's privacy statement.