
|
 |
|
Michael Abrash's Graphics Programming Black Book, Special Edition
by Michael Abrash
The Coriolis Group
ISBN: 1576101746 Pub Date: 07/01/97
|
|
Listing
| Borland
| Microsoft
| Borland
| Microsoft
| Assembly
| Optimization Ratio
|
|
| (no opt)
| (no opt)
| (opt)
| (opt)
|
1
| 166.9
| 166.8
| 167.0
| 165.8
| 155.1
| 1.08
|
4
| 13.5
| 13.6
| 13.5
| 13.5
| ...
| 1.01
|
5
| 4.7
| 5.5
| 3.8
| 3.4
| 2.7
| 2.04
|
Ratio best designed to worst designed
| 35.51
| 30.33
| 43.95
| 48.76
| 57.44
|
Note: The execution times (in seconds) for this chapters listings were timed when the compiled listings were run on the WordPerfect 4.2 thesaurus file TH.WP (362,293 bytes in size), as compiled in the small model with Borland and Microsoft compilers with optimization on (opt) and off (no opt). All times were measured with Paradigm Systems TIMER program on a 10 MHz 1-wait-state AT clone with a 28-ms hard disk, with disk caching turned off.
|
|
Table 1.1 Execution Times for WordPerfect Checksum.
|
|
LISTING 1.2 L1-2.C
/*
* Program to calculate the 16-bit checksum of the stream of bytes
* from the specified file. Obtains the bytes one at a time in
* assembler, via direct calls to DOS.
*/
#include <stdio.h>
#include <fcntl.h>
main(int argc, char *argv[]) {
int Handle;
unsigned char Byte;
unsigned int Checksum;
int ReadLength;
if ( argc != 2 ) {
printf(usage: checksum filename\n);
exit(1);
}
if ( (Handle = open(argv[1], O_RDONLY | O_BINARY)) == -1 ) {
printf(Cant open file: %s\n, argv[1]);
exit(1);
}
if ( !ChecksumFile(Handle, &Checksum) ) {
printf(Error reading file %s\n, argv[1]);
exit(1);
}
/* Report the result */
printf(The checksum is: %u\n, Checksum);
exit(0);
}
LISTING 1.3 L1-3.ASM
; Assembler subroutine to perform a 16-bit checksum on the file
; opened on the passed-in handle. Stores the result in the
; passed-in checksum variable. Returns 1 for success, 0 for error.
;
; Call as:
; int ChecksumFile(unsigned int Handle, unsigned int *Checksum);
;
; where:
; Handle = handle # under which file to checksum is open
; Checksum = pointer to unsigned int variable checksum is
; to be stored in
;
; Parameter structure:
;
Parms struc
dw ? ;pushed BP
dw ? ;return address
Handle dw ?
Checksum dw ?
Parms ends
;
.model small
.data
TempWord label word
TempByte db ? ;each byte read by DOS will be stored here
db 0 ;high byte of TempWord is always 0
;for 16-bit adds
;
.code
public _ChecksumFile
_ChecksumFile proc near
push bp
mov bp,sp
push si ;save Cs register variable
;
mov bx,[bp+Handle] ;get file handle
sub si,si ;zero the checksum ;accumulator
mov cx,1 ;request one byte on each ;read
mov dx,offset TempByte ;point DX to the byte in
;which DOS should store
;each byte read
ChecksumLoop:
mov ah,3fh ;DOS read file function #
int 21h ;read the byte
jcErrorEnd;an error occurred
and ax,ax ;any bytes read?
jz Success ;no-end of file reached-were done
add si,[TempWord] ;add the byte into the
;checksum total
jmpChecksumLoop
ErrorEnd:
sub ax,ax ;error
jmp short Done
Success:
mov bx,[bp+Checksum] ;point to the checksum variable
mov [bx],si ;save the new checksum
mov ax,1 ;success
;
Done:
pop si ;restore Cs register variable
pop bp
ret
_ChecksumFileendp
end
The lesson is clear: Optimization makes code faster, but without proper design, optimization just creates fast slow code.
Well, then, how are we going to improve our design? Before we can do that, we have to understand whats wrong with the current design.
Know the Territory
Just why is Listing 1.1 so slow? In a word: overhead. The C library implements the read() function by calling DOS to read the desired number of bytes. (I figured this out by watching the code execute with a debugger, but you can buy library source code from both Microsoft and Borland.) That means that Listing 1.1 (and Listing 1.3 as well) executes one DOS function per byte processedand DOS functions, especially this one, come with a lot of overhead.
For starters, DOS functions are invoked with interrupts, and interrupts are among the slowest instructions of the x86 family CPUs. Then, DOS has to set up internally and branch to the desired function, expending more cycles in the process. Finally, DOS has to search its own buffers to see if the desired byte has already been read, read it from the disk if not, store the byte in the specified location, and return. All of that takes a long timefar, far longer than the rest of the main loop in Listing 1.1. In short, Listing 1.1 spends virtually all of its time executing read(), and most of that time is spent somewhere down in DOS.
You can verify this for yourself by watching the code with a debugger or using a code profiler, but take my word for it: Theres a great deal of overhead to DOS calls, and thats whats draining the life out of Listing 1.1.
How can we speed up Listing 1.1? It should be clear that we must somehow avoid invoking DOS for every byte in the file, and that means reading more than one byte at a time, then buffering the data and parceling it out for examination one byte at a time. By gosh, thats a description of Cs stream I/O feature, whereby C reads files in chunks and buffers the bytes internally, doling them out to the application as needed by reading them from memory rather than calling DOS. Lets try using stream I/O and see what happens.
Listing 1.4 is similar to Listing 1.1, but uses fopen() and getc() (rather than open() and read()) to access the file being checksummed. The results confirm our theories splendidly, and validate our new design. As shown in Table 1.1, Listing 1.4 runs more than an order of magnitude faster than even the assembly version of Listing 1.1, even though Listing 1.1 and Listing 1.4 look almost the same. To the casual observer, read() and getc() would seem slightly different but pretty much interchangeable, and yet in this application the performance difference between the two is about the same as that between a 4.77 MHz PC and a 16 MHz 386.
|