
|
 |
|
Michael Abrash's Graphics Programming Black Book, Special Edition
by Michael Abrash
The Coriolis Group
ISBN: 1576101746 Pub Date: 07/01/97
|
Chapter 22 Zenning and the Flexible Mind
Taking a Spin through What Youve Learned
And so we come to the end of our journey; for now, at least. What follows is a modest bit of optimization, one which originally served to show readers of Zen of Assembly Language that they had learned more than just bits and pieces of knowledge; that they had also begun to learn how to apply the flexible mindunconventional, broadly integrative thinkingto approaching high-level optimization at the algorithmic and program design levels. You, of course, need no such reassurance, having just spent 21 chapters learning about the flexible mind in many guises, but I think youll find this example instructive nonetheless. Try to stay ahead as the level of optimization rises from instruction elimination to instruction substitution to more creative solutions that involve broader understanding and redesign. Well start out by compacting individual instructions and bits of code, but by the end well come up with a solution that involves the very structure of the subroutine, with each instruction carefully integrated into a remarkably compact whole. Its a neat example of how optimization operates at many levels, some much less determininstic than othersand besides, its just plain fun.
Enjoy!
Zenning
In Jeff Duntemanns excellent book Borland Pascal From Square One (Random House, 1993), theres a small assembly subroutine thats designed to be called from a Turbo Pascal program in order to fill the screen or a systemscreen buffer with a specified character/attribute pair in text mode. This subroutine involves only 21 instructions and works perfectly well; however, with what we know, we can compact the subroutine tremendously and speed it up a bit as well. To coin a verb, we can Zen this already-tight assembly code to an astonishing degree. In the process, I hope youll get a feel for how advanced your assembly skills have become.
Jeffs original code follows as Listing 22.1 (with some text converted to lowercase in order to match the style of this book), but the comments are mine.
LISTING 22.1 L22-1.ASM
OnStack struc ;data thats stored on the stack after PUSH BP
OldBP dw ? ;callers BP
RetAddr dw ? ;return address
Filler dw ? ;character to fill the buffer with
Attrib dw ? ;attribute to fill the buffer with
BufSize dw ? ;number of character/attribute pairs to fill
BufOfs dw ? ;buffer offset
BufSeg dw ? ;buffer segment
EndMrk db ? ;marker for the end of the stack frame
OnStack ends
;
ClearS proc near
push bp ;save callers BP
mov bp,sp ;point to stack frame
cmp word ptr [bp].BufSeg,0 ;skip the fill if a null
jne Start ; pointer is passed
cmp word ptr [bp].BufOfs,0
je Bye
Start: cld ;make STOSW count up
mov ax,[bp].Attrib ;load AX with attribute parameter
and ax,0ff00h ;prepare for merging with fill char
mov bx,[bp].Filler ;load BX with fill char
and bx,0ffh ;prepare for merging with attribute
or ax,bx ;combine attribute and fill char
mov bx,[bp].BufOfs ;load DI with target buffer offset
mov di,bx
mov bx,[bp].BufSeg ;load ES with target buffer segment
mov es,bx
mov cx,[bp].BufSize ;load CX with buffer size
rep stosw ;fill the buffer
Bye:mov sp,bp ;restore original stack pointer
pop bp ; and callers BP
ret EndMrk-RetAddr-2 ;return, clearing the parms from the stack
ClearS endp
The first thing youll notice about Listing 22.1 is that ClearS uses a REP STOSW instruction. That means that were not going to improve performance by any great amount, no matter how clever we are. While we can eliminate some cycles, the bulk of the work in ClearS is done by that one repeated string instruction, and theres no way to improve on that.
Does that mean that Listing 22.1 is as good as it can be? Hardly. While the speed of ClearS is very good, theres another side to the optimization equation: size. The whole of ClearS is 52 bytes long as it standsbut, as well see, that size is hardly set in stone.
Where do we begin with ClearS? For starters, theres an instruction in there that serves no earthly purposeMOV SP,BP. SP is guaranteed to be equal to BP at that point anyway, so why reload it with the same value? Removing that instruction saves us two bytes.
Well, that was certainly easy enough! Were not going to find any more totally non-functional instructions in ClearS, however, so lets get on to some serious optimizing. Well look first for cases where we know of better instructions for particular tasks than those that were chosen. For example, theres no need to load any register, whether segment or general, through BX; we can eliminate two instructions by loading ES and DI directly as shown in Listing 22.2.
|