
|
 |
|
Michael Abrash's Graphics Programming Black Book, Special Edition
by Michael Abrash
The Coriolis Group
ISBN: 1576101746 Pub Date: 07/01/97
|
The large model is actually not necessary for the 96×96 cellmap in Listing 17.5. However, I was actually more interested in seeing a fast 200×200 cellmap, and two 200×200 cellmaps cant fit in a single segment. (This can easily be worked around in assembly language for cellmaps up to a segment in size; beyond that size, cellmap scanning becomes pretty complex, although it can still be efficiently implemented with some clever programming.)
Anyway, using the large model helps illustrate that its the data representation and the data processing approach you choose that matter most. Optimization details like memory models and segments and in-line functions and assembly language are important but secondary. Let your mind roam creatively before you start coding. Otherwise, you may find youre writing well-tuned slow code, which is by no means the same thing as fast code.
Take a close look at Listing 17.5. You will see that its quite a bit simpler than Listing 17.4. To some extent, thats because I decided to hard-wire the program to wrap around from one edge of the cellmap to the other (its much more interesting that way), but the main reason is that its a lot easier to work with the neighbor-count model. Theres no complex mask and pointer management, and the only thing that really needs to be optimized is scanning for zero bytes. (And, in fact, I havent optimized even that because its done in a C++ loop; it should really be REPZ SCASB.)
In truth, none of the code in Listing 17.5 is particularly well-optimized, and, as I noted, the program must be compiled with the large model for large cellmaps. Also, of course, the entire program is still in C++; note well that theres not a whit of assembly here.
 | Weve gotten more than a 30-times speedup simply by removing a little of the abstraction that C++ encourages, and by storing and processing the data in a manner appropriate for the typical nature of the data itself. In other words, weve done some linear, left-brained optimization (using pointers and reducing calls) and some non-linear, right-brained optimization (understanding the real problem and listening for the creative whisper of non-obvious solutions).
|
No doubt we could get another two to five times improvement with good assembly codebut thats dwarfed by a 30-times improvement, so optimization at a conceptual level must come first.
The Challenge That Ate My Life
The most recent optimization challenge I laid my community of readers was to write the fastest possible Game of Life generation engine. By engine I meant that I didnt care about time spent in input or output, only time consumed by the call to next-generation. The time spent updating the cellmap was what I wanted people to concentrate on.
Here are the rules I laid down for the challenge:
- Readers could modify any code in Listing 17.5, except the main loop, as well as change the cell map representation any way they liked. However, the code had to produce exactly the same output as Listing 17.5 under all circumstances in order to be eligible to win.
- Engine code had to be less than 400 lines long in total, excluding the video-related code shown in Listing 17.2.
- Submissions had to compile/assemble with Borland C++ (in either C++ or C mode, as desired) and/or TASM.
- All submissions had to handle cellmaps at least 200×200 in size.
- Assembly language could of course be used to speed up any part of the program. C rather than C++ was legal as well, so long as entered implementations produced the same results as Listing 17.5 and 17.2 together and were less than 400 lines long.
- All entries would be timed on the same 33 MHz 486 with a 256K external cache.
That was the challenge I put to the readers. Little did I realize the challenge it would lay on me: Entries poured in from the four corners of the globe. Some were plain, some were brilliant, some were, well, berserk. Many didnt even work. But all had to be gone through, examined for adherence to the rules, read, compiled, linked, run, and judged. I learned a lotabout a lot of things, not the least of which was the process (or maybe the wisdom) of laying down challenges to readers.
Who won? What did I learn? To find out, read on.
|