HOW TO SCAN A BOOK
by John F. Adams

© Proportional Reading 1996

Proportional Reading, P.O. Box 335, Beverly, Mass. 01915 phone (508) 927-9234



CONTENTS
  Introductory Notes 
  Overview of Scanning 
  How Scanning Books is Different from Other Types of Scanning 
  Tips on Scanning and OCR'ing Text 
  Tips on Editing 



INTRODUCTORY NOTES
Many people ask, "How do I scan a book?" This article has been written to 
answer this question. The truth of the matter is that scanning a book can be 
extremely easy if you know what you are doing. Otherwise it will be a nightmare. 
Scanning a book is very different from scanning other types of documents. The 
tips in this article should be of great help.
This work was written to help people read using the technique called 
Proportional Reading. In this approach the eyes never move. You can read up to 
700 words per minute and still feel like you are being read aloud to. Text can 
also be read out loud in real human voice at normal reading speed as it is 
displayed one word at a time. In order to do this type of reading text must 
first be in electronic form. The author spent three years developing an 
understanding of how to scan books easily so any student could easily scan 
course material or other reading material into e-text for Proportional Reading. 
The material presented here is essentially chapters 7 and 8 of the Instruction 
Manual for Proportional Reading.
Scanning really involves three parts: 1) making a picture of a page (scanning), 
2) using an Optical Character Recognition (OCR) program to convert the picture into 
typed text, and 3) cleaning up the text after this process. In actual practice, 
scanning and OCR decisions are made before scanning starts.



Overview of Scanning
A scanner is used to transform a book or article into computerized text, if it 
is not already on disk or CD ROM. Scanning text can be done in four ways:
1) from the actual book placed on the scanner bed and scanned one or two pages 
at a time. 
2) from separated pages of the book placed on the scanner one or two pages at a 
time
3) from actual book pages bulk-loaded into the automatic document feeder of the 
scanner, or 
4) from copies of book pages, which are then either scanned individually or bulk 
loaded into the automatic document feeder. 
Scanning can be done almost effortlessly if you choose the right approach. This 
article will help you understand what this approach should be.
Scanning involves a little bit of learning, but once a book is turned into ASCII 
text, it can be read by everybody in a school system without any repeating of 
these steps. It can be mailed as a diskette or sent by modem, etc. 
First, a few words about copyrights. Be sure to get copyright permission before 
any wide distribution. Proportional Reading was designed to help people read 
who would otherwise not be able to benefit from printed text. Publishers almost 
universally are very helpful in allowing special treatment of their works for 
the learning disabled and physically disabled. 
Furthermore, Proportional Reading is designed for average readers to use on 
their own reading material which they already have in their possession. This 
private, non-profit copying of books is within purchase rights, and it makes 
reading possible for many and increases purchase of books.
Most importantly, the basic thrust of Proportional Reading as applied to 
scanning books is to return to the original book for the graphics (charts, 
illustrations, drawings, graphs, pictures, etc.) and to see the original text 
layout. To this end Proportional Reading keys to the original page numbers of 
the original text. As a result, actual use of the basic text book increases, not 
decreases. This will be especially true as millions of people become able to 
read and start to love learning. In all these ways Proportional Reading actually 
helps publishers.
Finally, the formatted or Proportionalized version of text requires a special 
program to play. So, the formatted text by itself is of little or no use without 
both the playing software as well as the original book.
In this article you will learn how to add colored pictures to scanned text. 
However, this process adds tremendously to file size and is therefore 
impractical except for short articles or articles saved on CD ROM or removable 
cartridge. It is usually much easier to refer to the original book for pictures 
and other graphics.



How Scanning Books is Different from Other Types of Scanning
The best way to learn how to scan a book efficiently is to start by 
understanding how scanning a book differs from other types of scanning. There 
are eight major differences. We will see that if a book will lie flat on the 
scanner bed, you can scan one or both pages of text at a time. Otherwise, it is 
easiest by far to separate the pages and scan one side of a page at a time and 
OCR the page, spell check the page, and add other special marks before going on 
to scan the next page. We will now look at each of the eight major differences 
in turn.
1) Page Thickness
Most scanning is designed to be done on standard letter size, 20 lb paper. This 
type of medium runs perfectly through the automatic document feeder. Other 
thicknesses of paper will not work well in the automatic document feeder. The 
trouble with books is that many pages are too thick and will not even load into 
a document feeder. Most textbook pages, on the other hand, are too thin and will 
eventually double up as they enter the document feeder. Either way automatic 
processing will jam up. In addition, if you are doing two sided documents, your 
collating will be off and all your time will be wasted. In scanning two sided 
documents you run through the whole stack one way and then do the whole stack on 
the back side and then have the computer collate everything. Any jam will 
ruin collation and all the investment of time. There is no way to simply 
redo collation; it takes place before editing, and all offending pages would have 
to be cut and repasted - a nightmare.
For this reason automatic document feeders should not be used with actual book 
pages unless pages are copied first onto 20 lb paper with only one side of the 
paper used.



2) Rounded Pages
Books may be divided immediately into two types: those that will lie flat and 
those that won't. Sometimes you can push down on the spine of the book to make 
the text lie flat. If the text won't lie flat it curves into the center and 
cannot be scanned as is. Many textbooks are designed to make copying impossible by 
intentionally making the text flow close to the gutter, or center. 
These books can easily be scanned. However, you must first separate the pages. 
Be happy about this. Scanning individual pages is much less physical work than 
scanning a book. In scanning individual pages there is no lifting and turning 
and pressing down on the book. You can sit comfortably in a chair and hardly 
move as you scan first one side of a page and then the other side of the same 
page and then the next page. Separate the book chapters into different manila 
folders. 
A separated book has real value after scanning. It is often much easier to read 
a book this way than trying to keep the pages open. Also, bookbags become much 
lighter when only the relevant chapters are carried around. The trick is to keep 
the different chapters in different folders.



3) See Through
If you want to avoid errors on italics and bold letters you have to use the 
highest form of resolution when scanning. This setting also gives you the best 
black and white picture quality if you are scanning pictures in the text as 
well. The trouble with this setting is that when you scan the average textbook 
page of thin shiny paper, the scanner will see right through the page and pick 
up details on the back side of the page. There is a simple way to avoid this 
problem. This is to put a black sheet behind the page you are copying. The see 
through problem will disappear immediately. Unfortunately, the belt on automatic 
document feeders is white, not black. Therefore, even if you could get the pages 
not to jam up, they will still "bleed" through.
For this reason it is best to tape a black piece of paper on the underside of 
the cover of the scanner and scan the pages one page at a time, or scan from an 
open book where the pages are automatically backed up. Alternatively you can 
make one-sided copies of the text pages and run these copies through the 
document feeder. However, this costs a lot of money and requires a good quality 
copier. Regardless of how good the copier is, you will lose quality when you 
make copies and this will cause errors in scanning. When all is said and done it 
is usually best to scan one page at a time, or from an open book that will lie 
flat.



4) Text Boxes and Captions
Many books are straight text and these are easy to scan. However, most textbooks 
have text boxes on colored backgrounds inserted in the middle of the text. In 
addition, graphics of many types with their captions are inserted in the pages. 
When text is scanned it ends up in a linear flow. Text boxes and captions can be 
very disruptive to reading if they are not moved to the end of the subsection to 
which they refer. When text boxes and captions are moved this way they are a joy 
to read in a linear flow with the main text.
The best way to do this is to specially mark the text boxes and captions right 
after the page is scanned and OCR'd. Here again it is usually best to scan one 
separated page at a time, or from an open book that will lie flat.



5) Pictures and Graphics
When you OCR text the OCRing is done in black and white. Although pictures can 
be automatically scanned they are not scanned in color and are therefore of 
little use in today's world of color. Secondly, when pictures are scanned 
through the OCR program, if they have not been carefully defined as pictures, 
the text on the pictures is removed and added to the main body of text during 
the actual OCR stage. This creates a very confusing piece of text.
The simple solution to this is to select just the sections of text and captions 
and text boxes and in the order you want, ignoring the pictures. The way to do 
this is to insert one page at a time and manually zone each page. This process 
is much faster than deselecting all the zones you do not want and then 
reordering the zones you have left from an automatically zoned page.
To re-add a picture in color, you first save the text in ASCII format and open it 
up in your word processor. Then you scan the colored picture using the scanner 
alone (not the OCR program) and then copy and paste the desired picture into 
the word processor document at the desired point. Choose "screen" resolution so 
the picture file will not be too big.



6) Spell Checking
The first two ways to make sure the text is free from errors are to scan in the 
highest quality mode and to scan directly from the text page. The third thing to do is 
to use the spell checking feature on each page of text right after the text has 
been scanned and OCR'd. The reason for doing this now is that you can see a 
picture of the original scan along with the misspelled word and immediately see 
whether the suspicious word is ok or how to fix the error.



7) Page Numbers and Headers
Book pages often have headers and footers on pages. These need to be removed. 
The best way to do this is to not select them to be OCR'd in the first place. 
When you get the text OCR'd add the page number at the top of the page. This is 
very easy to do as the cursor automatically goes to the top of the page as soon 
as OCR is done.



8) Titles, Sub-Titles and Key Words
If you mark titles, sub-titles and key words, it is very easy to move to any 
place in the e-text document. Furthermore, you can automatically create a five 
level outline with key words added in the appropriate sections. No retyping or 
handwriting is required. Such outlines are tremendous study aids and are 
essentially a free by-product of scanning. Here again it is best to scan one 
page at a time, or from an open book that will lie flat.



Tips on Scanning and OCR'ing Text
Scanning an Open Book
When scanning an open book, you do not want to sit down and stand up repeatedly. 
This is very hard on the body. It is much easier to scan the first two open pages, 
turn the page, then scan the next two open pages, and so on. After you are done 
scanning, go back with the book and zone, OCR, and check two pages at a 
time. Alternatively, you can zone all the pages then OCR the lot, or you can 
tell the program to automatically zone and OCR the lot.
Another good trick is to place an open book on the scanner with a weight on top 
of it and scan two pages at a time. This way you don't have to personally press 
down on the book binding all the time the scanner is working. Use a gallon of 
water in a plastic jug for a weight. Build up an area next to the scanner to the 
same height as the lid, using telephone books or other books. Now you can just 
drag the water on and off the scanner lid (from the top of the pile). No lifting 
of the weight is required. 
Cutting Out Pages
The way to cut out the pages of a book is to leave the two covers and binding in 
place. Set the book on a piece of scrap wood on the corner of a table with the 
bottom cover hanging vertically off the scrap wood and edge of the table. This 
way there is no chance of cutting the table or cutting off the back cover of the 
book. Lay a straight edge in from the binding about 1/4" on the first internal 
page and cut along this guide with a sharp knife, making several passes. You 
should be able to free up about 50 pages before you need to remove these pages 
and reset the straight edge. Cutting out the pages this way leaves a smooth 
surface for re-gluing pages with any wood glue. 
A book can be cut apart this way in about two minutes. If you don't want to 
reglue the pages, reset them in the cover (still completely intact) and add a 
rubber band. Frequently it is much easier to read loose pages than bound pages.
Re-gluing pages is very simple. Just add some wood glue to the binding and to 
the binding edge of the pages and stick the pages in the binding. Let set 
overnight. The new binding will work just as well as before.
Notes: Some pages are printed right to the center "gutter". This makes manually 
scanning one or two pages at a time impossible. It is also impossible to copy 
such pages. These pages have to be cut out to be scanned. Secondly, tiny 
paperback pages are too small to fit in most document feeders. These pages 
should be scanned manually, two pages at a time with deferred OCR, or copied 
first and then inserted into the automatic document feeder.
However, cutting and then re-gluing is not workable for library books.
Making Copies of Pages
Making copies of pages and then scanning these copies has some drawbacks, but 
can be done quickly and effectively if you use the highest quality scanning 
approach. Making copies loses much clarity, which leads to increased errors; it 
requires an excellent copier; costs money for a copier machine, paper and 
toner; and requires costly wear and tear upkeep on the copier. It also requires 
a document feeder and purchasing and transporting lots of paper. If you don't 
separate pages before copying, the book must be able to lie flat on the scanning 
window and text must not curl in towards the gutter. Copied pages can easily get 
out of order and must be checked before scanning to make sure that they are in 
order and that extra blank pages have not gotten inserted by mistake. Often 
pages just out of the copier must be reordered. Using a copier, the average 250 
page book would cost at least $6.00 for copying, before scanning even begins. 
You can copy onto either 8 1/2" x 11" paper or 8 1/2" x 14" paper. 
However, you can quickly process any book this way, especially if you copy two 
pages at a time. You can easily copy 300 pages an hour, two pages at a time. 
These pages can be inserted into the document feeder as they come off the 
copier. Scanning can occur simultaneously. Putting copies of pages in a document 
feeder is a great solution for scanning borrowed books.
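The cost and time figures above can be checked with a little arithmetic. The sketch below is only an illustration: the price of 5 cents per copy is an assumption (the article gives only the $6.00 total), and the function name and parameters are made up for this example. The 300 pages an hour and two pages per copy come from the text.

```python
import math

def copying_estimate(book_pages, pages_per_copy=2, cents_per_copy=5,
                     pages_per_hour=300):
    """Return (sheets, cost_in_cents, minutes) for photocopying a book.

    cents_per_copy is an assumed price, not a figure from the article.
    """
    # Each copy sheet holds one or two book pages; round up for an odd page.
    sheets = math.ceil(book_pages / pages_per_copy)
    cost_cents = sheets * cents_per_copy
    # The article's rate of 300 book pages an hour, converted to minutes.
    minutes = round(book_pages / pages_per_hour * 60)
    return sheets, cost_cents, minutes

sheets, cost_cents, minutes = copying_estimate(250)
print(f"{sheets} sheets, ${cost_cents / 100:.2f}, about {minutes} minutes")
```

With these assumptions a 250 page book comes out slightly above the $6.00 minimum quoted above and takes a little under an hour to copy.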
The Best Plan
So, what is the solution? The best approach by far is whenever possible to scan 
an open book that will lie flat, scanning one or both pages at a time. The next 
best approach is to cut the pages away from the binding whenever possible, scan 
them, and then reglue them to the binding. The book will work perfectly. The 
third best approach is to make single sided copies of either one or two pages at 
a time and run the copies through the automatic document feeder.
Note: Some small paperbacks are sometimes printed on very poor quality paper 
with too much ink. As a result, letters are badly formed and scanning even at 
the best quality level will not be successful. In this situation, the best 
approach is to get a library edition of the book to scan. Don't just waste your 
time.
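The order of preference given under The Best Plan can be written down as a small decision procedure. This is only a sketch of the reasoning; the function name and the two yes/no inputs are illustrative assumptions, and a real decision would also weigh paper quality and page size as described in the notes.

```python
def choose_scanning_method(lies_flat, may_cut_pages):
    """Pick a scanning approach in the article's order of preference.

    lies_flat:      True if the open book lies flat on the scanner bed.
    may_cut_pages:  True if the book is yours to cut apart (not a library
                    or borrowed book).
    """
    if lies_flat:
        return "scan the open book, one or two pages at a time"
    if may_cut_pages:
        return "cut the pages from the binding, scan them, then reglue"
    return "make single-sided copies and run them through the document feeder"

# A borrowed book that will not lie flat falls through to the third choice.
print(choose_scanning_method(lies_flat=False, may_cut_pages=False))
```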
Page Orientation and Differentiation
If you are scanning a regular book or a paperback two pages at a time, you will 
have the book turned sideways with the lower left corner of the left page in the 
upper right corner of the scanner. If you are copying large pages one at a time 
or using large paper, you will have the book upside down, but with the tops of 
the pages towards the top of the machine. Make sure you tell the scanner program 
which way the text is facing: vertical (portrait) or sideways (landscape).
If you are copying two pages at a time, it is important to make sure the scanner 
differentiates between the left and right page. Sometimes this can be a problem 
if the margins and gutters between pages get reduced too far. Otherwise, text 
from the two pages will merge. It is also important to cut out all the heavy 
black areas around the margins and in the gutter. Otherwise, these areas will be 
read as characters.
One solution for this problem is to manually zone the image before scanning the 
next page.
If you want to do automatic zoning, there is an easy way around these problems. 
Mark either side of the copy window half way up its length. Always center the 
book gutter on this center line each time you set the book down on the scanner 
bed. Then manually zone the scanner for two zones (one for each page), cutting 
out the areas of black. Be sure to zone the earlier page first (otherwise, the 
second page will always come before the first). Now save the zone template and 
call it up for this book. Pages will be automatically separated in scanning and 
black areas will be ignored.
Alternatively, you can set the scanner to automatically zone both pages with no 
zones. Then after the scanning is finished and before the text recognition 
function starts, manually rezone each page. At the same time you can cut out 
graphics and headers. You can also make the page number of each page the first 
and top item on that page by selecting it first, even if the page number is on 
the bottom of the page. The best approach is not to zone the page number and to 
type it in later at the top of the page, or ignore it completely and delete it 
later.
Note: When you scan original individual pages (cut out from the book binding) 
one at a time, either manually or in a document feeder, there is no gutter 
problem, nor problem with black areas.
If you are scanning one page at a time you may want to zone, OCR and edit each 
page right after it is scanned. This is fine. However, if you are doing two 
pages at a time, or if you want to make maximum use of your scanner, and/or if 
you wish to have the OCR done automatically while you do something else, you 
should scan all the pages first into separate files which can be finished later.
Later you, or somebody else on another machine, zone the pages manually or have 
them automatically zoned when OCR is done. Then the pages are OCR'd and then 
edited. It's usually best to scan all the pages first.
Lighten-Darken Control on Scanner (Brightness)
If you choose the fastest scanning speed, you will have to set the brightness 
level yourself. On the other hand, if you choose the quality scanning speeds, 
the scanner will automatically choose the brightness level for you. 
If you are setting the brightness level yourself, be sure to scan and check just 
one page of text to begin with. It is important to check the scanning as it 
occurs. It is very important that the letters not have broken or missing parts. 
Cancel the scanning and move the brightness control towards darken if this is 
the case. Then rescan the page for a second check. 
To do this, make sure the boxes for multiple pages and deferred recognition are 
not checked. The box for automatically saving a document should also be 
unchecked.
It is also very important that the letters do not run together. If this is 
happening, lighten the brightness control. What you are looking for is the point 
right between these two problems. Too much correction for one problem causes the 
other problem. Actually, the OCR program does not mind if the letters are very 
close, but it minds terribly if the letters are not completely formed or parts 
of letters are broken.
Don't have letters any thicker than necessary. If you do, open sections in 
letters like "a" and "e" will get blocked out. These letters will subsequently 
be misread by the character recognition program.
Start off by scanning just a single copy of text (one or two pages on the copy). 
Look at the little view window as the scan is progressing. Cancel the scan and 
reset the brightness control and re-scan as often as necessary, until you think 
you have scanned a single page of text correctly.
Then, when the scanning ends, look at the actual document. Doing this will 
uncover many setting errors that would otherwise go unnoticed. If you see on 
your scanned document a number of letters which are only part of the full 
letters they are supposed to be ("c" instead of "d" for example, "lll" instead 
of "M"), then you need to darken the brightness control. 
Making this kind of check is the best way to save a lot of wasted time. Now is 
the point to take some extra time. Darken or lighten the brightness control and 
repeat the process until you have a clean document of text. Now start to scan. 
When you have this control adjusted correctly, there will be a minimum of 
spelling errors. All your downstream efforts at Proportionalizing and reading 
text will be frustrated if you have a lot of unnecessary spelling errors which 
you will have to correct or accept.
Remember: The easiest way around this whole chore is to use the slowest speeds 
(best quality) of the scanner. In these modes, brightness level is automatically 
adjusted. Note: the scanner will be operating as a greyscale scanner.
Don't Retain Graphics
Set the OCR program not to retain graphics. This will save you a lot of later 
deleting and it will speed up OCR.
Retain Font and Paragraph Formatting
Set the OCR options to retain font and paragraph formatting. This way the OCR 
text will look very much like the original text and you can clearly see 
italicized and bolded words. This makes adding special marks to titles and 
sub-titles and key words very easy.
Turn On Virtual Memory
If you are scanning more than just 8-10 pages of plain text, you need to turn on 
virtual memory. Otherwise, you will quickly run out of RAM and scanning 
will stop. Automatically scanning 100 pages can easily use up 50 megabytes of 
memory while text is in process of being scanned and recognized. This is only a 
temporary use, unless you save the working Caere document on the hard drive. 
After actual text has been created you manually or automatically throw out the 
working file. You must remember to do this or your hard drive will quickly fill 
up. When you are finished scanning be sure to turn virtual memory off, as it 
causes the Proportional Reading program and other programs to run much slower 
than normal.
Special Situations
Occasionally the scanner will interpret a big gap between introductory numbers 
and related text as two separate columns. This can also happen with dialogue 
where each speaker has a name set off by a space. These situations are easy to 
correct. Just rezone the text as one unit.
Also, sometimes a list will have several columns which get read as one unit of 
text. You may need to rezone the list into two or more columns in proper 
sequence. A quick look at how the list has been zoned will tell you if you need 
to make a correction. It is easy to delete the current zones on a page and redo 
the zones and OCR. It is also easy to delete the current page and re-scan it.
Deferred Recognition
The fastest way to scan is with multiple pages in the document feeder and the 
multiple page and the deferred optical character recognition options turned on. 
These are two boxes which you check or uncheck before you start to scan. With 
both boxes checked the scanner will scan one page after another and defer 
character recognition until you are done scanning.
To manually scan one page after another, just press Command+L after you turn 
each page.
You will need extra hard disk space if you are going to use deferred 
recognition. You should plan on leaving at least 50 to 100 megs free, depending 
on how many pages of text you want to scan at a time before doing the text 
recognition. Forty pages of text can easily temporarily use up to 20 megs of 
hard disk space as a Caere file. After recognition the resulting text may only 
be 200k. All the bit maps with their large memory requirements will have gone 
away or are ready for you to delete, depending on which choice you have made. 
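The disk space figures above work out to roughly half a megabyte per deferred page (20 megs for forty pages). The following sketch turns that into a quick planning aid. The function names are illustrative, and the per-page figure is only the article's rough upper estimate, so treat the results as a guide and not a guarantee.

```python
# Rough upper estimate from the text: forty pages use up to 20 megs.
MB_PER_PAGE = 20 / 40

def scratch_space_mb(pages, mb_per_page=MB_PER_PAGE):
    """Estimate the temporary disk space needed for a deferred file."""
    return pages * mb_per_page

def max_pages_for(free_mb, mb_per_page=MB_PER_PAGE):
    """Estimate how many pages can be deferred in the given free space."""
    return int(free_mb // mb_per_page)

print(scratch_space_mb(40))   # space for a forty page deferred file
print(max_pages_for(50))      # pages that fit in 50 free megs
```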
Saving Scanned Text
Be sure to save the text as ASCII text without hard returns added at the end of 
each line.
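If a file has already been saved with hard returns at the end of each line, they can be removed afterwards. The sketch below is one possible way to do it, not a feature of the OCR program. It assumes that paragraphs are separated by blank lines; any line that must stay on its own line (a page number marker, for example) would need a blank line around it or special handling.

```python
import re

def remove_hard_returns(text):
    """Join the lines of each paragraph into one line.

    Assumes paragraphs are separated by one or more blank lines.
    """
    paragraphs = re.split(r"\n\s*\n", text.strip())
    joined = []
    for paragraph in paragraphs:
        # Replace each line break inside the paragraph with a single space.
        lines = [line.strip() for line in paragraph.splitlines()]
        joined.append(" ".join(lines))
    return "\n\n".join(joined)

sample = "The quick\nbrown fox.\n\nNext\nparagraph.\n"
print(remove_hard_returns(sample))
```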
Other Scanning Tips
In actual practice, you can scan about 20 pages (40 sides) at a time and then 
tell the scanner that you are done. The scanner then makes a file for later 
recognition. Then you make more files of 40 or so pages each. When you are ready 
you can zone each page and save the file. Then you can tell the OCR program to 
open up all these deferred files in order and the program will OCR each file in 
turn. This process can take place while you are at lunch or sleeping.
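To plan the deferred files for a whole book ahead of time, the page ranges can be listed in advance. This is a minimal sketch; the function name is made up for illustration, and the batch size of 40 pages is simply the "40 or so" mentioned above.

```python
def plan_batches(total_pages, pages_per_file=40):
    """Return (first_page, last_page) ranges, one per deferred file."""
    batches = []
    for first in range(1, total_pages + 1, pages_per_file):
        last = min(first + pages_per_file - 1, total_pages)
        batches.append((first, last))
    return batches

# A 100 page book needs three deferred files.
for first, last in plan_batches(100):
    print(f"pages {first} to {last}")
```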
For maximum use of the scanner, transfer documents of scanned-only pages to 
another computer where zoning and OCR and spell checking and final editing will 
take place. If you don't have a network, use a removable cartridge hard drive. 
Transfer files will be large, but once processed the same cartridge can be 
reused over and over. This way one scanner can scan many books each day. 
Individual teachers or students can finish the OCR work on their own computers.
Note: Be sure to remove all deferred files from your hard drive after they have 
been turned into text. You can choose to do this automatically. Each deferred 
file is like a group of pictures, and takes up a tremendous amount of space on 
your hard drive. Left to accumulate, they will quickly eat up all your disk 
space.
The Proper Optical Recognition Program
It is important to use a good scanner and the Omni Page Professional optical 
character recognition program, Version 6. This program is simply the best that is 
available. It is the only recommended choice.
Why Choose the 4C
The Hewlett-Packard 4C flatbed color scanner without automatic document feeder 
is an ideal machine for scanning books. Other scanners can be used. In fact, 
Hewlett-Packard makes a black and white scanner which also has a document feeder 
and sells at half the price of the 4C. Since all optical character recognition 
is done in black and white, why use the color scanner? The following points are 
offered:
1) The document feeder on the 4C takes pages as small as 5" x 7". The 
(greyscale) scanner has a minimum size which is much larger than the 4C. This in 
turn means that middle-size paperbacks cannot be cut apart and fed 
automatically on the greyscale scanner. They must be copied first. The reason 
for all this is that pages feed from the side of the machine and from the side 
of the paper (longer direction) on the 4C and from the top of the machine and 
the top of the paper on the greyscale scanner. A small page which measures too 
narrow for top loading, often still has sufficient size for automatic loading if 
loaded from the side.
2) Pages are more stable when scanned in the 4C. This is because the paper moves 
in the greyscale scanner, while the scanner light moves in the 4C.
3) With the 4C, color pictures from original text can be scanned in and added 
after text is recognized and in WordPerfect. Obviously, a greyscale scanner 
can't add color.
4) The flatbed on the 4C is much longer than the flatbed on the greyscale 
scanner. This means that fairly large books can be laid down on the 4C and 
scanned two pages at a time. You simply cannot do this on the greyscale scanner 
flatbed.
5) Color adds a great deal to almost all presentations. The 4C allows students 
to make Proportional Reading articles using their own color pictures or color 
pictures downloaded from many other sources besides books.
6) The 4C can be used by other departments than just reading. Therefore, it can 
be better justified than the greyscale scanner, as the expense can be amortized 
over more people and more departments.
7) The 4C document feeder holds fifty separate pages while the greyscale scanner 
only holds twenty. Tending the machine to restock the document feeder can be cut 
way down with the 4C.



Tips on Editing
After scanning a book or article it is necessary to do a little editing to 
maximize later reading. All of these steps are optional, but you will be very 
pleased if you go through these steps. All of these steps can be done very 
quickly.
There are two places to do editing. The first editing is done in the Caere 
document right after OCR has taken place. The second editing is done in the 
saved ASCII text which has been reopened in your word processor.
Editing Right after OCR in the Caere Document
The best way to edit pages is to check the pages as Caere documents first. 
Always have the original text on a slant board just below the monitor. As you 
click on the window to bring up the next page, turn the page of the original 
text just below the screen. If you have separated pages this is even easier to 
do as the pages lie flat.
Start by adding the page number. As each page comes up you should add a page 
number indicator to the top of the page, like "p#" and then the actual page 
number. Then press return to put the page number info on its own line. If you 
have scanned two pages at once, mark the second page now. If you did not already 
cut out headers in the zoning process, cut out the headers now. All this is easy 
to do because the cursor automatically goes to the top of each page as it comes 
up. 
Adding the page number to the top of the page is important to do for many 
reasons, one of which is that saved text in ASCII format will not be saved as 
separate pages and it is otherwise very difficult to know where one page ends 
and the next page starts. 
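Once each page begins with its own "p#" line, the saved text can be divided back into pages by looking for those lines. The sketch below shows one way this might be done. It assumes the marker sits alone on its line as "p#" followed by the page number, with or without a space between them; the function name is illustrative only.

```python
import re

# A page marker line: "p#" followed by the page number, alone on the line.
PAGE_MARK = re.compile(r"^p#\s*(\S+)\s*$")

def split_pages(text):
    """Return a list of (page_number, page_text) pairs.

    Text before the first marker, if any, gets the page number None.
    """
    pages = []
    number = None
    lines = []
    for line in text.splitlines():
        match = PAGE_MARK.match(line)
        if match:
            # A new page starts: store the page collected so far.
            if number is not None or lines:
                pages.append((number, "\n".join(lines).strip()))
            number = match.group(1)
            lines = []
        else:
            lines.append(line)
    if number is not None or lines:
        pages.append((number, "\n".join(lines).strip()))
    return pages

sample = "p#12\nFirst page text.\np#13\nSecond page\ntext.\n"
for number, body in split_pages(sample):
    print(number, "->", body.replace("\n", " "))
```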
After marking the page number, scroll down the text looking for any areas of 
colored text. These are areas the OCR program could not read. Usually they are 
parts of pictures or misread letters in bold or italicized sections. Delete or 
correct these colored areas. 
Also check any columns to make sure they have been zoned correctly. If not, 
click back on the zone picture and redo all the zones. To do this press 
Command+a and then press "return". A window will appear asking you if you really 
want to remove all the zones from this page. Say "yes". Now click on the zoning 
tool and rezone the page. Then OCR just this page by typing Command+r. 
While you are moving your eyes down each page, make sure each paragraph ends as 
it should. Sometimes blank lines need to be deleted and separated text stitched 
together.
If text begins with an indent, occasionally a full line of text will land in the 
wrong place, for example at the beginning of the paragraph instead of at the 
end. Look for this and cut and paste any such lines back to their rightful place.
Also, this is a good time to mark titles and subtitles, boxes, captions, and key 
words if you wish. It is easy to do this now because bolded words show up 
clearly as bolded and paragraph formatting is like the original. You can use the 
keyboard and shift key in the regular manner or you can quickly type marking 
combinations using the triple letter keystrokes and 555 and 554. If you are doing 
this in WordPerfect you can use the macro keystrokes listed just before the 
triple letter keystrokes. However, these WordPerfect macro keystrokes won't work 
in Caere documents. This is why you use the triple letter keystrokes in Caere 
documents.
for <:# (indicates a chapter title) Type: Option+a or aaa
for <:= (indicates a primary sub-title) Type: Option+s or sss
for <: (indicates a secondary sub-title) Type: Option+d or ddd
for <:- (indicates a tertiary sub-title) Type: Option+f or fff
for <:> (marks a selected name or word) Type: Option+g or ggg
for <:% (marks a new part of a book) Type: Option+h or hhh
for p# (marks a page number) Type: Option+z or zzz
for << (marks beginning of caption or box of text) Type: Option+Comma or 555
for >> (marks end of caption or box of text) Type: Option+Period or 554
If you use the triple letters and 555 and 554 you need to run the change code 
program in WordPerfect which will change these keystrokes into the right code. 
These triple letter codes and 555 and 554 are usually used on the Caere 
documents where macro keystrokes won't work. They save a great deal of time. To 
run the change code program in WordPerfect just type: Control+Option+Command+c.
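The WordPerfect change code program itself is not reproduced here. As a rough illustration of what such a conversion does, the following Python sketch swaps the typed shorthand for the marking codes in the list above, with 554 mapped to >>, the closing mark for captions and boxes. The function name change_codes is hypothetical. Note that this sketch replaces every occurrence, so a genuine "555" in the text (a phone number, for example) would also be changed.

```python
# Shorthand -> marking code, taken from the keystroke list in this section.
CODES = [
    ("aaa", "<:#"),  # chapter title
    ("sss", "<:="),  # primary sub-title
    ("ddd", "<:"),   # secondary sub-title
    ("fff", "<:-"),  # tertiary sub-title
    ("ggg", "<:>"),  # selected name or word
    ("hhh", "<:%"),  # new part of a book
    ("zzz", "p#"),   # page number
    ("555", "<<"),   # beginning of caption or box
    ("554", ">>"),   # end of caption or box
]

def change_codes(text):
    """Replace each typed shorthand with its marking code (hypothetical helper)."""
    for typed, code in CODES:
        text = text.replace(typed, code)
    return text

converted = change_codes("aaaChapter One\nzzz12\n555A caption.554")
```

Here the three input lines become "<:#Chapter One", "p#12", and "<<A caption.>>".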
Now save the text as ASCII text.
Editing Saved Text in WordPerfect or Another Word Processor
Open the saved text up in WordPerfect or another word processor and spell check 
the text. Place the small spelling window at the bottom of the page so you can 
see the text as it is found. If you reduce the size of the text, you can easily 
see the page number of either the current or next page on almost every screen. 
This enables you to follow along in the original text if necessary.
The first time a new name comes up, add it to the vocabulary list and the word 
will not be flagged again. Many of the remaining spelling errors will be matters 
of adding hyphens between words.
Do not worry about paragraph indents. All these indents (if present) are 
automatically removed later during Proportionalizing.
The Last Word on a Page
The last word on the page may be broken apart from the first word on the next 
page. If so, it will be missing a hyphen. You should add a hyphen to such words. 
Alternatively, you can delete the hard return between the two word parts, 
thereby knitting the two parts together. Doing this is often a lot more work as 
the page number often falls between.
Page Numbers on the Bottom of the Page
Make sure that page numbers are on the top of the page, even when the book 
prints them at the bottom.
Marks for Text Boxes and Captions for Graphics
All of these should be marked with << before and >> afterwards.
Footnotes
Footnotes should either be cut out completely or placed next to their reference 
number in the text. You also need to type a period after any footnote number in 
the actual text. This way sentences will end properly with a final period. This 
problem arises because footnote numbers are added right next to the end of 
sentences without a space break. Hence they are read as part of the preceding 
word. Adding a final period after the number allows the end of the sentence to 
be recognized as such by the PR program.
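As an illustration only, the following Python sketch adds the period automatically. It assumes the OCR output looks like "later.3 The next", that is, a letter, the sentence's punctuation mark, then a footnote number of one to three digits followed by a space or the end of the text. The function name is hypothetical, and real text should still be checked by eye, since the pattern can match other things that look like this.

```python
import re

# Assumed pattern: a letter, then . ! or ?, then 1-3 digits, then whitespace or end.
FOOTNOTE_NUMBER = re.compile(r"(?<=[A-Za-z][.!?])(\d{1,3})(?=\s|$)")

def add_footnote_periods(text):
    """Type a period after each footnote number fused to the end of a sentence."""
    return FOOTNOTE_NUMBER.sub(r"\1.", text)

fixed = add_footnote_periods("It was proven later.3 The next one.")
```

In this example "later.3" becomes "later.3." while ordinary numbers such as "3.5" or "1996" are left alone.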
Next, select and cut footnotes. Either discard them, paste them next to their 
reference number in the text, separated by a space, or treat them like captions. 
Margin Notes
Margin notes should be removed or treated as captions. The easiest thing to do 
is to cut them out when you block text.
Math
Math equations need to have the spaces removed between characters. Otherwise, 
each number in the equation will appear on a separate line when the equation is 
presented in Proportional Reading.
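A minimal Python sketch of closing up an equation follows. The function name is hypothetical, and it should be applied only to the selected equation, never to the whole text, since it removes every space it is given.

```python
def close_up_equation(equation):
    """Remove all spaces, tabs, and other whitespace from one equation."""
    return "".join(equation.split())

closed = close_up_equation("3 x + 4 = 19")
```

The equation "3 x + 4 = 19" becomes "3x+4=19", so it stays together as one unit.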
Furthermore, scanning usually does a terrible job on sub and super scripts as 
well as fancy math graphics. If you do not want to rework the math, it may be 
easier to just treat math sections like a graph and have the student refer to 
the appropriate page in the book. Type in the words "SeePage". 
The third and best approach for math equations is to cut them from the text and 
re-scan them as a line drawing graphic which you copy and paste into the word 
processor text at the right point.
Adding Interactive Pauses
If you want to add pauses to the text to make interactive questions and answers 
out of the text as it is read, now is a good time to do this. All you do is to 
type a ~ in the sentence where you want a pause to occur. When the text is 
Proportionalized, these marks are automatically turned into hidden signals which 
the reading programs recognize if you so choose. Otherwise, they will not play 
out.
Reversed Titles
Reversed titles, where the letters are white and the background black, will not 
scan. You must retype any such titles.
Saving Prepared Text
It is a very good idea to save text that is all prepared for Proportionalizing. 
This is text that can be read as a regular word processing file. Furthermore, 
saving text at this point takes up a lot less disk space. It takes six times as 
much storage to save the same amount of text once it has been 
Proportionalized. 
If you are working with a lot of books which you are not going to use that 
often, you may want to save them as text files. This means you can save the 
average book on just one diskette (1.4 megs.). Then you can Proportionalize a 
whole book overnight as necessary.
By comparison, only about seventy pages of Proportionalized text can be saved on 
each diskette (1.4 megs.).
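The storage figures above can be worked through in a few lines of Python. The six-to-one ratio, the seventy pages, and the 1.4 meg diskette come from the text. The 300-page book is only an assumed example, and 1 meg is taken as 1000 K for simplicity.

```python
import math

DISKETTE_KB = 1400            # 1.4 megs, taking 1 meg as 1000 K (assumption)
PROPORTIONALIZED_PAGES = 70   # Proportionalized pages per diskette (from the text)
EXPANSION = 6                 # Proportionalized text needs six times the storage

# Plain text is six times smaller, so six times as many pages fit.
plain_pages_per_diskette = PROPORTIONALIZED_PAGES * EXPANSION
kb_per_proportionalized_page = DISKETTE_KB / PROPORTIONALIZED_PAGES
kb_per_plain_page = kb_per_proportionalized_page / EXPANSION

def diskettes_needed(book_pages, proportionalized):
    """Number of diskettes for a book of the given length (hypothetical helper)."""
    per_diskette = PROPORTIONALIZED_PAGES if proportionalized else plain_pages_per_diskette
    return math.ceil(book_pages / per_diskette)

plain = diskettes_needed(300, proportionalized=False)
proportional = diskettes_needed(300, proportionalized=True)
```

On these figures about 420 pages of prepared text fit on one diskette, so the assumed 300-page book needs one diskette as prepared text and five once Proportionalized.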
The best approach for a school is to keep all the books in current use on a file 
server in Proportional format in locked files. Each student downloads 
Proportionalized text as needed from the file server onto his or her own 
computer or a lab computer and plays it as he or she wishes, marking the text as 
desired and saving selections to personal files. This way text can also be sent 
via modem over the phone lines to students at home. This process can operate 
automatically without involving school personnel.
Text Section with Too Many Hard Returns and Tabs
Occasionally, the OCR program will create a short section of text which is all 
chopped up. It will have extra tabs and hard returns in it. It almost always 
occurs on indented text. This problem is very easy to fix. All you need to do is 
to select the section of text and then go up to the Search menu and activate 
Find/Change. Pull down the Direction sub menu to "Within Selection" then insert 
"hard return" in the find line and click on Change All. Next insert "tab" on the 
find line and again click on Change All. Your section of text will be all fixed 
up.
Note: Be sure to choose "within selection" or you will cut out all the hard 
returns and/or tabs in the piece.
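The same "within selection" idea can be sketched in Python. The function name and the start and end positions are hypothetical stand-ins for the selected section. This sketch also assumes each hard return or tab should become a single space, which the Find/Change steps above do not spell out.

```python
import re

def tidy_selection(text, start, end):
    """Replace hard returns and tabs with spaces, but only in text[start:end]."""
    selection = text[start:end]
    # Assumption: each run of hard returns and tabs becomes one space.
    cleaned = re.sub(r"[\r\n\t]+", " ", selection)
    cleaned = re.sub(r" {2,}", " ", cleaned)  # collapse any doubled spaces
    # Text outside the selection is passed through untouched.
    return text[:start] + cleaned + text[end:]

text = "Intro paragraph.\n\tThis is\n\tchopped\nup text.\nNext paragraph."
start = text.index("This")
end = text.index("up text.") + len("up text.")
result = tidy_selection(text, start, end)
```

Only the chopped-up section is joined into "This is chopped up text."; the hard returns that separate it from the paragraphs before and after are kept.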