HOW TO SCAN A BOOK

Using an OCR Program
Optical character recognition
Optical character recognition, usually abbreviated to OCR, is the mechanical or electronic translation of scanned images of handwritten, typewritten or printed text into machine-encoded text. It is widely used to convert books and documents into electronic files, to computerize a record-keeping system in an office, or to publish the text on a website. OCR makes it possible to edit the text, search for a word or phrase, store it more compactly, display or print a copy free of scanning artifacts, and apply techniques such as machine translation, text-to-speech and text mining to it. OCR is a field of research in pattern recognition, artificial intelligence and computer vision.
OCR systems require calibration to read a specific font; early versions needed to be programmed with images of each character, and worked on one font at a time. "Intelligent" systems with a high degree of recognition accuracy for most fonts are now common. Some systems are capable of reproducing formatted output that closely approximates the original scanned page, including images, columns and other non-textual components.

• Introductory Notes
• Overview of Scanning
• How Scanning Books is Different from Other Types of Scanning
• Tips on Scanning and Optical Character Recognition
• Tips on Editing Text
Many people ask, "How do I scan a book?" This article has been written to answer this question. The truth of the matter is that scanning a book can be extremely easy if you know what you are doing. Otherwise it will be a nightmare. Scanning a book is very different from scanning other types of documents. The tips in this article should be of great help.
This work was written to help people read using the technique called Proportional Reading. In this approach the eyes never move. You can read up to 700 words per minute and still feel like you are being read aloud to. Text can also be read out loud in a real human voice at normal reading speed as it is displayed one word at a time. In order to do this type of reading, the text must first be in electronic form. The author spent three years developing an understanding of how to scan books easily, so that any student could scan course material or other reading material into e-text for Proportional Reading. The material presented here is essentially chapters 7 and 8 of the Instruction Manual for Proportional Reading.
Scanning really involves three parts: 1) making a picture of a page (scanning), 2) using an Optical Character Recognition program to convert the picture into typed text, and 3) cleaning up the text after this process. In actual practice, scanning and OCR decisions are made before scanning starts.
Overview of Scanning
A scanner is used to transform a book or article into computerized text, if it is not already on disk or CD ROM. Scanning text can be done in four ways:
1) from the actual book placed on the scanner bed and scanned one or two pages at a time,
2) from separated pages of the book placed on the scanner one or two pages at a time,
3) from actual book pages bulk-loaded into the automatic document feeder of the scanner, or
4) from copies of book pages, which are then either scanned individually or bulk loaded into the automatic document feeder.
Scanning can be done almost effortlessly if you choose the right approach. This article will help you understand what this approach should be.
Scanning involves a little bit of learning, but once a book is turned into ASCII text, it can be read by everybody in a school system without repeating any of these steps. It can be mailed on a diskette or sent by modem, etc.
First, a few words about copyrights. Be sure to get copyright permission before any wide distribution. Proportional Reading was designed to help people read who would otherwise not be able to benefit from printed text. Publishers almost universally are very helpful in allowing special treatment of their works for the learning disabled and physically disabled.
Furthermore, Proportional Reading is designed for average readers to use on their own reading material which they already have in their possession. This private, non-profit copying of books is within purchase rights, and it makes reading possible for many and increases purchase of books.
Most importantly, the basic thrust of Proportional Reading as applied to scanning books is to return to the original book for the graphics (charts, illustrations, drawings, graphs, pictures, etc.) and to see the original text layout. To this end Proportional Reading keys to the original page numbers of the original text. As a result, actual use of the basic text book increases, not decreases. This will be especially true as millions of people become able to read and start to love learning. In all these ways Proportional Reading actually helps publishers.
Finally, the formatted or Proportionalized version of text requires a special program to play. So, the formatted text by itself is of little or no use without both the playing software as well as the original book.
In this article you will learn how to add colored pictures to scanned text. However, this process adds tremendously to file size and is therefore impractical except for short articles or articles saved on CD ROM or removable cartridge. It is usually much easier to refer to the original book for pictures and other graphics.
How Scanning Books is Different from Other Types of Scanning
The best way to learn how to scan a book efficiently is to start by understanding how scanning a book differs from other types of scanning. There are eight major differences. We will see that if a book will lie flat on the scanner bed, you can scan one or both pages of text at a time. Otherwise, it is easiest by far to separate the pages and work on one side of a page at a time: scan it, OCR it, spell check it, and add any special marks before going on to the next page. We will now look at each of the eight major differences in turn.
1) Page Thickness
Most scanning is designed to be done on standard letter size, 20 lb paper. This type of medium runs perfectly through the automatic document feeder. Other thicknesses of paper will not work well in the automatic document feeder. The trouble with books is that many pages are too thick and will not even load into a document feeder. Most textbook pages, on the other hand, are too thin and will eventually double up as they enter the document feeder. Either way automatic processing will jam up. In addition, if you are doing two-sided documents, your collating will be off and all your time will be wasted. In scanning two-sided documents you run through the whole stack one way, then do the whole stack on the back side, and then have the computer collate everything. Any jam-up will ruin collation and all the investment of time. There is no way to simply redo collation; it takes place before editing, and all offending pages would have to be cut and repasted, which is a nightmare.
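To see why a single jam ruins collation, here is a minimal Python sketch of the interleaving step. It assumes (the source does not say this) that the stack is flipped over as a whole for the second pass, so the back sides arrive in reverse order. The function name collate and the page labels are hypothetical, not part of any scanner software.

```python
def collate(fronts, backs):
    """Interleave front-side scans with back-side scans.

    Assumption: the stack was flipped over as a whole for the second pass,
    so `backs` arrives in reverse sheet order and is reversed here.
    """
    if len(fronts) != len(backs):
        raise ValueError("front and back passes differ in length; "
                         "a sheet jammed or doubled up")
    pages = []
    for front, back in zip(fronts, reversed(backs)):
        pages.extend([front, back])
    return pages


# Three sheets: fronts scanned as pages 1, 3, 5; backs arrive as 6, 4, 2.
good = collate(["p1", "p3", "p5"], ["p6", "p4", "p2"])

# If two sheets double up during the second pass, one back side is never
# scanned, and the mismatch shows up only after both passes are finished.
try:
    collate(["p1", "p3", "p5"], ["p6", "p2"])
    jam_detected = False
except ValueError:
    jam_detected = True
```

The point of the sketch is that every page after a missed sheet pairs with the wrong partner, so the whole run has to be repeated.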
For this reason automatic document feeders should not be used with actual book pages unless pages are copied first onto 20 lb paper with only one side of the paper used.
2) Rounded Pages
Books may be divided immediately into two types: those that will lie flat and those that won't. Sometimes you can push down on the spine of the book to make the text lie flat. If the text won't lie flat it curves into the center and can not be scanned as is. Many textbooks are designed to make copying impossible by intentionally making the text flow close to the gutter, or center.
These books can easily be scanned. However, you must first separate the pages. Be happy about this. Scanning individual pages is much less physical work than scanning a book. In scanning individual pages there is no lifting and turning and pressing down on the book. You can sit comfortably in a chair and hardly move as you scan first one side of a page and then the other side of the same page and then the next page. Separate the book chapters into different manilla folders.
A separated book has real value after scanning. It is often much easier to read a book this way than trying to keep the pages open. Also, bookbags become much lighter when only the relevant chapters are carried around. The trick is to keep the different chapters in different folders.
3) See Through
If you want to avoid errors on italics and bold letters you have to use the highest resolution when scanning. This setting also gives you the best black and white picture quality if you are scanning pictures in the text as well. The trouble with this setting is that when you scan the average textbook page of thin shiny paper, the scanner will see right through the page and pick up details on the back side of the page. There is a simple way to avoid this problem: put a black sheet behind the page you are scanning. The see-through problem will disappear immediately. Unfortunately, the belt on automatic document feeders is white, not black. Therefore, even if you could get the pages not to jam up, they will still "bleed" through.
For this reason it is best to tape a black piece of paper on the underside of the cover of the scanner and scan the pages one page at a time, or scan from an open book where the pages are automatically backed up. Alternatively you can make one-sided copies of the text pages and run these copies through the document feeder. However, this costs a lot of money and requires a good quality copier. Regardless of how good the copier is, you will lose quality when you make copies and this will cause errors in scanning. When all is said and done it is usually best to scan one page at a time, or from an open book that will lie flat.
4) Text Boxes and Captions
Many books are straight text and these are easy to scan. However, most textbooks have text boxes on colored backgrounds inserted in the middle of the text. In addition, graphics of many types with their captions are inserted in the pages. When text is scanned it ends up in a linear flow. Text boxes and captions can be very disruptive to reading if they are not moved to the end of the subsection to which they refer. When text boxes and captions are moved this way they are a joy to read in a linear flow with the main text.
The best way to do this is to specially mark the text boxes and captions right after the page is scanned and OCR'd. Here again it is usually best to scan one separated page at a time, or from an open book that will lie flat.
5) Pictures and Graphics
When you OCR text the OCRing is done in black and white. Although pictures can be automatically scanned they are not scanned in color and are therefore of little use in today's world of color. Secondly, when pictures are scanned through the OCR program, if they have not been carefully defined as pictures, the text on the pictures is removed and added to the main body of text during the actual OCR stage. This creates a very confusing piece of text.
The simple solution to this is to select just the sections of text and captions and text boxes and in the order you want, ignoring the pictures. The way to do this is to insert one page at a time and manually zone each page. This process is much faster than deselecting all the zones you do not want and then reordering the zones you have left from an automatically zoned page.
To re-add a picture in color, you first save the text in ASCII format and open it up in your word processor. Then you scan the colored picture using the scanner alone (not the OCR program) and copy and paste the desired picture into the word processor document at the desired point. Choose "screen" resolution so the picture file will not be too big.
6) Spell Checking
The first two ways to make sure the text is free from errors are to scan in the highest quality mode and to scan directly from the text page. The third is to use the spell checking feature on each page of text right after the page has been scanned and OCR'd. The reason for doing this now is that you can see a picture of the original scan along with the misspelled word and immediately see whether the suspicious word is OK or how to fix the error.
7) Page Numbers and Headers
Book pages often have headers and footers on pages. These need to be removed. The best way to do this is to not select them to be OCR'd in the first place. When you get the text OCR'd add the page number at the top of the page. This is very easy to do as the cursor automatically goes to the top of the page as soon as OCR is done.
8) Titles, Sub-Titles and Key Words
If you mark titles, sub-titles and key words, it is very easy to move to any place in the e-text document. Furthermore, you can automatically create a five level outline with key words added in the appropriate sections. No retyping or handwriting is required. Such outlines are tremendous study aids and are essentially a free by-product of scanning. Here again it is best to scan one page at a time, or from an open book that will lie flat.
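As a rough illustration of how such an outline could be pulled out of marked text, here is a minimal Python sketch. It uses the title marks listed in the editing section of this article (part, chapter, primary, secondary and tertiary sub-title). Treating these as the five outline levels, and assuming each mark sits at the start of its title line, are guesses rather than a description of the Proportional Reading program. Key words are left out to keep the sketch short.

```python
# Title marks from the article's list of editing codes, ordered from the
# broadest level to the narrowest. Using them as five levels is an assumption.
LEVEL_MARKS = ["<:%", "<:#", "<:=", "<:", "<:-"]


def outline_level(line):
    """Return (level, title) for a marked title line, or None for body text."""
    if line.startswith("<:>"):
        return None  # key-word mark, not a title
    # Try the three-character marks before the bare "<:" mark.
    for mark in sorted(LEVEL_MARKS, key=len, reverse=True):
        if line.startswith(mark):
            return LEVEL_MARKS.index(mark), line[len(mark):].strip()
    return None


def build_outline(text):
    """Collect marked titles into an indented outline, one title per line."""
    rows = []
    for line in text.splitlines():
        found = outline_level(line.strip())
        if found:
            level, title = found
            rows.append("  " * level + title)
    return "\n".join(rows)


sample = "\n".join([
    "<:% Part One",
    "<:# Cells",
    "Body text that is skipped.",
    "<:= Membranes",
    "<: Transport",
    "<:- Osmosis",
])
outline = build_outline(sample)
```

Each level is indented two spaces further than the one above it, so the result can be read directly as a study outline.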
Tips on Scanning and OCR'ing Text
Scanning an Open Book
When scanning an open book, you do not want to sit down and stand up repeatedly. This is very hard on the body. It is much easier to scan first two open pages, turn the page, then scan the next two open pages etc. After you are done just scanning, go back with the book and zone and OCR and check each two pages at a time. Alternatively, you can zone all the pages then OCR the lot, or you can tell the program to automatically zone and OCR the lot.
Another good trick is to place an open book on the scanner with a weight on top of it and scan two pages at a time. This way you don't have to personally press down on the book binding all the time the scanner is working. Use a gallon of water in a plastic jug for a weight. Build up an area next to the scanner to the same height as the lid, using telephone books or other books. Now you can just drag the water on and off the scanner lid (from the top of the pile). No lifting of the weight is required.
Cutting Out Pages
The way to cut out the pages of a book is to leave the two covers and binding in place. Set the book on a piece of scrap wood on the corner of a table with the bottom cover hanging vertically off the scrap wood and edge of the table. This way there is no chance of cutting the table or cutting off the back cover of the book. Lay a straight edge in from the binding about 1/4" on the first internal page and cut along this guide with a sharp knife, making several passes. You should be able to free up about 50 pages before you need to remove these pages and reset the straight edge. Cutting out the pages this way leaves a smooth surface for re-gluing pages with any wood glue.
A book can be cut apart this way in about two minutes. If you don't want to reglue the pages, reset them in the cover (still completely intact) and add a rubber band. Frequently it is much easier to read loose pages than bound pages.
Re-gluing pages is very simple. Just add some wood glue to the binding and to the binding edge of the pages and stick the pages in the binding. Let set overnight. The new binding will work just as well as before.
Notes: Some pages are printed right to the center "gutter". This makes manually scanning one or two pages at a time impossible. It is also impossible to copy such pages. These pages have to be cut out to be scanned. Secondly, tiny paperback pages are too small to fit in most document feeders. These pages should be scanned manually, two pages at a time with deferred OCR, or copied first and then inserted into the automatic document feeder.
However, cutting and then re-gluing is not workable for library books.
Making Copies of Pages
Making copies of pages and then scanning these copies has some drawbacks, but can be done quickly and effectively if you use the highest quality scanning approach. Making copies loses much clarity, which leads to increased errors; it requires an excellent copier; it costs money for a copier machine, paper and toner; and it requires costly wear and tear upkeep on the copier. It also requires a document feeder and purchasing and transporting lots of paper. If you don't separate pages before copying, the book must be able to lie flat on the scanning window and text must not curl in towards the gutter. Copied pages can easily get out of order and must be checked before scanning to make sure that they are in order and that extra blank pages have not gotten inserted by mistake. Often pages just out of the copier must be reordered. Using a copier, the average 250 page book would cost at least $6.00 for copying, before scanning even begins. You can copy onto either 8 1/2" x 11" paper or 8 1/2" x 14" paper.
However, you can quickly process any book this way, especially if you copy two pages at a time. You can easily copy 300 pages an hour, two pages at a time. These pages can be inserted into the document feeder as they come off the copier. Scanning can occur simultaneously. Putting copies of pages in a document feeder is a great solution for scanning borrowed books.
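The figures above (300 pages an hour when copying two pages at a time, and at least $6.00 for a 250 page book) can be turned into a quick estimate for any book. The Python sketch below is only arithmetic on those two figures; the function name and the idea of a flat cost per book page are assumptions, and the cost is a lower bound because the article says "at least".

```python
import math


def copy_job_estimate(book_pages, pages_per_copy=2,
                      pages_per_hour=300, cost_per_book_page=6.00 / 250):
    """Estimate sheets of paper, copying time and minimum copying cost."""
    sheets = math.ceil(book_pages / pages_per_copy)
    minutes = round(book_pages / pages_per_hour * 60)
    cost = round(book_pages * cost_per_book_page, 2)
    return {"sheets": sheets, "minutes": minutes, "min_cost": cost}


# The article's average book of 250 pages.
estimate = copy_job_estimate(250)
```

For the 250 page book this works out to 125 sheets, about 50 minutes at the copier, and a minimum of $6.00.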
The Best Plan
So, what is the solution? The best approach by far is whenever possible to scan an open book that will lie flat, scanning one or both pages at a time. The next best approach is to cut the pages away from the binding whenever possible, scan them, and then reglue them to the binding. The book will work perfectly. The third best approach is to make single sided copies of either one or two pages at a time and run the copies through the automatic document feeder.
Note: Some small paperbacks are sometimes printed on very poor quality paper with too much ink. As a result, letters are badly formed and scanning even at the best quality level will not be successful. In this situation, the best approach is to get a library edition of the book to scan. Don't just waste your time.
Page Orientation and Differentiation
If you are scanning a regular book or a paper back two pages at a time, you will have the book turned sideways with the lower left corner of the left page in the upper right corner of the scanner. If you are copying large pages one at a time or using large paper, you will have the book upside down, but with the tops of the pages towards the top of the machine. Make sure you tell the scanner program which way the text is facing: vertical (portrait) or sideways (landscape).
If you are copying two pages at a time, it is important to make sure the scanner differentiates between the left and right page. Otherwise, text from the two pages will merge. This can be a problem if the margins and the gutter between the pages get reduced too far. It is also important to cut out all the heavy black areas around the margins and in the gutter. Otherwise, these areas will be read as characters.
One solution for this problem is to manually zone the image before scanning the next page.
If you want to do automatic zoning, there is an easy way around these problems. Mark either side of the copy window half way up its length. Always center the book gutter on this center line each time you set the book down on the scanner bed. Then manually zone the scanner for two zones (one for each page), cutting out the areas of black. Be sure to zone the earlier page first (otherwise, the second page will always come before the first). Now save the zone template and call it up for this book. Pages will be automatically separated in scanning and black areas will be ignored.
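The two-zone template described above can be pictured as two rectangles on either side of a centred gutter. The Python sketch below computes them; it is not OmniPage's template format. The function name, the (left, top, right, bottom) layout, the inch units and the sample trim widths are all illustrative assumptions, and it assumes the left-hand page is the earlier one.

```python
def two_page_zones(width, height, margin, gutter):
    """Return two (left, top, right, bottom) zones for an open book.

    The gutter is assumed to be centred on the scan, as the article advises.
    `margin` trims the dark outer edges; `gutter` is the width trimmed on
    each side of the centre line to cut out the dark band between pages.
    """
    centre = width / 2
    left_page = (margin, margin, centre - gutter, height - margin)
    right_page = (centre + gutter, margin, width - margin, height - margin)
    # The earlier (left-hand) page is listed first so it is recognized first.
    return [left_page, right_page]


# A 14 x 8.5 inch scan area with half-inch margins and a quarter inch
# trimmed on each side of the gutter.
zones = two_page_zones(width=14.0, height=8.5, margin=0.5, gutter=0.25)
```

Because the book is always set down with its gutter on the same centre mark, one pair of rectangles like this can be reused for every spread in the book.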
Alternatively, you can set the scanner to automatically zone both pages with no zones. Then after the scanning is finished and before the text recognition function starts, manually rezone each page. At the same time you can cut out graphics and headers. You can also make the page number of each page the first and top item on that page by selecting it first, even if the page number is on the bottom of the page. The best approach is not to zone the page number and to type it in later at the top of the page, or ignore it completely and delete it later.
Note: When you scan original individual pages (cut out from the book binding) one at a time, either manually or in a document feeder, there is no gutter problem, nor problem with black areas.
If you are scanning one page at a time you may want to zone, OCR and edit each page right after it is scanned. This is fine. However, if you are doing two pages at a time, or if you want to make maximum use of your scanner, and/or if you wish to have the OCR done automatically while you do something else, you should scan all the pages first into separate files which can be finished later.
Later you, or somebody else on another machine, zone the pages manually or have them automatically zoned when OCR is done. Then the pages are OCR'd and then edited. It's usually best to scan all the pages first.
Lighten-Darken Control on Scanner (Brightness)
If you choose the fastest scanning speed, you will have to set the brightness level yourself. On the other hand, if you choose the quality scanning speeds, the scanner will automatically choose the brightness level for you.
If you are setting the brightness level yourself, be sure to scan and check just one page of text to begin with. It is important to check the scanning as it occurs. It is very important that the letters not have broken or missing parts. Cancel the scanning and move the brightness control towards darken if this is the case. Then rescan the page for a second check.
To do this, make sure the boxes for multiple pages and deferred recognition are not checked. The box for automatically saving a document should also be unchecked.
It is also very important that the letters do not run together. If this is happening, lighten the brightness control. What you are looking for is the point right between these two problems. Too much correction for one problem causes the other problem. Actually, the OCR program does not mind if the letters are very close, but it minds terribly if the letters are not completely formed or parts of letters are broken.
Don't have letters any thicker than necessary. If you do, open sections in letters like "a" and "e" will get blocked out. These letters will subsequently be misread by the character recognition program.
Start off by scanning just a single copy of text (one or two pages on the copy). Look at the little view window as the scan is progressing. Cancel the scan and reset the brightness control and re-scan as often as necessary, until you think you have scanned a single page of text correctly.
Then, when the scanning ends, look at the actual document. Doing this will uncover many setting errors that would otherwise go unnoticed. If you see on your scanned document a number of letters which are only part of the full letters they are supposed to be ("c" instead of "d" for example, "lll" instead of "M"), then you need to darken the brightness control.
Making this kind of check is the best way to save a lot of wasted time. Now is the point to take some extra time. Darken or lighten the brightness control and repeat the process until you have a clean document of text. Now start to scan. When you have this control adjusted correctly, there will be a minimum of spelling errors. All your downstream efforts at Proportionalizing and reading text will be frustrated if you have a lot of unnecessary spelling errors which you will have to correct or accept.
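The darken-or-lighten routine above is a search for the setting that lies between two failure modes. The Python sketch below expresses that search as a simple bisection. The 0 to 100 darkness scale, the function names and the stand-in check are hypothetical; on a real scanner the check is you looking at the test page.

```python
def tune_brightness(check_page, lightest=0, darkest=100, max_scans=8):
    """Narrow in on a darkness setting by test-scanning a single page.

    `check_page(setting)` stands in for scanning one page and inspecting it
    by eye. It must return "broken" (letters have missing parts), "merged"
    (letters run together) or "clean".
    """
    low, high = lightest, darkest
    for _ in range(max_scans):
        setting = (low + high) // 2
        result = check_page(setting)
        if result == "clean":
            return setting
        if result == "broken":
            low = setting + 1    # too light: move towards darken
        else:
            high = setting - 1   # too dark: move towards lighten
        if low > high:
            break
    return None  # no clean setting found; try a slower, best-quality mode


# Stand-in for the human check: pretend settings 60 to 65 give clean text.
def pretend_check(setting):
    if setting < 60:
        return "broken"
    if setting > 65:
        return "merged"
    return "clean"


chosen = tune_brightness(pretend_check)
```

Halving the range each time keeps the number of test scans small, which matters when every check means rescanning a page.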
Remember: The easiest way around this whole chore is to use the slowest speeds (best quality) of the scanner. In these modes, brightness level is automatically adjusted. Note: the scanner will be operating as a greyscale scanner.
Don't Retain Graphics
Set the OCR program not to retain graphics. This will save you a lot of later deleting and it will speed up OCR.
Retain Font and Paragraph Formatting
Set the OCR options to retain font and paragraph formatting. This way the OCR text will look very much like the original text and you can clearly see italicized and bolded words. This makes adding special marks to titles and sub-titles and key words very easy.
Turn On Virtual Memory
If you are scanning more than just 8-10 pages of plain text, you need to turn on virtual memory. Otherwise, you will quickly run out of RAM and scanning will stop. Automatically scanning 100 pages can easily use up 50 megabytes of memory while text is in the process of being scanned and recognized. This is only a temporary use, unless you save the working Caere document on the hard drive. After actual text has been created you manually or automatically throw out the working file. You must remember to do this or your hard drive will quickly fill up. When you are finished scanning be sure to turn virtual memory off, as it causes the Proportional Reading program and other programs to run much slower than normal.
Special Situations
Occasionally the scanner will interpret a big gap between introductory numbers and related text as two separate columns. This can also happen with dialogue where each speaker has a name set off by a space. These situations are easy to correct. Just rezone the text as one unit.
Also, sometimes a list will have several columns which get read as one unit of text. You may need to rezone the list into two or more columns in proper sequence. A quick look at how the list has been zoned will tell you if you need to make a correction. It is easy to delete the current zones on a page and redo the zones and OCR. It is also easy to delete the current page and re-scan it.
Deferred Recognition
The fastest way to scan is with multiple pages in the document feeder and the multiple page and the deferred optical character recognition options turned on. These are two boxes which you check or uncheck before you start to scan. With both boxes checked the scanner will scan one page after another and defer character recognition until you are done scanning.
To manually scan one page after another, just press Command+L after you turn each page.
You will need extra hard disk space if you are going to use deferred recognition. You should plan on leaving at least 50 to 100 megabytes free, depending on how many pages of text you want to scan at a time before doing the text recognition. Forty pages of text can easily use up to 20 megabytes of hard disk space temporarily as a Caere file. After recognition the resulting text may be only 200k. All the bit maps with their large storage requirements will have gone away or are ready for you to delete, depending on which choice you have made.
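The rule of thumb above (forty pages can use up to 20 megabytes) works out to about half a megabyte per scanned page. The Python sketch below does that arithmetic in both directions. The function names are hypothetical, and real page sizes will vary with resolution and page size.

```python
# From the article: forty pages can use up to 20 megabytes while deferred.
MB_PER_PAGE = 20 / 40


def deferred_space_mb(pages, mb_per_page=MB_PER_PAGE):
    """Temporary disk space needed to hold `pages` scanned page images."""
    return pages * mb_per_page


def pages_that_fit(free_mb, reserve_mb=0, mb_per_page=MB_PER_PAGE):
    """How many pages can be scanned before recognition must be run."""
    usable = max(free_mb - reserve_mb, 0)
    return int(usable // mb_per_page)


space_for_100 = deferred_space_mb(100)
fit_in_100_mb = pages_that_fit(100)
```

At this rate 100 deferred pages need about 50 megabytes, and 100 free megabytes hold about 200 pages.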
Saving Scanned Text
Be sure to save the text as ASCII text without hard returns added at the end of each line.
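If a file has already been saved with a hard return at the end of every line, the lines can be rejoined afterwards. The Python sketch below shows one way to do it. It assumes paragraphs are separated by blank lines, which may not hold for every file, and it does not rejoin words hyphenated across a line break. The function name is hypothetical.

```python
def remove_hard_returns(text):
    """Join the lines within each paragraph; blank lines separate paragraphs."""
    paragraphs = []
    current = []
    for line in text.splitlines():
        stripped = line.strip()
        if stripped:
            current.append(stripped)
        elif current:
            # A blank line closes the paragraph collected so far.
            paragraphs.append(" ".join(current))
            current = []
    if current:
        paragraphs.append(" ".join(current))
    return "\n\n".join(paragraphs)


wrapped = "The quick brown\nfox jumps.\n\nSecond\nparagraph."
unwrapped = remove_hard_returns(wrapped)
```

Saving without hard returns in the first place avoids this step, since any line that must stand alone (such as a page number line) is then kept apart by the OCR program itself.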
Other Scanning Tips
In actual practice, you can scan about 20 pages (40 sides) at a time and then tell the scanner that you are done. The scanner then makes a file for later recognition. Then you make more files of 40 or so pages each. When you are ready you can zone each page and save the file. Then you can tell the OCR program to open up all these deferred files in order and the program will OCR each file in turn. This process can take place while you are at lunch or sleeping.
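To plan a long scanning session it can help to work out in advance how many deferred files a book will need. The Python sketch below splits a count of separated sheets into runs of 20 sheets (40 sides), following the figure above. The file naming scheme and the function name are made up for illustration.

```python
def plan_batches(total_sheets, sheets_per_file=20):
    """Split a run of separated sheets into deferred-recognition files."""
    batches = []
    for start in range(1, total_sheets + 1, sheets_per_file):
        end = min(start + sheets_per_file - 1, total_sheets)
        batches.append({
            "file": f"batch_{len(batches) + 1:02d}",  # hypothetical file name
            "sheets": (start, end),
            "sides": (end - start + 1) * 2,
        })
    return batches


# A 250 page book cut apart is 125 sheets.
batches = plan_batches(125)
```

Numbering the files in order makes it easy to open them in sequence later, so the recognized text comes out in page order.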
For maximum use of the scanner, transfer documents of scanned only pages to another computer where zoning and OCR and spell checking and final editing will take place. If you don't have a network, use a removable cartridge hard drive. Transfer files will be large, but once processed the same cartridge can be reused over and over. This way one scanner can scan many books each day. Individual teachers or students can finish the OCR work on their own computers.
Note: Be sure to remove all deferred files from your hard drive after they have been turned into text. You can choose to do this automatically. Each deferred file is like a group of pictures, and takes up a tremendous amount of space on your hard drive. Left to accumulate, they will quickly eat up all your disk space.
The Proper Optical Recognition Program
It is important to use a good scanner and the OmniPage Professional optical character recognition program, Version 6. This program is simply the best that is available. It is the only recommended choice.
Why Choose the 4C
The Hewlett-Packard 4C flatbed color scanner with automatic document feeder is an ideal machine for scanning books. Other scanners can be used. In fact, Hewlett-Packard makes a black and white scanner which also has a document feeder and sells at half the price of the 4C. Since all optical character recognition is done in black and white, why use the color scanner? The following points are offered:
1) The document feeder on the 4C takes pages as small as 5" x 7". The greyscale scanner has a minimum page size which is much larger than that of the 4C. This in turn means that middle-size paperbacks cannot be cut apart and fed automatically on the greyscale scanner. They must be copied first. The reason for all this is that pages feed from the side of the machine and from the side of the paper (longer direction) on the 4C, and from the top of the machine and the top of the paper on the greyscale scanner. A small page which is too narrow for top loading often still has sufficient size for automatic loading if loaded from the side.
2) Pages are more stable when scanned in the 4C. This is because the paper moves in the greyscale scanner, while the scanner light moves in the 4C.
3) With the 4C, color pictures from original text can be scanned in and added after text is recognized and in WordPerfect. Obviously, a greyscale scanner can't add color.
4) The flatbed on the 4C is much longer than the flatbed on the greyscale scanner. This means that fairly large books can be laid down on the 4C and scanned two pages at a time. You simply can not do this on the greyscale scanner flatbed.
5) Color adds a great deal to almost all presentations. The 4C allows students to make Proportional Reading articles using their own color pictures or color pictures downloaded from many other sources besides books.
6) The 4C can be used by other departments than just reading. Therefore, it can be better justified than the greyscale scanner, as the expense can be amortized over more people and more departments.
7) The 4C document feeder holds fifty separate pages while the greyscale scanner only holds twenty. Tending the machine to restock the document feeder can be cut way down with the 4C.
Tips on Editing
After scanning a book or article it is necessary to do a little editing to maximize later reading. All of these steps are optional, but you will be very pleased if you go through these steps. All of these steps can be done very quickly.
There are two places to do editing. The first editing is done in the Caere document right after OCR has taken place. The second editing is done in the saved ASCII text which has been reopened in your word processor.
Editing Right after OCR in the Caere Document
The best way to edit pages is to check the pages as Caere documents first. Always have the original text on a slant board just below the monitor. As you click on the window to bring up the next page, turn the page of the original text just below the screen. If you have separated pages this is even easier to do as the pages lie flat.
Start by adding the page number. As each page comes up you should add a page number indicator to the top of the page, like "p#" and then the actual page number. Then press return to put the page number info on its own line. If you have scanned two pages at once, mark the second page now. If you did not already cut out headers in the zoning process, cut out the headers now. All this is easy to do because the cursor automatically goes to the top of each page as it comes up.
Adding the page number to the top of the page is important for many reasons, one of which is that text saved in ASCII format will not be saved as separate pages, and it is otherwise very difficult to know where one page ends and the next page starts.
After marking the page number, scroll down the text looking for any areas of colored text. These are areas the OCR program could not read. They need to be deleted or corrected. Usually they are parts of pictures or misread letters in bold or italicized sections. Delete or correct these colored areas.
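Once every page starts with its own "p#" line, the page boundaries can be recovered from the flat ASCII file. The Python sketch below splits text on those marker lines. The exact marker layout ("p#" followed by the page number, alone on a line) follows the description above, but the function name and the choice to allow a space after "p#" are assumptions.

```python
import re

# A page marker line: "p#" followed by the page number, alone on its line.
PAGE_MARK = re.compile(r"^p#\s*(\S+)\s*$")


def split_pages(text):
    """Return a list of (page_label, page_text) pairs.

    Any text before the first marker line is ignored in this sketch.
    """
    pages = []
    label, lines = None, []
    for line in text.splitlines():
        match = PAGE_MARK.match(line)
        if match:
            if label is not None:
                pages.append((label, "\n".join(lines).strip()))
            label, lines = match.group(1), []
        elif label is not None:
            lines.append(line)
    if label is not None:
        pages.append((label, "\n".join(lines).strip()))
    return pages


sample = "p#12\nFirst page text.\np#13\nSecond page text.\nMore text."
pages = split_pages(sample)
```

With the page labels kept alongside the text, any passage can be traced back to the same page of the original book.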
Also check any columns to make sure they have been zoned correctly. If not, click back on the zone picture and redo all the zones. To do this press Command+a and then press "return". A window will appear asking you if you really want to remove all the zones from this page. Say "yes". Now click on the zoning tool and rezone the page. Then OCR just this page by typing Command+r. While you are moving your eyes down each page, make sure each paragraph ends as it should. Sometimes blank lines need to be deleted and separated text stitched together.
If text begins with an indent, occasionally the first or last full line of text will be at the beginning of the paragraph, instead of at the end. Look for this and cut and paste any such sections back to their rightful place.
Also, this is a good time to mark titles and subtitles, boxes, captions, and key words if you wish. It is easy to do this now because bolded words show up clearly as bolded and paragraph formatting is like the original. You can use the keyboard and shift key in the regular manner or you can quickly type marking combinations using the triple letter keystrokes and 555 and 554. If you are doing this in WordPerfect you can use the macro keystrokes listed just before the triple letter keystrokes. However, these WordPerfect macro keystrokes won't work in Caere documents. This is why you use the triple letter keystrokes in Caere documents.
for <:# (indicates a chapter title) Type: Option+a or aaa
for <:= (indicates a primary sub-title) Type: Option+s or sss
for <: (indicates a secondary sub-title) Type: Option+d or ddd
for <:- (indicates a tertiary sub-title) Type: Option+f or fff
for <:> (marks a selected name or word) Type: Option+g or ggg
for <:% (marks a new part of a book) Type: Option+h or hhh
for p# (marks a page number) Type: Option+z or zzz
for << (marks beginning of caption or box of text) Type: Option+Comma or 555
for >> (marks end of caption or box of text) Type: Option+Period or 554
If you use the triple letters and 555 and 554 you need to run the change code program in WordPerfect which will change these keystrokes into the right code. These triple letter codes and 555 and 554 are usually used on the Caere documents where macro keystrokes won't work. They save a great deal of time. To run the change code program in WordPerfect just type: Control+Option+Command+c.
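The change code program itself is a WordPerfect macro and is not reproduced here. As a rough idea of what such a conversion does, the following Python sketch replaces each typed keystroke sequence with its marking code. The table is copied from the list above. The function name change_codes is hypothetical, and the sketch assumes that 554 produces the closing mark >>, following the rule that captions and boxes are marked with << before and >> afterwards.

```python
# Keystroke-to-code table, copied from the list of marking codes.
# Assumption: 554 closes a caption or box with ">>".
CODES = {
    "aaa": "<:#",  # chapter title
    "sss": "<:=",  # primary sub-title
    "ddd": "<:",   # secondary sub-title
    "fff": "<:-",  # tertiary sub-title
    "ggg": "<:>",  # selected name or word
    "hhh": "<:%",  # new part of a book
    "zzz": "p#",   # page number
    "555": "<<",   # beginning of caption or box
    "554": ">>",   # end of caption or box
}

def change_codes(text):
    """Replace every typed keystroke sequence with its marking code."""
    for keys, code in CODES.items():
        text = text.replace(keys, code)
    return text

# Hypothetical sample text typed with the shortcut keystrokes.
typed = "aaaThe Scanner\nzzz14\n555Figure 2: a flatbed scanner554"
marked = change_codes(typed)
```

A plain replace like this would also alter any genuine 555 or 554 in the text, such as part of a phone number or a year, so numbers in the source should be checked by hand after a conversion of this kind.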
Now save the text as ASCII text.
Editing Saved Text in WordPerfect or Another Word Processor
Open the saved text in WordPerfect or another word processor and spell check the text. Place the small spelling window at the bottom of the page so you can see the text as it is found. If you reduce the size of the text, you can easily see the page number of either the current or next page on almost every page. This enables you to follow along in the original text if necessary.
The first time a new name comes up add it to the vocabulary list and the word won't resurface as needing to be spelled. Many of the remaining spelling errors will be matters of adding hyphens between words.
Do not worry about paragraph indents. All these indents (if present) are automatically removed later during Proportionalizing.
The Last Word on a Page
The last word on the page may be broken apart from the first word on the next page. If so, it will be missing a hyphen. You should add a hyphen to such words. Alternatively, you can delete the hard return between the two word parts, thereby knitting the two parts together. Doing this is often a lot more work as the page number often falls between.
Page Numbers on the Bottom of the Page
Make sure that page numbers are on the top of the page.
Marks for Text Boxes and Captions for Graphics
All of these should be marked with << before and >> afterwards.
Footnotes
Footnotes should either be cut out completely or placed next to their reference number in the text. You also need to type a period after any footnote number in the actual text. This way sentences will end properly with a final period. This problem arises because footnote numbers are added right next to the end of sentences without a space break. Hence they are read as part of the preceding word. Adding a final period after the number allows the end of the sentence to be recognized as such by the PR program.
Next, select and cut footnotes. Either discard them or paste them next to their reference number in the text, separated by a space or treat them like captions.
Margin Notes
Margin notes should be removed or treated as captions. The easiest thing to do is to cut them out when you block text.
Math Equations
Math equations need to have the spaces removed between characters. Otherwise, each number in the equation will appear on a separate line when it is presented in Proportional Reading.
Furthermore, scanning usually does a terrible job on subscripts and superscripts as well as fancy math graphics. If you do not want to rework the math, it may be easier to just treat math sections like a graph and have the student refer to the appropriate page in the book. Type in the words "SeePage".
The third and best approach for math equations is to cut them from the text and re-scan them as a line drawing graphic which you copy and paste into the word processor text at the right point.
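For the first approach, removing the spaces between the characters of an equation, a minimal Python sketch could look like this. The function name tighten_equation and the sample equation are hypothetical. The sketch only removes whitespace and does nothing to repair subscripts or superscripts that scanned badly.

```python
def tighten_equation(equation):
    """Remove all whitespace so the equation stays together as one unit."""
    return "".join(equation.split())

# Hypothetical sample equation as it might come out of OCR.
tight = tighten_equation("x = 2 y + 3")
```

With no spaces left inside it, the equation is treated as a single item rather than being spread over several lines.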
Adding Interactive Pauses
If you want to add pauses to the text to make interactive questions and answers out of the text as it is read, now is a good time to do this. All you do is to type a ~ in the sentence where you want a pause to occur. When the text is Proportionalized, these marks are automatically turned into hidden signals which the reading programs recognize if you so choose. Otherwise, they will not play out.
Reversed Titles
Reversed titles, where the letters are white and the background black, will not scan. You must retype these titles if any.
Saving Prepared Text
It is a very good idea to save text that is all prepared for Proportionalizing. This is text that can be read as a regular word processing file. Furthermore, saving text at this point takes up a lot less memory. It actually takes six times as much storage to save the same amount of text once it has been Proportionalized.
If you are working with a lot of books which you are not going to use that often, you may want to save them as text files. Then you can Proportionalize a whole book overnight as necessary. This means you can save the average book on just one diskette (1.4 megs.).
Alternatively, about seventy pages of Proportionalized text can be saved on each diskette (1.4 megs.)
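The storage figures above can be cross-checked with a little arithmetic. The Python sketch below uses only the numbers given in the text (a 1.4 meg diskette, about seventy Proportionalized pages per diskette, and a sixfold size increase) and assumes one meg is 1000 KB for round numbers. The variable names are made up for illustration.

```python
DISKETTE_KB = 1400                  # 1.4 megs, assuming 1 meg = 1000 KB
PROPORTIONALIZED_PAGES = 70         # pages per diskette, from the text
STORAGE_RATIO = 6                   # Proportionalized size / plain text size

# Plain text is six times smaller, so six times as many pages fit.
plain_pages_per_diskette = PROPORTIONALIZED_PAGES * STORAGE_RATIO

# Approximate storage per page in each format.
kb_per_proportionalized_page = DISKETTE_KB / PROPORTIONALIZED_PAGES
kb_per_plain_page = kb_per_proportionalized_page / STORAGE_RATIO

print(plain_pages_per_diskette, "plain text pages per diskette")
print(kb_per_proportionalized_page, "KB per Proportionalized page")
print(round(kb_per_plain_page, 1), "KB per plain text page")
```

On these figures a diskette holds roughly 420 pages of plain text, which fits the statement that the average book can be saved on just one diskette.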
The best approach for a school is to keep all the books in current use on a file server in Proportional format on locked files. Each student downloads Proportionalized text as needed from the file server onto his or her own computer or a lab computer and plays it as he or she wishes, marking the text as desired and saving selections onto personal files. This way text can also be sent via modem over the phone lines to students at home. This process can operate automatically without involving school personnel.
Text Section with Too Many Hard Returns and Tabs
Occasionally, the OCR program will create a short section of text which is all chopped up. It will have extra tabs and hard returns in it. It almost always occurs on indented text. This problem is very easy to fix. All you need to do is to select the section of text and then go up to the Search menu and activate Find/Change. Pull down the Direction submenu to "Within Selection", then insert "hard return" in the find line and click on Change All. Next insert "tab" on the find line and again click on Change All. Your section of text will be all fixed.
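The same cleanup can be sketched outside the word processor. The Python function below is a rough equivalent of the two Change All passes, applied to one selected section of text. The text above does not say what the hard returns and tabs are changed to, so this sketch assumes a single space. The function name unchop and the sample text are hypothetical.

```python
import re

def unchop(section):
    """Turn hard returns and tabs into spaces, then collapse repeated spaces."""
    # Normalize the different hard return characters to "\n" first.
    flat = section.replace("\r\n", "\n").replace("\r", "\n")
    # Assumption: each hard return or tab becomes one space.
    flat = flat.replace("\n", " ").replace("\t", " ")
    return re.sub(r" {2,}", " ", flat).strip()

# Hypothetical chopped-up section as it might come out of OCR.
fixed = unchop("Scanning a\n\tbook can be\n\teasy.")
```

As with Find/Change, this should be applied only to the chopped-up section and not to the whole file, since the hard returns that separate real paragraphs need to stay.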
LOL I need and hour to read this .>>> but Thanks for the Info. anyway
