

The Process

Converting printed documents into computer-editable documents is fairly straightforward. After starting TypeReader, I select my scanner, a Mustek 1200 LP (approx. cost $100). I click the 'Auto Straighten' and 'Auto Orientation' buttons and leave the default settings for everything else.

I lay the book on my flatbed scanner, choose 'From Scanner' under the 'Get Page' button, and then select 'Auto Start.' This launches my scanning software, where I finalize the scan settings. Based on what I had read and some initial testing (I started at 200 dpi), I chose 300 dpi, black and white. As long as your originals are in good condition (i.e., clean, readable text), these settings should be sufficient.

Converted text with errors highlighted

Properties window reveals possible errors

 

I then scan the page, and the image is sent straight to TypeReader. There the image is broken up into sections, and the software converts those sections into computer-editable text. You then have the option of exporting the new document in a number of formats, including .html, .rtf, .doc, and others. I tried exporting to .html, but I did not like the result: it produced lots of tables. Instead, I saved my files as .rtf, opened them in Word, and ran a spell check. I then saved them as .html and imported them into our Dreamweaver template. I scanned the images separately (72 dpi, color), cropped them in Photoshop, and then imported them.

99 percent?
Is OCR 99 percent accurate? The answer: pretty close. One of the pluses of TypeReader 6.0 is the 'Properties' window, which pops up showing the number of 'suspect' and 'illegible' characters on the page. I did have to spend a little time correcting mistakes. Most came from certain kinds of type: italicized fonts, quotation marks, and some periods and commas were the usual suspects. Still, I was quite impressed with the speed and accuracy of the process.
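To put "99 percent" in perspective, here is a small Python sketch of the arithmetic. The 2,000-character page is an assumed example, not a count from my scans, and the helper names are hypothetical. It simply shows how many characters you would expect to fix by hand at a given accuracy.

```python
def errors_at_accuracy(total_chars: int, accuracy: float) -> int:
    """Expected number of misread characters at a given accuracy (hypothetical helper)."""
    if total_chars <= 0 or not 0.0 <= accuracy <= 1.0:
        raise ValueError("need total_chars > 0 and 0 <= accuracy <= 1")
    return round(total_chars * (1 - accuracy))


def character_accuracy(total_chars: int, errors: int) -> float:
    """Fraction of characters recognized correctly (hypothetical helper)."""
    if total_chars <= 0:
        raise ValueError("total_chars must be positive")
    return 1 - errors / total_chars


if __name__ == "__main__":
    page_chars = 2000  # assumed length of one scanned page
    for acc in (0.95, 0.99, 0.999):
        print(f"{acc:.1%} accurate -> about {errors_at_accuracy(page_chars, acc)} fixes per page")
```

Even at 99 percent, an assumed 2,000-character page still leaves about 20 characters to correct, which matches the experience of spending "a little time" on cleanup.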

Serious OCR
If you are seriously considering using OCR to convert many documents, get yourself a fast flatbed scanner with an automatic document feeder. Of course, you cannot run a book through a document feeder, and feeders add several hundred dollars to the price of a scanner.

Final Thoughts
OCR is a great technology that has come a long way in the last few years. It has certainly saved me a ton of work, and it could do the same for you.


 
