OCR Tutorial for Acrobat 4 and 5
The process for doing Optical Character Recognition using Acrobat 4 or 5 is similar to that outlined in the previous post on Acrobat 6. You open and convert an image file, and run the "Paper Capture" tool on it.
The most significant difference in the earlier versions is that, until Acrobat 6, you could only OCR 50 pages at a time. That limitation really hinders the usefulness of the earlier versions, and although you can work around it, it's a pain. Nevertheless, here are the steps. They are virtually identical for Mac and PC. I'll use the TIFF file example again.
In Acrobat, go to the File menu and choose Import. . .
In the dialog box, select your TIFF file and click OK. That will turn your TIFF image into a PDF image. You still need Acrobat to "read" it and convert the pictures of letters into actual text letters.
Go to Tools > Paper Capture > Capture Pages. This will give you a couple of choices. The PDF Output Style you want is "Original Image with Hidden Text." Click OK and you can select which pages to OCR (all, current, or a page range). Click OK again, and the OCR engine fires up. Go make tea. (Although, since you're doing a maximum of 50 pages, this won't take all that long.)
Don't forget to do File > Save. You now have a word-searchable document.
Here is a quick war story/workaround for the old 50-page limit. With the advent of Acrobat 6, I can't think of any reason why an attorney who needs to OCR even a handful of scanned documents shouldn't upgrade. I actually used this workaround, but I wasn't billing anybody by the hour. . . and there weren't any alternatives ready to hand. I also did it on a Mac, so I managed to use AppleScript to automate some of it. (Which is the full extent of my programming ability.) I have no idea how you would do it on a PC. If you have the ability to automate this on a PC, get a quick programming moonlighting job and use the money to upgrade to Acrobat 6 or one of the other commercial programs...
If you have a big TIFF file you need to convert, get one of the excellent shareware image-handling programs that will allow you to select 50 pages/images at a time, and split the big file into smaller ones. Create as many 50-page files as it takes. Open, convert, and OCR all of those files (that's where AppleScript comes in handy). Name them something like File 1, File 2, etc., or you're going to get mightily confused. Run the Batch Process that creates thumbnails for all of the files. Now, open File 1 in Acrobat, open the thumbnail pane and pull it all the way across the screen so you only see thumbnails. Navigate to the last page. Use the various "add pages" or "append pages" commands to stick the next 50 pages into your PDF. Rinse and repeat as necessary. Save, and voila! You've got a great big OCR'd PDF.
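The splitting step is just arithmetic, so here is a rough sketch of how to plan it. It is written in Python purely for illustration and is not the AppleScript mentioned above. It does not touch Acrobat or any image file; it only works out which page ranges go into which 50-page file. The function names and the zero-padded numbering (so that File 02 sorts before File 10) are assumptions made for this example.

```python
def chunk_ranges(total_pages, limit=50):
    """Return 1-based, inclusive (first, last) page ranges of at most `limit` pages."""
    if total_pages < 0 or limit < 1:
        raise ValueError("total_pages must be >= 0 and limit must be >= 1")
    return [
        (start, min(start + limit - 1, total_pages))
        for start in range(1, total_pages + 1, limit)
    ]


def chunk_names(total_pages, limit=50, stem="File"):
    """Pair each page range with a name such as 'File 1' or 'File 01'.

    The numbers are zero-padded to a common width so the files list in
    page order. The padding is an assumption, not something Acrobat needs.
    """
    ranges = chunk_ranges(total_pages, limit)
    width = len(str(len(ranges)))
    return [
        (f"{stem} {index:0{width}d}", first, last)
        for index, (first, last) in enumerate(ranges, start=1)
    ]


if __name__ == "__main__":
    # Hypothetical 230-page scan: plan the 50-page pieces before splitting.
    for name, first, last in chunk_names(230):
        print(f"{name}: pages {first}-{last}")
```

Running it for a 230-page scan lists five pieces, the last one holding pages 201 through 230. The actual splitting, converting, and OCR still have to be done with your image program and Acrobat.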
Now that I look at this kludge, it makes me want to weep . . . (however, one does what one can with the available tools). There is really no longer any good reason to go to those lengths because Acrobat 6 and other PDF creation and OCR tools are widely available.
Hope this helps those who are still using the older versions.
-- Dave
Could you please post the AppleScript you used for this?
I could not figure out how to automate the OCR process...
thanks,
Eric
Posted by: Eric O | January 13, 2005 at 12:50 AM