Many people know that running OCR on your PDFs will make them text searchable (if the PDF was created by scanning paper rather than by 'printing to PDF', in which case it is searchable as a matter of course). And many people know that running OCR takes a bit of time, as much as 10 seconds per page depending on the quality of the scanned pages. In fact, because of the time involved, a lot of people don't run OCR on most of their documents.
I'm one of those people.
However, I do run OCR on case documents. That is, I do it for all of the documents that are produced to me in litigation, or that I produce. These are the documents you really want to be able to search. But even if you're not interested in searching your case documents, you should probably still take the time to run OCR.
Why?
Well, most of the document productions I've received contain a lot of misrotated pages. After I scan the documents in, I want every page rotated in the proper direction. If a page is sideways, I want it rotated by 90°; if it's upside down, it should be rotated 180°. I could go through and rotate the pages manually, but if the document set has a lot of pages this would be tedious and I'd probably make mistakes.
If I run OCR, however, the rotation is done for me automatically. That's because the software can only recognize the text if it is oriented properly. So the first thing it does, as part of its processing, is check whether each page is right side up. If it isn't, the software fixes the orientation automatically.
Even when a page is 'properly oriented', OCR also helps with pages that are still a little skewed to one side by straightening them. So OCR isn't only great for making documents text searchable. It's also great for cleaning up a document set and making it look better.