Originally Posted By: drakino
Anyone have any thoughts or experience with doing this on OS X?
As you know full well, I have no experience with OS X, but I do have some experience scanning with Windows that might well be applicable to one of those, uhhh, what is it, peach, pear, rutabaga, no, wait, Apple machines.

Just before coming to Mexico, I completed a project for a psychologist that involved scanning and organizing 176,512 documents (yes I kept count!). Looking at what you have said above, I have two suggestions, one specific that you can't use because it is too late, and one general that is probably not how you want to do things.

Specific: I would have chosen a different scanner (see this post - watch the video!) but you already have your scanner now. If you have a LOT of pages to scan, you will find five or six seconds a page to be frustrating. Remember, the speed they advertise is "up to" 18 pages per minute. Depending on the DPI you set and the complexity of the document, you may see considerably less than that.

General: If you have a LOT of pages to scan, I'd give up on the idea of OCR and text search. You can use the OCR conversion built into Adobe Acrobat and get fair accuracy (99+%) but as you note, it doesn't maintain formatting that well, and also it is s l o w. Alternatively, I have had quite good results with ABBYY PDF Transformer. But any OCR conversion is going to be problematic if you are thinking in terms of being able to reprint and use the original document. Even 99% character accuracy, on a page of a couple of thousand characters, will leave you with dozens of OCR errors per page.
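
For what it's worth, here is that arithmetic as a quick Python sketch. The page sizes (2,000 and 3,000 characters) are my own rough assumption for a typed page, not something I measured, and I am assuming the accuracy figure is counted per character.

```python
def expected_ocr_errors(chars_per_page: int, accuracy: float) -> float:
    """Expected number of misread characters on one page.

    Assumes accuracy is a per-character rate between 0 and 1.
    """
    if not 0.0 <= accuracy <= 1.0:
        raise ValueError("accuracy must be between 0 and 1")
    return chars_per_page * (1.0 - accuracy)


# Assumed page sizes, for illustration only.
for chars in (2000, 3000):
    print(chars, round(expected_ocr_errors(chars, 0.99)))
```

Under those assumptions 99% works out to roughly 20 to 30 wrong characters on every page, which is plenty to make a reprinted document look sloppy.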

Instead, leave the scanned pages in their original PDF format, and spend the time you would otherwise have used fighting with the OCR on organizing the PDF files into directories and subdirectories with mnemonic filenames. Something like Automotive --> Motorcycle --> 2011 --> Repairs --> 2011-05-24 Jason's Superbike Shop.
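
If you would rather have a script build those names than type them by hand, something along these lines would do it. This is only a sketch: the function name archive_path, the "Scans" top-level folder, and the character clean-up rule are all things I made up for illustration, not part of any scanner software.

```python
from datetime import date
from pathlib import PurePosixPath


def archive_path(root: str, folders: list[str], when: date, description: str) -> PurePosixPath:
    """Build folder/subfolder/.../YYYY-MM-DD description.pdf (hypothetical helper)."""
    # Swap out characters that cause trouble in filenames (assumed rule).
    safe = description.replace("/", "-").replace(":", "-").strip()
    filename = f"{when.isoformat()} {safe}.pdf"
    return PurePosixPath(root).joinpath(*folders, filename)


path = archive_path(
    "Scans",  # assumed top-level folder
    ["Automotive", "Motorcycle", "2011", "Repairs"],
    date(2011, 5, 24),
    "Jason's Superbike Shop",
)
print(path)
```

Putting the date first in year-month-day order means the files in each folder sort chronologically by name, with no extra effort.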

If you do that, it will be easier than a text search to find the file you want, and you will have the additional advantage of having it in its original formatting. If you want to edit/change a page and print it, then you could either save the page as a graphic file (.png, .jpg) and edit with a graphics editor, or OCR it and edit it in MS-Word or whatever. But since it is unlikely you will ever need to do anything other than look at these pages on-screen, why go through the considerable extra work of doing OCR?

Now, if some of your PDF files run dozens or even hundreds of pages, finding the exact page you want with the mnemonic filename system will prove difficult, but I suspect your individual files will prove small enough that locating the exact page/paragraph/sentence will be workable.

Anyway, that's how I'd do it.

tanstaafl.
_________________________
"There Ain't No Such Thing As A Free Lunch"