Originally Posted By: Dignan
Drive automatically does OCR, so when I search Drive for files it can search within the content for my results.

I was unaware of that capability. But, as you might expect from me, anything Google is out of the question, as is any cloud-only solution, since part of the setup will rely heavily on Spotlight in OS X.

Originally Posted By: Dignan
The fastest scanners are probably the Fujitsu ScanSnap.
Originally Posted By: Fujitsu
The Fujitsu ScanSnap document scanner does not use a TWAIN or ISIS driver.

That alone excludes a scanner from my list. I'm still within the return period for mine if it doesn't work out, but TWAIN or ISIS support is an absolute must. I'd rather deal with finding different software after a potential OS upgrade than have to buy a new scanner because the old one is no longer supported. On OS X these days, drivers are far less likely than apps to break in an OS upgrade. As long as a vendor has drivers that support OS X 10.7 or later, they should keep working for a long time, since 10.7 forced the move to 64-bit drivers.



Originally Posted By: tanstaafl.
Looking at what you have said above, I have two suggestions, one specific that you can't use because it is too late, and one general that is probably not how you want to do things.

I appreciate the feedback, and your experience with 176,512 documents helps. :) Thankfully I have far fewer than that, and once the initial scanning is done, speed won't be a big concern. I'm going to work at keeping up with scanning as new paperwork comes in. The scanner you ended up with is a bit outside my price range, and larger than I was willing to dedicate space for. Impressive speed, though.

Originally Posted By: tanstaafl.
Instead, leave the scanned pages in their original PDF format, and the time you would otherwise have used fighting with the OCR you can spend organizing the PDF files into directories, subdirectories, and mnemonic filenames. Something like Automotive --> Motorcycle --> 2011 --> Repairs --> 2011-05-24 Jason's Superbike Shop.

As you somewhat predicted, I'm very much against this approach. I lived with this method for a while for my music, and had started to with photos. Moving to a structure where I can sort, search, and change how I view my collection has weaned me off being a file janitor. It has proven more useful in my workflows with photos, music, digital documents, and 16 years of e-mail archives, and I want the same approach for my scanned documents. With OCR'ed scans, I can set up Smart Folders that contain all my car documents simply by searching for the right keywords. Those same documents could also appear in a second Smart Folder representing the past year, without copying files all over a folder structure.

*edit* I should add that my goal with OCR isn't 100% accurate reproduction, just results good enough to allow searching for documents and within them. The saved image version would be used whenever I need to reprint, or for the occasional display on a tablet (likely rare).



Originally Posted By: K447
The Fujitsu bundled OCR software (ABBYY FineReader) does what you require, retaining the original scanned image with the OCR'ed text hidden 'behind' the image. If I mouse select a word or paragraph the highlighting of the OCR text appears right on/under the original image 'text'. Sometimes the alignment of the image text and the OCR text is not perfect but overall it is usually quite good.

This sounds exactly like what I need. Doug also spoke positively about ABBYY software on Windows. However, one thing sticks out to me as a negative in a review of the Mac version:
Originally Posted By: App Store person
There is no question the OCR engine and conversion utility are top notch. But this product is purposefully hobbled by the utter lack of AppleScript or Automator hooks to be useful in automated workflow where the addition of a scanned document to a MacOSX Folder can trigger FineReader OCR Pro to run a conversion. Instead, ALL document OCR scans from pre-existing sources require either a drag and drop on FineReader OCR Pro, or you have to use “File->Open” dialogs.

The lack of AppleScript/Automator support may be an issue, since I was planning to build a workflow to help automate working through my initial scanning backlog.


Edited by drakino (19/02/2014 18:38)