ocr - auto-folder documents with templates
2014-07
I have about a million documents that I need to move to new folders based on their content. I'm looking for a tool where I can give it a few samples and the program will find similar documents and move them to another folder.
I have different graphs, reports and pictures that have a similar look within each group of documents but different values. It's easy to tell them apart by opening them manually, but it would take months to go through them all. Some sort of OCR, maybe?
There is a lot of OCR and ICR software with different functionality. If you only need document classification, you can try Recognition Server: http://www.abbyy.com/recognition_server/
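If you would rather script it yourself, here is a rough sketch in Python of the "give it a few samples" idea. It assumes the text of each document has already been extracted by whatever OCR tool you end up using; the function names, the word-overlap (Jaccard) measure and the 0.5 threshold are all my own illustrative choices, not something any product mentioned here does. Digits are ignored on purpose, so two reports with the same layout but different values still match.

```python
import re
import shutil
from pathlib import Path


def tokens(text):
    """Return the set of lowercased alphabetic words in text (digits are dropped)."""
    return set(re.findall(r"[a-z]+", text.lower()))


def jaccard(a, b):
    """Word-set overlap between 0.0 (nothing shared) and 1.0 (identical sets)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def classify(text, samples, threshold=0.5):
    """Return the name of the best-matching sample group, or None if no
    sample reaches the (assumed) similarity threshold."""
    doc = tokens(text)
    best_name, best_score = None, 0.0
    for name, sample_text in samples.items():
        score = jaccard(doc, tokens(sample_text))
        if score > best_score:
            best_name, best_score = name, score
    return best_name if best_score >= threshold else None


def sort_file(path, text, samples, dest_root, threshold=0.5):
    """Move the file at path into dest_root/<group>; leave it alone if unmatched."""
    group = classify(text, samples, threshold)
    if group is None:
        return None
    target_dir = Path(dest_root) / group
    target_dir.mkdir(parents=True, exist_ok=True)
    return Path(shutil.move(str(path), str(target_dir / Path(path).name)))
```

You would build `samples` as a dict mapping a folder name to the extracted text of one example document, then call `sort_file` for each file. This only helps for documents that contain recognizable text; pictures without text would need some kind of image comparison instead, and with a million files you would want to test the threshold on a small batch first.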
Is there any software that comes with OSX, or that can be downloaded, to do OCR on a PDF document?
This is a somewhat "meta" answer, but I'll post it anyway since you haven't got many other answers. See:
How to extract text with OCR from a PDF on Linux? — some of the answers work on OSX as well
How to do OCR on a PDF document? — closed as duplicate of the above, but still has good, useful (and different) answers. (Another reason not to delete closed questions, but to keep/merge them.)
Or use Google viewer
Or see some of the questions on the right under "related questions". I went through several of them carefully and none stood out as particularly useful, but you may find something.
You can use OmniPage; it costs about $500.
Also, you can use this software for your job: