Free OCR for Arabic text

22
2013-10
  • pnuts

    A friend has asked me to convert an Arabic-text .pdf into Word. Google Docs does not seem to be an option, but new OCR looked promising because Arabic is featured in its 'Recognition language' dropdown. I have failed to get it to work: I get no further than "Error! Text can not be recognized." even with only a few sample pages (111KB).

    I'd much appreciate any advice about what I am doing wrong at that site (or even how to access any help available there!), or pointers to other (free!) options that work with Arabic text (preferably ones that do not require registration and/or large downloads). Is anyone willing to help, please?

    Note this .pdf does not have a text layer.

  • Answers
  • pnuts

    The question was not a request for a recommendation of the best program (presumably that would be off topic here). It was about either getting new OCR to work the way it seemed it was supposed to, or finding any other free converter that works for Arabic text. So I think it is fair to say that OCR Convert is an answer. It is online, free and requires no registration.

    It did not manage all 67 pages at once (after about 15 minutes the program reported an error), but it did convert 10 pages at a time. The quality/accuracy is suspect (judging by a Google translation of the output), but I am happy to consider that a separate issue.
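
    Since the converter coped with 10 pages at a time but not all 67, the document has to be fed to it in batches. A minimal sketch of working out the page ranges is below; the function name `page_batches` is hypothetical, and the batch size of 10 is simply what happened to work here, not a documented limit of the service.

```python
def page_batches(total_pages, batch_size):
    """Return inclusive, 1-based (first, last) page ranges covering the document."""
    if total_pages < 0 or batch_size < 1:
        raise ValueError("total_pages must be >= 0 and batch_size must be >= 1")
    return [
        # The last batch is clipped so it never runs past the final page.
        (first, min(first + batch_size - 1, total_pages))
        for first in range(1, total_pages + 1, batch_size)
    ]


if __name__ == "__main__":
    # 67 pages, 10 at a time (the figures from this answer).
    for first, last in page_batches(67, 10):
        print(f"pages {first}-{last}")
```

    Each range can then be extracted into its own small .pdf (for example with Preview) and uploaded separately.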


  • Related Question

    osx - Simple, free OCR software (for OS X)?
  • dbr

    Something I've had an occasional need for, but I've never found an application I liked - OCR

    Basically I want to take a photo/scan of a document, and convert it to a text document of some kind (Ideally an option for plain-text, perhaps a .doc or .pages)

    Requirements:

    • Must have a native (Cocoa) GUI, not under X11
    • Free

    Optional pluses:

    • Doesn't require installation, just drag-app-to-Applications-folder (a lot of the OCR utilities I found required libraries to be installed and such)
    • Support images in scanned documents
    • (Apple-)scriptable
    • Open source

  • Related Answers
  • Ludwig Weinzierl

    When I researched this topic a while ago there wasn't any free software for any platform that produced reasonable quality output.

    The Optical character recognition article at Wikipedia lists the following free OCR applications:

    I only tried gocr from these; it has no GUI and the quality of its output is very low.

    I'd suggest going with a commercial product: either ABBYY FineReader or OmniPage, both of which have OS X versions. They are often bundled with scanners, and you can buy them pretty cheaply if you don't need the latest version.

  • Jeremy French

    Tesseract is free, has an OS X port and a how-to.

  • Slink84

    Never heard of any good free OCR for Mac :] There is GOCR, but it is rather crappy. From the low cost apps I would recommend VelOCRaptor. You can try it out for free.

  • Ronald Pottol

    How about Evernote? Send them the image and they OCR it for you.

  • Haddock

    Sorry, but some command line action is involved in this solution...

    If all you need to do is convert a PDF containing scanned pages into text, the following method has given me good results - where GUI based tools such as VelOCRaptor have failed (I'm talking about a 134 page PDF doc with scanned pages).

    All programs in the tool chain are free or come with OSX.

    • With Preview, save the PDF as TIFF, 150dpi. Make sure you have enough disk space to play with.
    • Run the TIFF through Tesseract (install using MacPorts / Fink)
    • Now you have a raw text file, which can be spell checked using any good editor (TextWrangler, TextEdit, etc.)
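
    The Tesseract step above can be scripted. The following is a minimal sketch, not the exact commands used for this answer: it assumes the `tesseract` binary is on the PATH and that the language data for the chosen code is installed. The helper names `build_tesseract_command` and `ocr_tiff` are made up for illustration.

```python
import subprocess
from pathlib import Path


def build_tesseract_command(image_path, output_base, lang="eng"):
    """Return the argument list for: tesseract <image> <outputbase> -l <lang>."""
    return ["tesseract", str(image_path), str(output_base), "-l", lang]


def ocr_tiff(image_path, lang="eng"):
    """Run Tesseract on one TIFF and return the path of the text file it writes."""
    image_path = Path(image_path)
    # Tesseract appends ".txt" to the output base on its own.
    output_base = image_path.with_suffix("")
    command = build_tesseract_command(image_path, output_base, lang)
    # check=True raises CalledProcessError if Tesseract exits with an error.
    subprocess.run(command, check=True)
    return Path(str(output_base) + ".txt")


if __name__ == "__main__":
    # "scan.tiff" is a placeholder for the TIFF exported from Preview.
    print(ocr_tiff("scan.tiff", lang="eng"))
```

    For other languages, pass the matching three-letter code (for example `ara` for Arabic or `deu` for German). Whether a multi-page TIFF is processed in one go depends on how Tesseract was built, so it may be necessary to export one TIFF per page.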

    Good luck!

  • dbr

    I have just found something called 'PDF OCR X Community Edition'. It's pretty basic: it just gets plain text out, without formatting. However, it works quite well. I used it for scanning German, even though officially it only works with English, and it was OK.

  • Darren Meyer

    I haven't been able to find anything for free, but PDFScanner is cheap at $15, is a native OS X app (Snow Leopard and Lion only, AFAICT), and both scans to an OCR'd PDF and lets you open and OCR an existing PDF. It's the only not-horribly-expensive thing I've found.

  • Troggy

    http://www.thefreecountry.com/utilities/ocr.shtml

    It sounds like, if you have Microsoft Office, there is a tool that can convert images into files. Other than that, I am not seeing anything that is free for OS X.

    http://discussions.apple.com/thread.jspa?messageID=9807438 (a very recent Apple discussion in the support forums)

    You did mention you only want a native Cocoa app; if you could consider some of the Linux builds, you might have some luck, as there are a few options there.

  • Larry Gritz

    I know it's not free, but I've had reasonably good experience using ReadIris, which you can find for around $60. Basically it will take scans, jpegs or several other image formats, or even PDFs containing scanned data, do OCR on them, and write as PDF ("searchable" -- i.e., text and/or the image itself).