conversion - What are the possible tools to convert from pdf to epub?

2014-07-08
  • pixelastic

    I know similar questions have been asked before, but before marking this as a duplicate, let me explain.

    I just bought a .pdf ebook online and would like to have it as a .epub instead. I usually use calibre (v1.0.0) for this sort of task with great success. This time, a large proportion of the lines get messed up during the conversion.

    Jérôme disait aimer le rouge. Sa marotte
    FRQVLVWDLW VXUWRXW ¡ O#HQOHYHU 'ªJUDIHU OD
    dentelle était un geste qu'il effectuait avec la
    

    Even if you don't speak French, you'll notice that the middle line is garbage. It is not just an extra useless line: it replaces actual content.
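
    A side observation on that middle line: the letters appear to be shifted by a constant offset (for example F is code 70 and c is code 99, a gap of 29), which would be consistent with the PDF font using a custom encoding that text extractors cannot map back. A minimal Python sketch that reverses the shift for this one sample follows; the offset and the EXTRA table are assumptions inferred from this single line and the OCR output further down, not a documented property of the file.

```python
# Assumption: in the garbled run, each ASCII character sits 29 code points
# below the intended one ('F' = 70 stands for 'c' = 99). Inferred from a
# single sample line only.
ASCII_OFFSET = 29

# Hypothetical extra mappings for characters that do not follow the shift,
# read off by comparing the garbled line with the OCR result.
EXTRA = {"¡": "à", "ª": "é", "#": "'"}


def unshift(text):
    """Reverse the assumed character shift on one garbled line."""
    out = []
    for ch in text:
        if ch in EXTRA:
            out.append(EXTRA[ch])
        elif 33 <= ord(ch) <= 126 - ASCII_OFFSET:
            # Printable ASCII that stays printable after shifting back up.
            out.append(chr(ord(ch) + ASCII_OFFSET))
        else:
            # Spaces and anything else are left untouched.
            out.append(ch)
    return "".join(out)


garbled = "FRQVLVWDLW VXUWRXW ¡ O#HQOHYHU 'ªJUDIHU OD"
print(unshift(garbled))
```

    Applied to the sample, this prints "consistait surtout à l'enlever Dégrafer la" (the full stop is absent from the extracted text, so it cannot be recovered). Correctly extracted lines must be left alone, so a real fix would still need a way to tell the two kinds of line apart.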

    The calibre ebook viewer displays the original pdf with the garbage, while the same file displays just fine in my default pdf viewer. I tried converting to mobi, txt, mkd, to no avail.

    I tried pdftotext, and the online tool http://www.zamzar.com/ and got the same output.

    I then converted the pdf to .pbm files and tried running gocr and ocrad on them. The OCR results were quite interesting, but not good enough to be used as-is.

    Jérôme _sȧit aimer le rouge. Sa marotte
    consistait surTout à l'enlever. Dégrafer la
    dentelle était un geste qu_l effectuait avec la
    

    Would you have any idea of other tools that could help in the process, or of options to fine-tune calibre or the OCR programs?

    Note: I'm running Ubuntu 13.10.

  • Answers

    Related Question

    conversion - How do I convert a text file to the EPUB format?
  • jamesh

    I have a bunch of book length text files I'd really like to read on my EPUB reader (as it happens FBReaderJ). What would be the best route to convert them?

    I have access to Mac OS X and Linux (Ubuntu). I would probably be happiest with a command line, but would settle for a GUI that does batch conversion.

    My criteria for success are really based upon the shortfalls I have found with Calibre. The tool:

    • must do the whole book
    • makes at least a guess at what the title/author may be. At minimum, uses the source filename for the title.
    • hygienic with files it uses - tidies up after itself (this is less important)
    • doesn't try to be an all-in-one library manager (again, less important).
    • is lenient in parsing special characters (e.g. < and & characters).
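
    For the last criterion, here is a minimal Python sketch, assuming plain text with blank lines between paragraphs. It escapes < and & and wraps the result in XHTML that a converter can take as input, and it falls back to the filename for the title. The function names are hypothetical, not part of any tool mentioned here.

```python
import html
from pathlib import Path


def title_from_path(path):
    """Guess a title from the source filename (assumed naming convention)."""
    return Path(path).stem.replace("_", " ").strip()


def text_to_xhtml(text, title):
    """Wrap plain text in minimal XHTML, escaping <, > and &."""
    # Assumption: paragraphs are separated by blank lines.
    paragraphs = [p for p in text.split("\n\n") if p.strip()]
    body = "\n".join(
        "<p>" + html.escape(" ".join(p.split())) + "</p>" for p in paragraphs
    )
    return (
        '<?xml version="1.0" encoding="utf-8"?>\n'
        '<html xmlns="http://www.w3.org/1999/xhtml">\n'
        "<head><title>" + html.escape(title) + "</title></head>\n"
        "<body>\n" + body + "\n</body>\n</html>\n"
    )


sample = "Tom & Jerry\n\nif a < b then\nstop"
print(text_to_xhtml(sample, title_from_path("books/my_book.txt")))
```

    Because every special character is escaped before the markup is added, the source text cannot break the generated XHTML.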

  • Related Answers
  • Hrannar Jonsson

    Happened upon this thread many moons later.

    I just wanted to point out that there is a command line tool Calibre uses to convert. It's called (surprise, surprise) ebook-convert. See 'ebook-convert -h' or 'ebook-convert dummy.html .epub -h' to see the options for converting html to epub.

    I haven't explored it though. I am most curious about --list-recipes (and whether it works); it looks like something interesting.
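
    A minimal Python sketch of batch conversion with ebook-convert, assuming calibre's command line tools are on the PATH and the source files end in .txt. The --title and --authors metadata options are passed explicitly, with the filename standing in for the title; the helper names and the "Unknown" author default are hypothetical.

```python
import subprocess
from pathlib import Path


def build_command(src, out_dir, author="Unknown"):
    """Build the ebook-convert argument list for one text file."""
    src = Path(src)
    dest = Path(out_dir) / (src.stem + ".epub")
    title = src.stem.replace("_", " ")  # filename as fallback title
    return [
        "ebook-convert", str(src), str(dest),
        "--title", title,
        "--authors", author,
    ]


def convert_all(in_dir, out_dir):
    """Convert every .txt file in in_dir to an .epub file in out_dir."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    for src in sorted(Path(in_dir).glob("*.txt")):
        # check=True raises CalledProcessError if a conversion fails.
        subprocess.run(build_command(src, out_dir), check=True)
```

    Calling convert_all("books", "epubs") would run one ebook-convert process per text file and stop at the first failure.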

  • Peter Mortensen

    I'd say Calibre is for you. It works on Linux, Mac OS X, and Windows.

    Input Formats: CBZ, CBR, CBC, EPUB, FB2, HTML, LIT, MOBI, ODT, PDF, PRC, PDB, PML, RB, RTF, TXT

    Output Formats: EPUB, FB2, OEB, LIT, LRF, MOBI, PDB, PML, RB, PDF, TXT

  • Peter Mortensen

    For Mac OS X and Windows, I have had success with Stanza for Desktop.

    This supports a good range of export formats.

    More importantly, it copes very well with:

    • detecting chapters in large text files.
    • Unicode, including "significant" characters like < and &.
  • caliban

    There are online tools to convert to epub files.

    Example of such a website here.

  • user46377

    If you have a Mac OS X 10.6 machine, try this:

    http://padilicious.com/epub/index.html

    It relies on Automator.

  • Werner Donné

    You may want to try ODFToEPub. This is an OpenOffice extension that lets you export a document to ePub.

    http://www.pincette.biz/odftoepub/

  • Josh B

    If you have access to a Windows system, you can try Atlantis Word Processor. It converts not only TXT but also DOC, DOCX, and ODT files to EPUB. Only a few mouse clicks are needed to convert a document to EPUB. Convenient batch conversion is also offered. You can find details here:

    http://www.atlantiswordprocessor.com/en/help/index.php?page=html/ebook.htm

    Or see the main page of its site (there is also a download link).

  • Peter Mortensen

    A bit off-topic:

    Here's a nice website, titled EVERYTHING ABOUT READING ELECTRONIC BOOKS.

    Well, not quite EVERYTHING, but very informative nevertheless :)

    This being a Russian website, they're focused on some of these extraordinary Russian eBook reading programs such as ICE Book Reader Professional, CoolReader 2 (maybe not as sophisticated as ICE but free) and AlReader 2. None of these support the EPUB format though.

    There's also a link to several ePub libraries, which might be of interest to you.