conversion - What are the possible tools to convert from pdf to epub?
2014-07
I know similar questions have been asked before, but before marking it as duplicate, let me explain.
I just bought a .pdf ebook online and would like to have it as a .epub instead. I usually use calibre (v1.0.0) for this sort of task with great success. This time, a large proportion of the lines just seem to get messed up during the conversion.
Jérôme disait aimer le rouge. Sa marotte
FRQVLVWDLW VXUWRXW ¡ O#HQOHYHU 'ªJUDIHU OD
dentelle était un geste qu'il effectuait avec la
Even if you don't speak French, you'll notice that the middle line is garbage. And it's not just an extra useless line: it replaces actual content.
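A side observation on the sample, offered as a guess rather than a diagnosis: the garbage does not look random. Adding 29 to the code point of each letter turns FRQVLVWDLW VXUWRXW into "consistait surtout", which hints at a font with a custom encoding and no usable Unicode mapping. A minimal Python sketch to test that idea on a line follows. The function name unshift and the offset of 29 are assumptions derived from this one sample; accented letters and punctuation clearly do not follow the same shift, so this is not a full repair.

```python
def unshift(line: str, offset: int = 29) -> str:
    """Shift each character up by `offset` code points when that yields an ASCII letter.

    The offset of 29 is an assumption taken from the single sample line in this post.
    """
    out = []
    for ch in line:
        shifted = chr(ord(ch) + offset)
        # Accept the shift only when it lands on a plain ASCII letter;
        # spaces, punctuation and accented characters are left as they are.
        if shifted.isascii() and shifted.isalpha():
            out.append(shifted)
        else:
            out.append(ch)
    return "".join(out)


if __name__ == "__main__":
    print(unshift("FRQVLVWDLW VXUWRXW"))
```

On the sample line this recovers the unaccented words only, so it is mostly useful for confirming the cause, not for fixing the book.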
The calibre ebook viewer displays the initial pdf with the garbage, while it displays just fine with my default pdf viewer. I tried converting to mobi, txt, mkd, to no avail.
I tried pdftotext, and the online tool http://www.zamzar.com/ and got the same output.
I then converted the pdf to .pbm files and tried running gocr and ocrad on them. The OCR results were quite interesting, but not good enough to be used as-is.
Jérôme _sȧit aimer le rouge. Sa marotte
consistait surTout à l'enlever. Dégrafer la
dentelle était un geste qu_l effectuait avec la
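For anyone wanting to reproduce the OCR route, here is a rough Python sketch of the pipeline described above. It assumes pdftoppm (from poppler-utils) for the pdf to .pbm step, which the post does not name, and ocrad with UTF-8 output. The function names, the "page" prefix and the 300 dpi resolution are placeholders, not recommendations.

```python
import subprocess
import sys
from pathlib import Path


def rasterize_command(pdf: Path, prefix: Path, dpi: int = 300) -> list[str]:
    # pdftoppm -mono writes one monochrome .pbm file per page, named <prefix>-<n>.pbm
    return ["pdftoppm", "-mono", "-r", str(dpi), str(pdf), str(prefix)]


def ocr_command(page: Path) -> list[str]:
    # ocrad reads the PBM page; -F utf8 asks for UTF-8 text, -o names the output file
    return ["ocrad", "-F", "utf8", "-o", str(page.with_suffix(".txt")), str(page)]


def ocr_pdf(pdf: Path, work_dir: Path) -> list[Path]:
    """Rasterize every page of `pdf` into `work_dir`, then OCR each page to a .txt file."""
    work_dir.mkdir(parents=True, exist_ok=True)
    subprocess.run(rasterize_command(pdf, work_dir / "page"), check=True)
    pages = sorted(work_dir.glob("page-*.pbm"))
    for page in pages:
        subprocess.run(ocr_command(page), check=True)
    return [page.with_suffix(".txt") for page in pages]


if __name__ == "__main__" and len(sys.argv) == 2:
    for text_file in ocr_pdf(Path(sys.argv[1]), Path("ocr_pages")):
        print(text_file)
```

Raising the resolution is the obvious knob to try first when the recognized text is close but not quite right, as in the sample above.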
Would you have any idea of other tools that could help in the process, or options to fine-tune calibre or the OCR programs?
Note: I'm running Ubuntu 13.10.
I have a bunch of book length text files I'd really like to read on my EPUB reader (as it happens FBReaderJ). What would be the best route to convert them?
I have access to Mac OS X and Linux (Ubuntu). Probably happiest with a command line, but would settle for a GUI for batch conversion.
My criteria for success are really based upon the shortfalls I have found with Calibre:
- must do the whole book
- at least a guess of what the title/author may be. At minimum, the source filename for the title.
- hygienic with files it uses - tidies up after itself (this is less important)
- doesn't try to be an all-in-one library manager (again, less important).
- is lenient in parsing special characters (e.g. < and & characters).
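Two of the criteria above (a title taken from the filename, and tolerance for < and & characters) can be met by a small pre-processing step before handing the file to whichever converter is used. The following Python sketch is one way to do it, under the assumptions that paragraphs are separated by blank lines and that the converter accepts simple HTML; the function names are made up for illustration.

```python
import html
from pathlib import Path


def title_from_filename(path: Path) -> str:
    # Fallback title: the filename without extension, separators turned into spaces
    return path.stem.replace("_", " ").replace("-", " ").strip()


def text_to_html(text: str, title: str) -> str:
    """Wrap plain text in minimal HTML, escaping &, < and > so they survive parsing."""
    # Assumption: a blank line separates paragraphs in the source text
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    body = "\n".join(f"<p>{html.escape(p)}</p>" for p in paragraphs)
    return (
        "<html><head><meta charset=\"utf-8\">"
        f"<title>{html.escape(title)}</title></head>\n"
        f"<body>\n{body}\n</body></html>\n"
    )


if __name__ == "__main__":
    sample = "Chapter 1\n\nIf a < b & b < c, then a < c."
    print(text_to_html(sample, title_from_filename(Path("my_first-book.txt"))))
```

Escaping up front means the special characters are no longer the converter's problem, at the cost of an intermediate HTML file per book.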
Happened upon this thread many moons later.
Just wanted to point out that there is a command line tool Calibre uses to convert. It's called (surprise, surprise) ebook-convert. See 'ebook-convert -h' or 'ebook-convert dummy.html .epub -h' to see the conversion options for converting html to epub.
Haven't explored it much though. I am most curious about --list-recipes (and whether it works); it looks like something interesting.
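For batch use, a thin wrapper around ebook-convert is probably enough. Here is a minimal Python sketch, assuming ebook-convert is on the PATH and using its --title option to set the title from the source filename; the function names and the directory layout are placeholders, and error handling is left out.

```python
import subprocess
import sys
from pathlib import Path


def build_command(source: Path, out_dir: Path) -> list[str]:
    # Output goes to <out_dir>/<name>.epub; the filename (without extension) becomes the title
    target = out_dir / (source.stem + ".epub")
    return ["ebook-convert", str(source), str(target), "--title", source.stem]


def convert_all(src_dir: Path, out_dir: Path) -> None:
    """Convert every .txt file in `src_dir` to EPUB, one ebook-convert call per file."""
    out_dir.mkdir(parents=True, exist_ok=True)
    for source in sorted(src_dir.glob("*.txt")):
        subprocess.run(build_command(source, out_dir), check=True)


if __name__ == "__main__" and len(sys.argv) == 3:
    convert_all(Path(sys.argv[1]), Path(sys.argv[2]))
```

Since this calls the converter directly, nothing gets added to a Calibre library, which addresses the "all-in-one library manager" complaint.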
I'd say Calibre is for you; it works on Linux, Mac OS X, and Windows.
Input Formats: CBZ, CBR, CBC, EPUB, FB2, HTML, LIT, MOBI, ODT, PDF, PRC, PDB, PML, RB, RTF, TXT
Output Formats: EPUB, FB2, OEB, LIT, LRF, MOBI, PDB, PML, RB, PDF, TXT
On Mac OS X and Windows, I have had success with Stanza for Desktop.
This supports a good range of export formats.
More importantly, it copes very well with:
- detecting chapters in large text files.
- Unicode, including "significant" characters like < and &.
There are online tools to convert to epub files.
Example of such a website here.
If you have a MacOS X 10.6 machine, try this:
http://padilicious.com/epub/index.html
It relies on Automator.
You may want to try ODFToEPub. This is an OpenOffice extension that lets you export a document to ePub.
If you have access to a Windows system, you can try Atlantis Word Processor. It converts not only TXT but also DOC, DOCX, and ODT files to EPUB. Only a few mouse clicks are needed to convert a document to EPUB. Convenient batch conversion is also offered. You can find details here:
http://www.atlantiswordprocessor.com/en/help/index.php?page=html/ebook.htm
Or see the main page of its site (there is also a download link).
A bit off-topic:
Here's a nice website, titled EVERYTHING ABOUT READING ELECTRONIC BOOKS.
Well, not quite EVERYTHING, but very informative nevertheless :)
This being a Russian website, they're focused on some of these extraordinary Russian eBook reading programs such as ICE Book Reader Professional, CoolReader 2 (maybe not as sophisticated as ICE but free) and AlReader 2. None of these support the EPUB format though.
There's also a link to several ePub libraries, which might be of interest to you.