utf-8 - Convert a PDF containing UTF-8 text to Word or text files?

2014-07
  • TheKing AlaaSy

    I have a PDF file that contains Arabic text, and I need to convert it to Word for editing. When I convert it, the characters are displayed in an unreadable format.

    Any suggestions that can help would be appreciated.
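The question does not say what the unreadable characters look like. If they are mojibake, that is, UTF-8 bytes that some tool decoded with a single-byte code page, the damage is reversible. The Python sketch below assumes the wrong codec was Latin-1 (real tools often use Windows-1252 instead, which leaves a few byte values undefined), and `repair_mojibake` is a hypothetical helper, not part of any converter.

```python
def repair_mojibake(text: str, wrong_codec: str = "latin-1") -> str:
    """Undo UTF-8 text that was mis-decoded with a single-byte codec.

    Assumption: `text` came from UTF-8 bytes decoded as `wrong_codec`.
    If re-encoding or re-decoding fails, the input is returned unchanged,
    since it was probably not mojibake of this kind.
    """
    try:
        # Recover the original bytes, then decode them the right way.
        return text.encode(wrong_codec).decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return text


if __name__ == "__main__":
    original = "مرحبا"  # Arabic for "hello"
    garbled = original.encode("utf-8").decode("latin-1")  # simulate the bug
    print(repr(garbled))
    print(repair_mojibake(garbled))
```

This only addresses that one failure mode; correctly decoded Arabic text passes through unchanged because it cannot be encoded as Latin-1.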


    Related Question

    images - How can I batch convert SVG files containing text to PDF files (specifically on CentOS 5.3 x86_64)?
  • Christopher Bottoms

    Possible Duplicate:
    How do I convert an SVG to a PDF on Linux

    I would like to programmatically convert SVG files to PDF files. However, the SVG files contain text that must be searchable in the generated PDF files. Also, it has to work on Red Hat Enterprise Linux 5.3 or CentOS 5.3 for the x86_64 architecture. It would be nice if it were open source or at least not very expensive.

    Here is what I've tried. All of these, except Batik, work fine on Debian Lenny.

    Inkscape
    I can get it installed using autopackages from http://inkscape.modevia.com/ap, but when I use it from the command line, the text is not searchable.

    Batik rasterizer [sic]
    When it converts SVG files to PDF files, the text is no longer searchable.

    svg2pdf
    The source for this and several of its dependencies are available to download. I have been trying to get it to compile on CentOS, but haven't had success yet. I found a precompiled version for Debian x86_64, but it doesn't work on CentOS.

    rsvg-convert
    The generated PDF isn't searchable on CentOS 5.3. Perhaps installing a newer version of cairo would help. Thanks to DaveParillo for mentioning rsvg-convert (on Super User).
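Each attempt above fails on the same check: whether the text in the generated PDF is still text. A quick way to test a file is to see whether a text extractor can pull anything out of it. The sketch below assumes Poppler's `pdftotext` (not mentioned in the question) is installed and on `PATH`; the command runner is a parameter so the logic can be exercised without it. An empty result suggests the glyphs were turned into outlines or a raster image.

```python
import subprocess
from typing import Callable, Sequence

# A runner takes an argument list and returns the command's stdout.
Runner = Callable[[Sequence[str]], str]


def run_command(args: Sequence[str]) -> str:
    """Run a command and return its standard output as text."""
    result = subprocess.run(list(args), capture_output=True, text=True, check=True)
    return result.stdout


def has_searchable_text(pdf_path: str, run: Runner = run_command) -> bool:
    """Return True if pdftotext extracts any non-whitespace text from pdf_path.

    Assumption: `pdftotext` (Poppler) is on PATH when the default runner is used.
    """
    # "-" as the output file makes pdftotext write the text to stdout.
    extracted = run(["pdftotext", pdf_path, "-"])
    # pdftotext separates pages with form feeds, so strip all whitespace.
    return bool(extracted.strip())
```

For example, `has_searchable_text("my.pdf")` can be run on the output of each converter to compare them on the same SVG.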

    SOLUTION (but perhaps some of the above will still be useful to the reader)
    princeXML
    It works fine on CentOS when installed from source. For some reason it doesn't work when installed from the .rpm. Thanks, Erik Dahlström! (He provided the solution that worked for my case on Stack Overflow.)

    Cross-posted on Stack Overflow.


  • Related Answers
  • DaveParillo

    Most tools out there (like Batik or ImageMagick) are going to turn your vector data into a raster image.

    I would try rsvg-convert. It uses cairo as a backend, so you may have the same compile problems you're having with svg2pdf.

  • Kurt Pfeifle

    One other (so far little-known) alternative is GhostPDL's gsvg (on Windows: gsvg.exe). GhostPDL is the sister application to Ghostscript (currently being merged into a single repository at http://svn.ghostscript.com/ghostpdl/). GhostPDL is for SVG, XPS and PCL processing, similar to what Ghostscript is for PostScript and PDF processing. Here goes:

    gsvg.exe ^
       -dBATCH ^
       -dNOPAUSE ^
       -dSAFER ^
       -sDEVICE=pdfwrite ^
       -sOutputFile=my.pdf ^
       [...more options you may want/need...] ^
       c:/path/to/my.svg
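Since the question asks for batch conversion, the command above can be run once per file. This Python sketch builds the same argument list as the answer for every `*.svg` in a directory and writes each PDF next to its source. It assumes the GhostPDL binary is on `PATH` as `gsvg` (`gsvg.exe` on Windows); the `[...more options...]` placeholder from the answer is left out, and the helper names are hypothetical.

```python
import subprocess
from pathlib import Path
from typing import List

# Assumption: GhostPDL's SVG interpreter is on PATH under this name
# (use "gsvg.exe" on Windows).
GSVG = "gsvg"


def gsvg_command(svg: Path, gsvg: str = GSVG) -> List[str]:
    """Build the gsvg invocation shown in the answer for one SVG file."""
    pdf = svg.with_suffix(".pdf")
    return [
        gsvg,
        "-dBATCH",
        "-dNOPAUSE",
        "-dSAFER",
        "-sDEVICE=pdfwrite",
        f"-sOutputFile={pdf}",
        str(svg),
    ]


def batch_convert(directory: Path, gsvg: str = GSVG) -> List[Path]:
    """Convert every *.svg in directory to a PDF beside it; return the PDF paths."""
    written = []
    for svg in sorted(directory.glob("*.svg")):
        # check=True raises CalledProcessError if gsvg exits with an error.
        subprocess.run(gsvg_command(svg, gsvg), check=True)
        written.append(svg.with_suffix(".pdf"))
    return written
```

Passing each argument as a separate list item avoids shell quoting problems with paths that contain spaces.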