linux - Automatic resolution when converting PDF to images

2014-07
  • Cornelius

    I have some scanned PDF files that I want to process using ScanTailor. To do that I need to extract images (as TIFF). I have been using GhostScript as follows:

    gs -sDEVICE=tiffgray -r400x400 -dNOPAUSE -dBATCH -sOutputFile="file0000.tiff" "input.pdf"
    

    The problem is that I don't know the resolution of the original images in the PDF. Is there any way to make GhostScript adapt its resolution to the images in the PDF file? Or is there any other free Linux software that can do that?

    Adobe Acrobat does that:

    Colorspace/Resolution Specifies a color space and resolution for the output file. You can let Acrobat determine these settings automatically.

  • Answers
  • ojs

    pdfimages from poppler-utils extracts images from PDF files. It saves them as PBM for monochrome images and PPM for non-monochrome images, but you can make it output JPEG instead. If that does not suit you, you can use pdfimages -list to get a list of the images and their properties, including their resolutions.
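
    As a rough sketch of how the -list output could feed GhostScript's -r option: the helper below (max_ppi is a made-up name) reads the listing on standard input and prints the largest x-ppi value. It assumes the column layout of recent poppler releases, where x-ppi is the 13th whitespace-separated field; older releases may not print ppi columns at all, so check the header line of your own output first.

```shell
#!/bin/sh
# Print the largest x-ppi value found in `pdfimages -list` output read from stdin.
# Assumption: x-ppi is the 13th whitespace-separated column (recent poppler).
# Header and separator lines are skipped because their 13th field is not a number.
max_ppi() {
    awk '$13 ~ /^[0-9]+$/ && $13 + 0 > max { max = $13 + 0 }
         END { if (max) print max }'
}

# Hypothetical usage (not run here):
#   res=$(pdfimages -list input.pdf | max_ppi)
#   gs -sDEVICE=tiffgray -r"$res" -dNOPAUSE -dBATCH -sOutputFile="file0000.tiff" input.pdf
```

    Rendering at the highest listed value avoids downsampling any embedded scan, at the cost of upsampling pages that were scanned at a lower resolution.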


  • Related Question

    linux - Converting a PDF to black & white with ghostscript
  • niklasfi

    Similarly to this question:

    Convert a PDF to greyscale on the command line in FLOSS?

    I have a PDF document and want to convert it to pure black and white, that is, I want to discard halftones. To convert to grayscale with ghostscript I can use this command:

    gs \
     -sOutputFile=output.PDF \
     -sDEVICE=pdfwrite \
     -sColorConversionStrategy=Gray \
     -dProcessColorModel=/DeviceGray \
     -dCompatibilityLevel=1.4 \
      input.PDF < /dev/null
    

    What do I have to change to get monochrome, i.e. only black and white with no halftones?


  • Related Answers
  • Surge

    The last suggestion indeed only converts to grayscale, and even then only works if the underlying document uses setrgbcolor. This did not work for me, since my document used setcolor.

    I had success with redefining setcolor to always set the color to 0,0,0:

    gs -o <output-file.pdf> -sDEVICE=pdfwrite \
    -c "/osetcolor {/setcolor} bind def /setcolor {pop [0 0 0] osetcolor} def" \
    -f <input-file.ps>
    

    It has been 15+ years since I did any PostScript hacking, so the above may be lame, incorrect or even accidental - if you know how to do better, please suggest.

  • Community

    I am not sure if the following suggestion will work, but it may be worth trying:

    1. convert the PDF to PostScript using the simple pdf2ps utility
    2. convert that PostScript back to PDF while using a re-defined /setrgbcolor PostScript operator

    These are the commands:

    First

      pdf2ps color.pdf color.ps
    

    This gives you color.ps as output.

    Second

    gs \
    -o bw-from-color.pdf \
    -sDEVICE=pdfwrite \
    -c "/setrgbcolor{0 mul 3 1 roll 0 mul 3 1 roll 0 mul 3 1 roll 0 mul add add setgray}def" \
    -f color.ps
    
  • o-town

    It's not ghostscript, but with imagemagick this is quite simple:

     convert -monochrome input.pdf output.pdf
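
    Note that ImageMagick rasterizes the PDF, and -monochrome may dither. If the goal is strictly no halftones, a hard threshold is one possible alternative. The sketch below is an assumption-laden variant, not part of the answer: pdf_threshold is a hypothetical helper name, and the 300 dpi density and 50% threshold defaults are illustrative values to tune per document.

```shell
#!/bin/sh
# Convert a PDF to pure black and white with a hard threshold (no dithering).
# pdf_threshold is a hypothetical helper; the defaults are illustrative only.
pdf_threshold() {
    src=$1
    dst=$2
    density=${3:-300}     # rasterization resolution in dpi
    threshold=${4:-50}    # percent; brighter pixels become white, the rest black

    # -density must precede the input so it applies while the PDF is rasterized.
    convert -density "$density" "$src" -colorspace Gray \
        -threshold "${threshold}%" "$dst"
}

# Hypothetical usage (not run here):
#   pdf_threshold input.pdf output.pdf 400 60
```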
    
  • Ed L

    This looks like it would work:

    1) Convert the file to monochrome with gs

    gs -sDEVICE=psmono \
      -dNOPAUSE -dBATCH -dSAFER \
      -sOutputFile=combined.ps \
      first.pdf \
      second.ps \
      third.eps [...]
    

    2) Convert the PostScript file back to a PDF with ps2pdf or gs

    (credit to: http://www.linuxjournal.com/content/tech-tip-using-ghostscript-convert-and-combine-files)
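
    The two steps can be chained in a small wrapper. This is only a sketch: pdf_to_mono is a hypothetical helper name, and it assumes the installed Ghostscript still provides the psmono device that the answer relies on (gs -h prints the list of available devices).

```shell
#!/bin/sh
# Sketch of the two-step conversion: PDF -> monochrome PostScript -> PDF.
# pdf_to_mono is a hypothetical helper, not part of Ghostscript.
pdf_to_mono() {
    src=$1
    dst=$2
    tmp="${dst%.pdf}.mono.ps"   # intermediate PostScript file next to the output

    # Step 1: render to monochrome PostScript (requires the psmono device).
    gs -sDEVICE=psmono -dNOPAUSE -dBATCH -dSAFER \
        -sOutputFile="$tmp" "$src" || return 1

    # Step 2: convert the PostScript back to PDF.
    ps2pdf "$tmp" "$dst" || return 1

    # Clean up the intermediate file.
    rm -f "$tmp"
}

# Hypothetical usage (not run here):
#   pdf_to_mono input.pdf output-bw.pdf
```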

  • Tarun Kumar

    For a grayscale PDF, using GhostScript from PHP code:

    exec("'gs' '-sOutputFile=outputfilename.pdf' '-sDEVICE=pdfwrite' '-sColorConversionStrategy=Gray' '-dProcessColorModel=/DeviceGray' '-dCompatibilityLevel=1.4'  'inputfilename.pdf'",$output);
    

    Useful URL:
    http://www.linuxjournal.com/content/tech-tip-using-ghostscript-convert-and-combine-files

  • Tarun Kumar

    For a pure black-and-white PDF, you need to convert it to PostScript and then back to PDF. PDF to PostScript:

    exec("gs -sDEVICE=psmono -dNOPAUSE -dBATCH -dSAFER -sOutputFile=combined.ps " . escapeshellarg($pdf));
    

    PostScript to PDF (black and white):

    exec("gs -sDEVICE=pdfwrite -dNOPAUSE -dBATCH -dSAFER -sOutputFile=file_pdf.pdf combined.ps");