linux - Automatic resolution when converting PDF to images

07
2014-07

Cornelius

I have some scanned PDF files that I want to process using ScanTailor. To do that I need to extract images (as TIFF). I have been using GhostScript as follows:

gs -sDEVICE=tiffgray -r400x400 -dNOPAUSE -dBATCH -sOutputFile="file0000.tiff" "input.pdf"

The problem is that I don't know the resolution of the original images in the PDF. Is there any way to make GhostScript adapt its resolution to the images in the PDF file? Or is there any other free Linux software that can do that?

Adobe Acrobat does that:

Colorspace/Resolution: Specifies a color space and resolution for the output file. You can let Acrobat determine these settings automatically.

Answers

ojs

pdfimages from poppler-utils extracts images from PDF files. It saves them as PBM for monochrome images and PPM for non-monochrome images, but you can make it output JPEG instead (the -j option, which applies to images stored as JPEG inside the PDF). If that does not suit you, you can use pdfimages -list to get a list of the images and their properties, including their resolutions.
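As a rough sketch of how the -list output could drive the GhostScript resolution: the helper below (max_ppi is a made-up name) reads pdfimages -list output on stdin and prints the largest x-ppi value it finds. It assumes the table layout printed by recent poppler versions, where the first two lines are a header and a dashed rule and x-ppi is the 13th whitespace-separated field. Older versions may not print the ppi columns at all, so check the output by hand first.

```shell
#!/bin/sh
# max_ppi: hypothetical helper, not part of poppler or GhostScript.
# Reads `pdfimages -list` output on stdin and prints the largest x-ppi value.
# Assumption: lines 1-2 are the header and a dashed rule, and x-ppi is the
# 13th whitespace-separated field of every image row.
# Rows whose 13th field is not a plain integer are skipped.
max_ppi() {
    awk 'BEGIN { max = 0 }
         NR > 2 && $13 ~ /^[0-9]+$/ { if ($13 + 0 > max) max = $13 + 0 }
         END { if (max > 0) print max; else exit 1 }'
}

# Possible usage (requires pdfimages and gs to be installed):
#   res=$(pdfimages -list input.pdf | max_ppi) &&
#   gs -sDEVICE=tiffgray -r"$res" -dNOPAUSE -dBATCH \
#      -sOutputFile="file%04d.tiff" input.pdf
```

Taking the maximum is only one possible policy: it avoids downsampling the sharpest scan, at the cost of upsampling any lower-resolution pages. The helper returns a non-zero status when no resolution could be read, so the gs call in the usage comment is skipped in that case.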

Related Answers

Surge

The last suggestion indeed only converts to grayscale, and it only works if the underlying document uses setrgbcolor. This did not work for me, since I had a document that used setcolor.

I had success with redefining setcolor to always set the color to 0,0,0:

gs -o <output-file.pdf> -sDEVICE=pdfwrite \
-c "/osetcolor {/setcolor} bind def /setcolor {pop [0 0 0] osetcolor} def" \
-f <input-file.ps>

It has been 15+ years since I did any PostScript hacking, so the above may be lame, incorrect or even accidental - if you know how to do better, please suggest.

Community

I am not sure whether the following suggestion will work, but it may be worth trying:

convert the PDF to PostScript using the simple pdf2ps utility
convert that PostScript back to PDF while using a re-defined /setrgbcolor PostScript operator

These are the commands:

First

  pdf2ps color.pdf color.ps

This gives you color.ps as output.

Second

gs \
-o bw-from-color.pdf \
-sDEVICE=pdfwrite \
-c "/setrgbcolor{0 mul 3 1 roll 0 mul 3 1 roll 0 mul 3 1 roll 0 mul add add setgray}def" \
-f color.ps

o-town

It's not ghostscript, but with imagemagick this is quite simple:

 convert -monochrome input.pdf output.pdf

Ed L

This looks like it would work:

1) Convert the file to monochrome with gs

gs -sDEVICE=psmono \
  -dNOPAUSE -dBATCH -dSAFER \
  -sOutputFile=combined.ps \
  first.pdf \
  second.ps \
  third.eps [...]

2) Convert the PostScript file back to a PDF with ps2pdf or gs

(credit to: http://www.linuxjournal.com/content/tech-tip-using-ghostscript-convert-and-combine-files)
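For the conversion back to PDF, a minimal sketch using the ps2pdf wrapper that ships with GhostScript. ps_to_pdf is a made-up helper name, and combined.ps is the file written by the psmono step above.

```shell
# ps_to_pdf: hypothetical wrapper around ps2pdf.
# Writes <name>.pdf next to <name>.ps unless a second argument
# names the output file explicitly.
ps_to_pdf() {
    ps2pdf "$1" "${2:-${1%.ps}.pdf}"
}

# Possible usage (requires GhostScript to be installed):
#   ps_to_pdf combined.ps              # writes combined.pdf
#   ps_to_pdf combined.ps mono.pdf     # writes mono.pdf
```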

Tarun Kumar

For a grayscale PDF, use GhostScript. In PHP code, use this call:

exec("'gs' '-sOutputFile=outputfilename.pdf' '-sDEVICE=pdfwrite' '-sColorConversionStrategy=Gray' '-dProcessColorModel=/DeviceGray' '-dCompatibilityLevel=1.4'  'inputfilename.pdf'",$output);

Useful URL:
http://www.linuxjournal.com/content/tech-tip-using-ghostscript-convert-and-combine-files

Tarun Kumar

For a pure black-and-white PDF, first convert the PDF to PostScript with the psmono device, then convert the PostScript file back to PDF:

exec("gs -sDEVICE=psmono -dNOPAUSE -dBATCH -dSAFER -sOutputFile=combined.ps " . escapeshellarg($pdf));

Then PostScript to PDF (black and white):

exec("gs -sDEVICE=pdfwrite -dNOPAUSE -dBATCH -dSAFER -sOutputFile=file_pdf.pdf combined.ps");
