linux - Automatic resolution when converting PDF to images
2014-07
I have some scanned PDF files that I want to process using ScanTailor. To do that I need to extract images (as TIFF). I have been using GhostScript as follows:
gs -sDEVICE=tiffgray -r400x400 -dNOPAUSE -dBATCH -sOutputFile="file0000.tiff" "input.pdf"
The problem is that I don't know the resolution of the original images in the PDF. Is there any way to make GhostScript adapt its resolution to the images in the PDF file? Or is there any other free Linux software that can do that?
Adobe Acrobat does that:
Colorspace/Resolution Specifies a color space and resolution for the output file. You can let Acrobat determine these settings automatically.
The pdfimages tool from poppler-utils extracts images from PDF files. It saves monochrome images as PBM and all other images as PPM, but you can make it write JPEG instead (the -j option, for images stored in the PDF as JPEG). If that does not suit you, you can use pdfimages -list to get a list of the images and their properties, including their resolutions.
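Building on the pdfimages -list suggestion, a small helper can pick the highest resolution in the list and hand it to gs. This is only a sketch: max_ppi is a made-up name, and it assumes the table layout printed by recent poppler-utils versions, where x-ppi and y-ppi are the 13th and 14th whitespace-separated columns. Check your own pdfimages -list output before relying on it.

```shell
#!/bin/sh
# Print the highest resolution (in ppi) found in `pdfimages -list` output
# read from stdin. Returns non-zero if no usable row was found.
# Assumption: x-ppi and y-ppi are fields 13 and 14 of each data row.
max_ppi() {
    awk '
        # Data rows start with a page number; this skips the two header lines.
        $1 ~ /^[0-9]+$/ && $13 ~ /^[0-9]+$/ && $14 ~ /^[0-9]+$/ {
            if ($13 + 0 > max) max = $13 + 0
            if ($14 + 0 > max) max = $14 + 0
        }
        END {
            if (max > 0) { print max } else { exit 1 }
        }
    '
}

# Hypothetical usage, reusing the file names from the question:
#   res=$(pdfimages -list input.pdf | max_ppi) &&
#   gs -sDEVICE=tiffgray -r"$res" -dNOPAUSE -dBATCH \
#      -sOutputFile="file0000.tiff" "input.pdf"
```

Taking the maximum means the sharpest scan in the file is not downsampled; pages scanned at a lower resolution are simply rendered larger than they need to be.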
Similarly to this question:
I have a PDF-document and want to convert it to pure black and white. So I want to discard halftones. To convert to grayscale with ghostscript I can use this command:
gs \
-sOutputFile=output.PDF \
-sDEVICE=pdfwrite \
-sColorConversionStrategy=Gray \
-dProcessColorModel=/DeviceGray \
-dCompatibilityLevel=1.4 \
input.PDF < /dev/null
What do I have to change to get monochrome, i.e. only the colors black and white and no halftones?
The last suggestion indeed only converts to grayscale, and even then it only works if the underlying document uses setrgbcolor. This did not work for me, since I had a document that used setcolor.
I had success with redefining setcolor to always set the color to 0,0,0:
gs -o <output-file.pdf> -sDEVICE=pdfwrite \
-c "/osetcolor {/setcolor} bind def /setcolor {pop [0 0 0] osetcolor} def" \
-f <input-file.ps>
It has been 15+ years since I did any PostScript hacking, so the above may be lame, incorrect or even accidental - if you know how to do better, please suggest.
I am not sure whether the following suggestion will work, but it may be worth trying:
- convert the PDF to PostScript using the simple pdf2ps utility
- convert that PostScript back to PDF while using a re-defined /setrgbcolor PostScript operator
These are the commands:
First
pdf2ps color.pdf color.ps
This gives you color.ps as output.
Second
gs \
-o bw-from-color.pdf \
-sDEVICE=pdfwrite \
-c "/setrgbcolor{0 mul 3 1 roll 0 mul 3 1 roll 0 mul 3 1 roll 0 mul add add setgray}def" \
-f color.ps
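Note that the redefinition above multiplies every component by 0, so each setrgbcolor call ends up as "0 setgray", i.e. black. If you would rather send light colors to white and dark ones to black, a thresholded variant might look like the sketch below. The function name, the GS variable and the 0.5 cut-off are my own choices, the 0.3/0.59/0.11 weights are the usual RGB-to-gray approximation, and like the original it can only take effect where the PostScript actually calls setrgbcolor.

```shell
#!/bin/sh
# GS can be overridden (for example for a dry run); it defaults to gs.
GS=${GS:-gs}

# PostScript prologue: compute a luminance value from r g b, then set
# gray 0 (black) below the cut-off and gray 1 (white) otherwise.
# Stack walk: r g b -> r g 0.11b -> r (0.11b+0.59g) -> luminance.
THRESHOLD_PROLOGUE='/setrgbcolor { 0.11 mul exch 0.59 mul add exch 0.3 mul add 0.5 lt { 0 } { 1 } ifelse setgray } bind def'

# ps_to_bw_pdf INPUT.ps OUTPUT.pdf  (hypothetical helper)
ps_to_bw_pdf() {
    "$GS" -o "$2" -sDEVICE=pdfwrite -c "$THRESHOLD_PROLOGUE" -f "$1"
}

# Hypothetical usage, after "pdf2ps color.pdf color.ps":
#   ps_to_bw_pdf color.ps bw-from-color.pdf
```

This only changes colors set through setrgbcolor; embedded images and colors set through other operators (setcolor, setcmykcolor) pass through untouched.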
It's not ghostscript, but with imagemagick this is quite simple:
convert -monochrome input.pdf output.pdf
This looks like it would work:
1) Convert the file to monochrome with gs
gs -sDEVICE=psmono \
-dNOPAUSE -dBATCH -dSAFER \
-sOutputFile=combined.ps \
first.pdf \
second.ps \
third.eps [...]
2) Convert the PostScript file back to a PDF with ps2pdf or gs
(credit to: http://www.linuxjournal.com/content/tech-tip-using-ghostscript-convert-and-combine-files)
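The two steps can be chained in one small shell function. This is a sketch, not something from the linked article: pdf_to_mono and the GS variable are names I made up, the intermediate PostScript goes to a temporary file, and it assumes your Ghostscript build still includes the psmono device (gs -h lists the devices a build supports).

```shell
#!/bin/sh
# GS can be overridden (for example for a dry run); it defaults to gs.
GS=${GS:-gs}

# pdf_to_mono INPUT OUTPUT.pdf  (hypothetical helper)
# Step 1: render INPUT to monochrome PostScript with the psmono device.
# Step 2: convert that PostScript back to PDF with pdfwrite.
pdf_to_mono() {
    in=$1
    out=$2
    tmp=$(mktemp "${TMPDIR:-/tmp}/mono.XXXXXX") || return 1
    "$GS" -sDEVICE=psmono -dNOPAUSE -dBATCH -dSAFER \
        -sOutputFile="$tmp" "$in" &&
    "$GS" -sDEVICE=pdfwrite -dNOPAUSE -dBATCH -dSAFER \
        -sOutputFile="$out" "$tmp"
    status=$?
    rm -f "$tmp"    # remove the intermediate file whether or not gs succeeded
    return $status
}

# Hypothetical usage:
#   pdf_to_mono first.pdf first-mono.pdf
```

The second step could just as well call ps2pdf; gs with -sDEVICE=pdfwrite is used here only so that both steps share the same command and options.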
For a grayscale PDF, using GhostScript:
In PHP code, use this script:
exec("'gs' '-sOutputFile=outputfilename.pdf' '-sDEVICE=pdfwrite' '-sColorConversionStrategy=Gray' '-dProcessColorModel=/DeviceGray' '-dCompatibilityLevel=1.4' 'inputfilename.pdf'", $output);
Useful URL:
http://www.linuxjournal.com/content/tech-tip-using-ghostscript-convert-and-combine-files
For a pure black-and-white PDF, you need to convert it to PostScript first and then convert the PostScript back to PDF.
PDF to PostScript:
exec("gs -sDEVICE=psmono -dNOPAUSE -dBATCH -dSAFER -sOutputFile=combined.ps " . escapeshellarg($pdf));
PostScript to PDF (black and white):
exec("gs -sDEVICE=pdfwrite -dNOPAUSE -dBATCH -dSAFER -sOutputFile=file_pdf.pdf combined.ps");