Adobe Acrobat - How to analyze the space usage within a PDF document?

2014-07
  • ufotds

    I have a 7 MB PDF that I made from 65 scanned B/W images. After OCR, the document grows to 32 MB.

    I have never seen text take up so much space. (In theory, the extra 25 MB could hold 25 million uncompressed letters.) Saved as plain text, I have about 4 KB/page * 65 pages = roughly 260 KB of text.

    Does that leave the rest of the 32 MB for positioning data, since I am creating a searchable image? Unlikely.

    Something seems wrong, and I want to look at the space taken up by the different parts of the PDF, but I can't find any tool that does this.

    EDIT: The issue with the PDF in question has been resolved. The culprit was using "Searchable Image" instead of "Searchable Image (Exact)". It must have resampled some of the images, which made them a lot bigger. I am still interested in an answer to the question, though.

  • Answers
  • Darth Android

    The tool you are looking for is the Audit Space Usage tool in Adobe Acrobat. It gives you a byte-by-byte breakdown of which components of your PDF contribute to the file size.

    Here's a video demonstrating how to find the Audit Space Usage tool. For some reason Adobe has hidden it in Acrobat.

    The feature can be found under File > Save As... > Optimized PDF... > Audit space usage.
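
For readers without Acrobat, the same idea can be approximated with a short script. The following is only a rough sketch, not how Acrobat's audit works: it scans the raw file for `N G obj ... endobj` blocks with a regular expression and sums their sizes per category. The function names (`classify`, `audit_space`) and the category labels are made up for illustration. It assumes a classic PDF layout; objects packed inside compressed object streams are counted as part of the enclosing stream, and binary data that happens to contain the bytes `endobj` would confuse it.

```python
import re
import sys
from collections import defaultdict

# Matches one indirect object: "<num> <gen> obj ... endobj".
OBJ_RE = re.compile(rb"(\d+)\s+(\d+)\s+obj\b(.*?)\bendobj", re.DOTALL)

# Hints that an object is a font dictionary or an embedded font program.
FONT_RE = re.compile(
    rb"/Type\s*/Font|/Length1\b|/Subtype\s*/(Type1C|CIDFontType0C|OpenType)\b"
)
IMAGE_RE = re.compile(rb"/Subtype\s*/Image\b")


def classify(body: bytes) -> str:
    """Guess a category from the object's dictionary (text before 'stream')."""
    head = body.split(b"stream", 1)[0]
    if IMAGE_RE.search(head):
        return "image"
    if FONT_RE.search(head):
        return "font"
    if b"stream" in body:
        return "other stream"   # page content, metadata, object streams, ...
    return "other object"       # plain dictionaries, arrays, numbers


def audit_space(pdf_bytes: bytes) -> dict:
    """Return a mapping of category -> bytes occupied in the file."""
    totals = defaultdict(int)
    for match in OBJ_RE.finditer(pdf_bytes):
        totals[classify(match.group(3))] += len(match.group(0))
    # Whatever is not inside an object: header, xref table, trailer, padding.
    totals["outside objects"] = len(pdf_bytes) - sum(totals.values())
    return dict(totals)


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], "rb") as handle:
        data = handle.read()
    for category, size in sorted(audit_space(data).items(), key=lambda kv: -kv[1]):
        print(f"{category:16s} {size:12d} bytes  {100 * size / len(data):5.1f}%")
```

Run against a scanned and OCRed file such as the one in the question, a large "image" share would point at the page images rather than the hidden text layer.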


  • Related Question

    How to do OCR on a PDF document?
  • Questioner

    Possible Duplicate:
    How to extract text with OCR from a PDF on Linux?

    I have a few documents in English and Hebrew that I scanned in and converted to PDF format.

    Is there some free or cheap utility that can process a scanned PDF and do OCR, at least in English, preferably also in Hebrew?

    Thanks!


  • Related Answers

    I found a list of free OCR software for Windows.

    1. FreeOCR
    2. Tesseract
    3. WeOcr Tesseract Web Interface
    4. GOCR
    5. Windows GUI for GOCR
    6. OCR Desktop
    7. Simple OCR
    8. TopOCR

    However, these programs need image input, not PDF input, so first convert the PDF to images with a PDF-to-JPG converter.
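
The convert-then-OCR workflow described above can be scripted. This is a minimal sketch under several assumptions that are not in the original answer: it uses Poppler's `pdftoppm` as the PDF-to-image converter (PNG instead of JPG, to avoid lossy compression), it uses the Tesseract command-line program from the list, both tools are on the PATH, and the `heb` language data is installed for Hebrew. The helper names (`pdf_to_png_command`, `ocr_command`, `ocr_pdf`) are made up for illustration.

```python
import subprocess
from pathlib import Path


def pdf_to_png_command(pdf_path: str, out_prefix: str, dpi: int = 300) -> list:
    """Build the pdftoppm call that renders each page to <out_prefix>-<n>.png."""
    return ["pdftoppm", "-r", str(dpi), "-png", pdf_path, out_prefix]


def ocr_command(image_path: str, out_base: str, languages: str = "eng+heb") -> list:
    """Build the tesseract call; tesseract writes its text to <out_base>.txt."""
    return ["tesseract", image_path, out_base, "-l", languages]


def ocr_pdf(pdf_path: str, work_dir: str, languages: str = "eng+heb") -> list:
    """Render every page of the PDF, OCR each page, return the text file paths."""
    work = Path(work_dir)
    work.mkdir(parents=True, exist_ok=True)
    subprocess.run(pdf_to_png_command(pdf_path, str(work / "page")), check=True)

    text_files = []
    # pdftoppm zero-pads page numbers, so a plain sort keeps page order.
    for image in sorted(work.glob("page-*.png")):
        out_base = image.with_suffix("")  # e.g. work/page-01
        subprocess.run(ocr_command(str(image), str(out_base), languages), check=True)
        text_files.append(out_base.with_suffix(".txt"))
    return text_files
```

A call such as `ocr_pdf("scan.pdf", "ocr_out")` would then leave one text file per page in `ocr_out`; the file names here are placeholders.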