How to make a searchable PDF document from a scan AND a source Word document?

07
2013-09
  • Evengard

    Well, I have a scanned PDF with some slight changes made by hand, and a source file. I want to make a PDF that is searchable (based on the text from the source; the handwritten changes would remain as they are).

    I am looking for free (and, even better, portable) software that would let me somehow "combine" the images from the scan with the text from the source DOC file, so that the image SEEMS selectable and searchable.

    UPD: use case: I have the source DOC file. I printed it, made some notes by hand on the printed sheets, and then scanned them. What I want is a PDF containing the scanned images in which the text on the images is selectable and searchable. Like the "OCR" feature of Acrobat, but without doing actual OCR, because I have the original source text, and using freeware, portable software.
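
    For context, searchable scans usually work by drawing the page image and then placing the same text over it in PDF text rendering mode 3 (invisible), so that it can be selected and searched without being seen. Below is a minimal sketch of only that text-layer fragment, assuming ASCII text. The function names and the font resource name /F1 are hypothetical, and producing a complete PDF (image, fonts, page objects, text positions matching the scan) is not shown.

```python
def escape_pdf_string(text: str) -> str:
    """Escape the characters that are special inside a PDF literal string."""
    return (
        text.replace("\\", "\\\\")
        .replace("(", "\\(")
        .replace(")", "\\)")
    )


def invisible_text_ops(text: str, x: float, y: float, font_size: float = 12) -> str:
    """Return content-stream operators that place `text` invisibly at (x, y).

    /F1 is a hypothetical font resource name; the page's resources
    would have to define it. Coordinates are in PDF points.
    """
    return (
        "BT\n"                                   # begin text object
        f"/F1 {font_size:g} Tf\n"                # select font and size
        "3 Tr\n"                                 # render mode 3: invisible text
        f"{x:g} {y:g} Td\n"                      # move to the text position
        f"({escape_pdf_string(text)}) Tj\n"      # show the string
        "ET\n"                                   # end text object
    )


if __name__ == "__main__":
    print(invisible_text_ops("Hello (world)", 72, 720))
```

    In practice the hard part is aligning each line of source text with its position on the scanned image, which this sketch leaves open.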

  • Answers
  • wizlog

    eHow Tech posted three methods of converting Word documents to PDF (Portable Document Format). I am sure two of them work fine; I am not sure about Zamzar.

    1. Go to the Zamzar website. Zamzar provides free conversion to and from different formats. This option works well if you don't need to convert Word documents to PDF frequently.

    2. Purchase and install Adobe Acrobat. At the time of publication, Adobe Acrobat Standard was selling for approximately $300 (now only $139). A new "Save as PDF" option is added to Microsoft Word after installing Acrobat. Most libraries, schools, Sony PCs, and work laptops (the ones provided by your company) already have Adobe Acrobat installed.

    3. Microsoft Office add-in: Microsoft Save as PDF or XPS. This add-in allows you to export and save to the PDF and XPS formats in eight 2007 Microsoft Office programs.


  • Related Question

    editing - Highlighting words in a pdf document
  • Phenom

    I'm trying to highlight words in a pdf document. However, behind the words, there is written in big letters "DO NOT COPY" all throughout the document. Sometimes when I try to highlight words it is those big letters that will get selected instead. How can I highlight the words I want instead of those big letters in the background?


  • Related Answers
  • ashishsony

    Your PDF document may be protected against copying, so that could be the problem. Alternatively, that text may be a watermark, which can be removed with PDF creator software such as Adobe Acrobat Professional or NitroPDF.

    To remove restrictions you can use services like http://freemypdf.com/... but removing restrictions from a PDF can be illegal, as that site also warns, so it depends on the content of the PDF.

    Good Luck..

  • Travis

    This would be good to try:

    1. Open the PDF
    2. Select All, Copy
    3. Paste into a word processor such as Word
    4. Use the built-in Find & Replace feature to find "DO NOT COPY" and replace it with nothing.
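
    The find-and-replace step can also be done on the pasted text with a short script. This is a minimal sketch, assuming the watermark comes through as plain text, possibly split across lines; the function name strip_watermark is hypothetical.

```python
import re


def strip_watermark(text: str, watermark: str = "DO NOT COPY") -> str:
    """Remove every occurrence of `watermark` from `text`.

    The match is case-insensitive and tolerates any whitespace
    (including line breaks) between the watermark's words.
    """
    pattern = r"\s+".join(re.escape(word) for word in watermark.split())
    cleaned = re.sub(pattern, "", text, flags=re.IGNORECASE)
    # Collapse the runs of spaces or tabs left where the watermark was.
    cleaned = re.sub(r"[ \t]{2,}", " ", cleaned)
    return cleaned.strip()


if __name__ == "__main__":
    print(strip_watermark("First line DO NOT COPY second line"))
```

    This only cleans the copied text; it does not change the PDF itself.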
  • pavium

    The big letters which say "DO NOT COPY" were probably added to stop you selecting text and copying it to the clipboard.

    This would also make it difficult to select text and highlight it.

  • blahdiblah

    It may be enough to start highlighting from a different point.

    Try highlighting from the end of the passage instead of the beginning, or from slightly before the text you're interested in.

  • TataBlack

    It appears that you want to remove the watermark while keeping the file in PDF format.

    I found a file on the Internet with the same "Do not copy" background image and, though it doesn't keep you from selecting/highlighting text, indeed it may make it a bit difficult at times.

    So as not to reinvent the wheel, here are three solutions (you still have a PDF in the end) and a workaround (you end up with a series of images):

    1. from the original document, re-create the PDF without the watermark (yes, well, I don't think it applies, does it?);
    2. install Adobe Acrobat (not the Reader), even in trial version, and use it to remove the watermark;
    3. convert the PDF to a Word file, remove the watermark, and then export it again as PDF (the outcome really depends on the formatting and content of your PDF file);
    4. convert the PDF to images, and delete the watermark by hand (may be a bit of work).

    Which one is better probably depends on the number of files you want to remove the watermark from, and on whether this is a one-off need or something you'll be doing day after day. If it's just this once, then I suggest trying the Adobe Acrobat solution mentioned in the linked blog.

  • Peter Cordes

    If you can't copy because it's "encrypted" and the permissions don't let you, then just use a PDF password remover program. There aren't any easy-to-use free ones that I know of, though. Even most open source PDF programs enforce the no-copy, no-printing nonsense (although pdftotext doesn't care, and lets you dump the PDF to text).

    For my own use, I modified the source of pdftk to not check the restrictions. Recent updates to the library it's based on mean I have to redo that change, which I haven't gotten around to getting working yet, or I'd post the patch.
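
    The pdftotext route mentioned above can be scripted. This is a minimal sketch, assuming the pdftotext command-line tool is installed and on the PATH; the function names and the file names in the usage line are hypothetical.

```python
import shutil
import subprocess


def pdftotext_command(pdf_path: str, txt_path: str, keep_layout: bool = True) -> list:
    """Build the argument list for a pdftotext call."""
    cmd = ["pdftotext"]
    if keep_layout:
        # -layout asks pdftotext to keep the original physical layout.
        cmd.append("-layout")
    cmd += [pdf_path, txt_path]
    return cmd


def dump_pdf_text(pdf_path: str, txt_path: str) -> None:
    """Run pdftotext, failing clearly if the tool is not installed."""
    if shutil.which("pdftotext") is None:
        raise RuntimeError("pdftotext was not found on PATH")
    subprocess.run(pdftotext_command(pdf_path, txt_path), check=True)
```

    Usage would look like dump_pdf_text("document.pdf", "document.txt"), after which the text file can be searched or edited as usual.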

  • harrymc

    You can use a free PDF reader that knows how to extract text:

    PDF-XChange Viewer:
    Can extract text from a PDF page or file.

    Foxit Reader:
    Can convert the whole PDF document into a simple text file.

    Both these readers are fast and easy to use.