How can I extract fonts from a PDF file?

2014-04
  • Questioner

    Is there a way to extract fonts from PDF files?

    I know that usually embedded fonts in PDF files are only subsets of the fonts. Anyway, is there a way to do this?

  • Answers
  • techie007

    Download FontForge for Windows

    http://www.geocities.jp/meir000/fontforge/

    Or from the official FontForge location:

    www.fontforge.org

    Once FontForge is downloaded, start it.

    On the “Open Font” screen, go down to where it says “Filter” and change it to “Extract from PDF”. Select your PDF and a “Pick a font” window will open. Select the font you want to extract and click OK.

    A window with a display of the font will show up. It’s not quite ready to turn into a TTF yet. Here’s how to prepare it:

    Go to the Encoding menu and select “Compact”. This will cause FontForge to remove all characters that are not defined in the embedded font. Beware, though: a font embedded in a PDF often contains only the characters that the document actually uses. So, if the PDF file that you are trying to extract from does not contain the letter “P”, then that letter will not show up in FontForge. Check to make sure all the characters you need are displayed and then head over to the Element menu.
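
    The check described above can also be done mechanically. The following is a minimal sketch, assuming you already have the set of Unicode code points the extracted font covers; the function name and the sample subset are made up for illustration.

```python
def missing_characters(needed_text, available_codepoints):
    """Return the characters of needed_text that the font has no glyph for."""
    available = set(available_codepoints)
    missing = []
    for ch in needed_text:
        if ch.isspace():
            continue  # whitespace needs no outline, so it is skipped
        if ord(ch) not in available and ch not in missing:
            missing.append(ch)
    return missing


# Hypothetical subset: a PDF whose only text was "hello world".
subset = {ord(c) for c in "helowrd"}
print(missing_characters("help wanted", subset))  # ['p', 'a', 'n', 't']
```

    If the list is not empty, the extracted font cannot render the text you have in mind.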

    Click on Font Info. You can update the Fontname, Family Name, and most importantly, “Name for Humans”. This field is what the font will display as in your editing program. The font name is usually a little garbled when you extract it, so just make it something readable. If there is a copyright notice displayed at the bottom, you should probably stop what you are doing since that usually means the font should be purchased.
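
    One likely reason for the garbled name (an assumption about your file, not something FontForge reports) is the subset tag: PDF subset fonts are named with six uppercase letters and a plus sign in front of the real name, for example "ABCDEF+Helvetica-Bold". A small sketch for cleaning that up, with a made-up function name:

```python
import re

# A subset tag is exactly six uppercase letters followed by "+".
SUBSET_TAG = re.compile(r"^[A-Z]{6}\+")


def readable_font_name(embedded_name):
    """Strip a leading subset tag, if any, from an embedded font name."""
    return SUBSET_TAG.sub("", embedded_name)


print(readable_font_name("ABCDEF+Helvetica-Bold"))  # Helvetica-Bold
print(readable_font_name("Times-Roman"))            # Times-Roman
```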

    If there’s no copyright, click on “OK”. Then go to File > Generate Fonts.

    Select the type of font you want to save as (usually TrueType is best) and click on Save. You may encounter some messages about a non-standard em size or bad Private dictionary errors. Just click on Save again and you should be OK.

    Then, find your font file and open it up to make sure that it displays properly.

    If it does, then all is well. Close FontForge and enjoy your properly displayed font.
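
    The same steps can probably be scripted. This is only a sketch: it assumes FontForge's Python module is installed (it is not part of standard Python) and that `fontforge.open` accepts the "file.pdf(FontName)" form for PDFs the way it does for other multi-font files. The helper names and paths are made up.

```python
import os


def font_spec(pdf_path, font_name):
    """Build the "file.pdf(FontName)" string that selects one font in a file."""
    return "{}({})".format(pdf_path, font_name)


def extract_fonts(pdf_path, out_dir):
    """Save every font FontForge finds in pdf_path as a .ttf file in out_dir."""
    import fontforge  # assumption: FontForge's Python bindings are available

    os.makedirs(out_dir, exist_ok=True)
    written = []
    for name in fontforge.fontsInFile(pdf_path):
        font = fontforge.open(font_spec(pdf_path, name))
        # Drop a subset tag such as "ABCDEF+" from the output file name.
        target = os.path.join(out_dir, name.split("+")[-1] + ".ttf")
        font.generate(target)
        font.close()
        written.append(target)
    return written
```

    Calling `extract_fonts("document.pdf", "fonts")` would then write one file per embedded font. Subsetting still applies, so the result only contains the characters the PDF used.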

  • Don Salva

    Bear in mind, though: some documents with custom fonts are distributed as PDFs precisely so that those fonts are not available to everybody.

    That means the fonts are copyrighted by their respective owners, which in turn means that you can get into a lot of trouble if you use such a font without a license.

    Not every font is free. Some fonts cost hundreds of dollars.


  • Related Question

    How can I extract text from a table in a PDF file?
  • Nathan Fellman

    I am trying to implement an algorithm described in an academic paper, which I have in PDF format. The algorithm includes a table of 256 entries that I want to copy to my implementation. However, I can't seem to copy the table as text that I can manipulate. I can only copy it as an image.

    How can I extract the table easily without typing it in?


  • Related Answers
  • Ivo Flipse

    PDF2Table

    I think it outputs XML. From the project description:

    If we surf the web we can find PDF files in heaps. One holds the technical details of an amazing five-megapixel digital camera, another a statistic about an enterprise's income over the last two years, and another a brilliant crime novel by Sir Arthur Conan Doyle. The widespread use of this file format raises the question of how to reuse the data in such a file. Much has already been done in this area. For example, there are several tools that convert PDF files to other formats.

    My work focuses only on the extraction of table information from PDF files. I searched for tools that extract basic information from PDF files. I found a tool named pdftohtml which also returns data in XML format. To access this XML output I used the JDOM library.

    I developed several heuristics for table detection and decomposition. These heuristics work quite well on simple tables (without spanning columns or rows) and fairly well on complex tables (with spanning rows or columns).

    Sourceforge link
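
    To give a rough idea of what such a heuristic can look like, here is a much simplified sketch. It is not pdf2table's actual code. It assumes XML in the style that `pdftohtml -xml` emits, where every `<text>` element carries `top` and `left` pixel attributes, and it treats text fragments with nearly the same `top` value as one table row. The sample values are made up.

```python
import xml.etree.ElementTree as ET


def rows_from_xml(xml_text, tolerance=2):
    """Group <text> elements into rows by vertical position, page by page."""
    rows = []
    for page in ET.fromstring(xml_text).iter("page"):
        cells = []
        for el in page.iter("text"):
            content = "".join(el.itertext()).strip()
            if content:
                cells.append((int(el.get("top")), int(el.get("left")), content))
        cells.sort()  # top to bottom, then left to right
        current, current_top = [], None
        for top, left, content in cells:
            # A jump in "top" larger than the tolerance starts a new row.
            if current and abs(top - current_top) > tolerance:
                rows.append([c for _, c in sorted(current)])
                current = []
            if not current:
                current_top = top
            current.append((left, content))
        if current:
            rows.append([c for _, c in sorted(current)])
    return rows


SAMPLE = """<pdf2xml><page number="1">
<text top="100" left="50" width="30" height="12" font="0">0x00</text>
<text top="101" left="120" width="30" height="12" font="0">0x63</text>
<text top="120" left="50" width="30" height="12" font="0">0x01</text>
<text top="120" left="120" width="30" height="12" font="0">0x7c</text>
</page></pdf2xml>"""

for row in rows_from_xml(SAMPLE):
    print(",".join(row))
```

    Spanning rows and columns are exactly where a simple grouping like this breaks down, which is why the real tool needs more elaborate heuristics.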

  • Toby Allen

    Your problem might be that the table was pasted into the PDF as an image by the original author. If this is the case (you can find out by checking whether other text in the document copies as text), your only options are probably to copy it by hand (hope you can touch type) or to use the OCR software that comes with scanners.

  • Synetech

    I haven't tried this, but the pdf2table project might help.

  • Matthew Lock

    The non-free application PDF2XL and the free PDF Mechanic can both extract tabular data to CSV and Excel, often perfectly, depending on the exact formatting of the table.

  • Matt Jans

    One option seems to be to save the document (or maybe just the page with the table you want) as an XML file. I just did this in Adobe Acrobat Pro by saving as "XML Spreadsheet 2003." This retained the tabular format in the resulting XML file (viewable in Excel). The only "imperfection" is that it treats each literal line in the table as a row in the Excel file. So if any text wraps onto a second line (e.g., long names), it will show up as two rows in Excel. For a small table, that's pretty minor cleanup.

    Other than that, it seems like this process could be automated.
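
    For instance, the cleanup of wrapped rows could be sketched like this. It assumes the table has already been read into a list of equal-length rows, and that a wrapped line can be recognized by an empty cell in a column that is otherwise always filled (the first column here). Both are assumptions about your table, and the function name is made up.

```python
def merge_wrapped_rows(rows, key_column=0):
    """Fold rows whose key cell is empty into the row above them."""
    merged = []
    for row in rows:
        is_continuation = bool(merged) and not row[key_column].strip()
        if is_continuation:
            previous = merged[-1]
            for i, cell in enumerate(row):
                if cell.strip():
                    previous[i] = (previous[i] + " " + cell.strip()).strip()
        else:
            merged.append(list(row))
    return merged


table = [
    ["1", "Alexander", "10"],
    ["", "Hamilton", ""],  # the long name wrapped onto a second line
    ["2", "Bob", "20"],
]
print(merge_wrapped_rows(table))
# [['1', 'Alexander Hamilton', '10'], ['2', 'Bob', '20']]
```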