I received a PDF file that uses unusual fonts.

[Screenshot: non-standard fonts]

The fonts look fine to the human eye,

[Screenshot: fonts look fine]

but if I try to copy and paste them, I get a string of '???'.

Is it possible to replace the PDF document's defined fonts with normal fonts (e.g., on Foxit Phantom PDF editor)?

boardrider
  • No - for various reasons. First off, the *font* is not a problem, but its (lack of) encoding is. If you replace the font with another one, you'd still get those question marks because you did not replace its *text*. Second, the text may be using characters that are not available in another font. – Jongware Aug 21 '16 at 12:28
  • Thanks @Rad. I fail to comprehend your second point, though: as you can see from the screenshots, the text is just plain English, namely regular Latin characters. – boardrider Aug 21 '16 at 16:31
  • It may be possible to recreate the encoding using Acrobat Pro, provided you have both the original font and a new font installed on your machine. In Acrobat, you would use an appropriate Preflight profile. – Max Wyss Aug 21 '16 at 17:01
  • A font may contain characters that look like your average plain English text and still be special. Custom ligatures, for example. – Jongware Aug 21 '16 at 18:05
  • What makes you think copy-pasting text from a PDF is going to work? A PDF does not contain "text" in the same way a text document or even a Word document does, so if the PDF was not generated with a "preserve the ability to copy-paste" option turned on, or with the "forbid copy-paste" option turned on, then you can't reliably, or even at all, copy-paste. With that said, did you remember to paste into a Unicode document, not "a notepad.exe text file" or something? – Mike 'Pomax' Kamermans Aug 21 '16 at 18:27

1 Answer

This may be possible, e.g. with PitStop Pro from Enfocus. However, as others indicated in the comments, the information needed to do this may have been removed from both the PDF and the fonts embedded in it.

Some more detail:

The encoding in a PDF can tell software which character is to be shown, and that character is then selected from the font for display. But it is also possible to create a PDF that only says 'show glyph number 3 of the embedded font'. That is what the 'Identity-H' encoding you see in the summary does.
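As a rough illustration of the difference, here is a toy model in Python. It is not a real PDF parser, and the names `glyph_ids`, `to_unicode` and `extract_text` are made up for this sketch. With Identity-H the page content is just a list of glyph numbers, and a viewer can only turn them back into characters if the PDF also carries a glyph-to-character table (in real PDFs this role is played by the optional ToUnicode CMap).

```python
# Toy model: what a PDF page "says" when the font uses Identity-H.
# The page content holds glyph numbers, not characters.
glyph_ids = [3, 7, 7, 12]

# Optional lookup table from glyph number to character. In a real PDF this
# role is played by the ToUnicode CMap; the values here are invented.
to_unicode = {3: "b", 7: "o", 12: "k"}


def extract_text(glyph_ids, to_unicode=None):
    """Return the text a copy-paste would produce for the given glyphs."""
    table = to_unicode or {}
    # Any glyph without a known character comes out as a placeholder.
    return "".join(table.get(gid, "?") for gid in glyph_ids)


with_table = extract_text(glyph_ids, to_unicode)
without_table = extract_text(glyph_ids)
print(with_table)     # book
print(without_table)  # ????
```

The second result is the same kind of '???' string you get when pasting: the drawings are all there, but nothing says which characters they stand for.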

Note that the word 'glyph', not 'character', is used deliberately for the individual 'drawings' that make up a font: these are only 'random' drawings until the font carries some information indicating which letter (or other character, like a digit) each one represents.

E.g. for the character 'lower-case-a', the font you currently look at has this glyph:

a

but other fonts will have something that may look completely different. Only because we have learned to read these different images as the letter lower-case-a do we think they are/represent 'the same letter'.

If this information is not present in the PDF, as in your case, it may still be recoverable from the font included in the PDF: a font on your computer needs some way to let a program select the right glyph when it wants to display 'lower-case-a'. However, if the PDF is set up to simply say 'show glyph number 3 of the embedded font', this information is no longer necessary and can be removed from the font before it is embedded. This is done either to make the PDF smaller or to prevent people from copying the text, e.g. of copyrighted works.
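A sketch of that recovery step, assuming the embedded font still has its character-to-glyph table (called `cmap` in TrueType/OpenType fonts). The dictionaries below are invented toy data, and `recover_text` is a hypothetical helper, not part of any PDF tool:

```python
# Toy character-to-glyph table, as a font uses it to draw text.
# Real fonts store this in their 'cmap' table; these numbers are invented.
full_font_cmap = {"a": 3, "b": 4, "c": 5}

# A font stripped before embedding: the glyph drawings remain,
# but the character-to-glyph table is gone.
stripped_font_cmap = {}


def recover_text(glyph_ids, cmap):
    """Invert the font's table to map glyph numbers back to characters."""
    glyph_to_char = {gid: ch for ch, gid in cmap.items()}
    return "".join(glyph_to_char.get(gid, "?") for gid in glyph_ids)


page_glyphs = [5, 3, 4]  # what the page content asks the viewer to draw
recovered = recover_text(page_glyphs, full_font_cmap)
lost = recover_text(page_glyphs, stripped_font_cmap)
print(recovered)  # cab
print(lost)       # ???
```

With the table intact, the glyph numbers can be read backwards into text; once it has been stripped, there is nothing left to invert.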

In this case, only OCR can help. I think Adobe Acrobat (the full version, not Adobe Reader) added exactly that in one of its recent versions; however, OCR has to guess each letter from the 'image' shown, so it may make mistakes.
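To show why guessing from the image can go wrong, here is a very simplified sketch using nearest-template matching. This is only a stand-in for what real OCR engines do (they are far more sophisticated), and the 3x3 bitmaps and function names are invented for the example:

```python
# Tiny 3x3 "bitmaps" standing in for known letter shapes (invented data).
templates = {
    "I": ("010", "010", "010"),
    "L": ("100", "100", "111"),
    "T": ("111", "010", "010"),
}


def distance(a, b):
    """Count the pixels that differ between two bitmaps of equal size."""
    return sum(pa != pb for row_a, row_b in zip(a, b) for pa, pb in zip(row_a, row_b))


def guess_letter(bitmap):
    """Pick the template closest to the bitmap; this is only a best guess."""
    return min(templates, key=lambda letter: distance(templates[letter], bitmap))


clean_t = ("111", "010", "010")
# A stylised "L" with its stem drawn in the middle column.
stylised_l = ("010", "010", "011")

guess_clean = guess_letter(clean_t)
guess_stylised = guess_letter(stylised_l)
print(guess_clean)     # T
print(guess_stylised)  # I
```

The clean shape is recognised, but the stylised "L" is closer to the "I" template and gets misread. Unusual fonts like the ones in your PDF are the kind of input where this sort of mistake becomes more likely.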

Legolas