I received a PDF file that uses unusual fonts.
The fonts look fine to the human eye,
but if I try to copy and paste the text,
I get a string of '???'.
This may be possible, e.g. with PitStop Pro from Enfocus. However, as others indicated in the comments, the fonts in the PDF, and the PDF itself, may have had all the information needed for this removed.
Some more detail on this:
The encoding in the PDF can tell software which character is to be shown, and that character is then selected from the font for display. But it is also possible to create a PDF so that it only says 'show glyph number 3 of the embedded font'. That is what the 'Identity-H' encoding you see in the summary does.
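To make the difference concrete, here is a minimal sketch in Python. It is a toy model, not real PDF parsing: the glyph list, the `CHAR_TO_GLYPH` table and both function names are made up for illustration.

```python
# Toy model only: real PDF fonts and content streams are far more involved.
# The list index plays the role of the glyph number inside the embedded font.
FONT_GLYPHS = ["<notdef>", "<drawing of H>", "<drawing of i>", "<drawing of a>"]

# Hypothetical character-based encoding: character -> glyph number.
CHAR_TO_GLYPH = {"H": 1, "i": 2, "a": 3}


def render_by_character(text):
    """The page names characters; the encoding picks the glyph to draw."""
    return [FONT_GLYPHS[CHAR_TO_GLYPH[ch]] for ch in text]


def render_by_glyph_number(glyph_numbers):
    """Identity-style: the page names glyph numbers directly."""
    return [FONT_GLYPHS[g] for g in glyph_numbers]


# Both pages look identical on screen ...
same_picture = render_by_character("Hi") == render_by_glyph_number([1, 2])
print(same_picture)
```

Both calls produce the same drawings, but only the first page ever mentions the characters 'H' and 'i'. The second page contains nothing but the numbers 1 and 2.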
Note that the word 'glyph', not 'character', is used deliberately for the individual 'drawings' that make up a font: these are only 'random' drawings until some information is added to the font to indicate which letter (or other character, such as a digit) each one represents.
E.g. for the character 'lower-case-a', the font you are currently looking at has this glyph:
a
but other fonts will have something that may look completely different. Only because we have learned to read these different images as the letter lower-case-a do we think they are/represent 'the same letter'.
If this information is not present in the PDF, as in your case, it may still be possible to get it from the font included in the PDF: a font on your computer needs some way to let a program select the right glyph when it wants to display 'lower-case-a'. However, if the PDF is set up to simply say 'show glyph number 3 of the embedded font', this information is no longer necessary and can be removed from the font before the font is put inside the PDF. This is done either to make the PDF smaller or to prevent people from copying the text, e.g. of copyrighted works.
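A rough sketch of why copying then fails, again as a toy model in Python: `extract_text` and the `glyph_to_char` table are hypothetical stand-ins for whatever reverse mapping a real viewer would read from the PDF or from the embedded font.

```python
def extract_text(glyph_numbers, glyph_to_char=None):
    """Turn the glyph numbers on a page back into characters.

    glyph_to_char is the (hypothetical) reverse mapping. When it has
    been stripped, there is nothing to look up, so every glyph comes
    out as '?'.
    """
    if glyph_to_char is None:
        return "?" * len(glyph_numbers)
    return "".join(glyph_to_char.get(g, "?") for g in glyph_numbers)


page = [1, 2, 3]  # "show glyph 1, then glyph 2, then glyph 3"

with_mapping = extract_text(page, {1: "H", 2: "i", 3: "a"})
without_mapping = extract_text(page)

print(with_mapping)     # Hia
print(without_mapping)  # ???
```

Display is unaffected in both cases, because drawing only needs the glyph numbers. Only the step back from glyph number to character is lost.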
In this case, only OCR can help. I think Adobe Acrobat (the full version, not Adobe Reader) added exactly that in one of the latest versions; however, this means it is trying to guess each letter from the 'image' shown, so it may make mistakes.