I have a PostScript file that contains a Type 3 font. After converting that PostScript file to PDF using the "gs" command, I am unable to extract the text from the PDF file. Is there any way to avoid Type 3 fonts, by substituting some other font or by some other means, so that I can copy the text?
-
Why is this question tagged as an iText question? Are you using the name iText as a general term for PDF software? – Bruno Lowagie Oct 15 '16 at 11:17
1 Answer
This is another case of miscomprehension regarding type 3 fonts. The fact that a font is a type 3 font has little to do with whether a PostScript program or PDF file using the font is 'searchable' or not.
Fonts in PostScript and PDF have an 'Encoding' which maps the character codes 0-255 to a named procedure in the font. Executing that procedure draws the glyph. The character codes can be anything, but are often (for Latin fonts) chosen to match the ASCII encoding.
PDF has the additional concept of a ToUnicode CMap, additional information which maps a character code in a font to a Unicode code point. PostScript has no such analogue; that's not what PostScript is for (it's also not what PDF was originally for, which is why ToUnicode CMaps are a later addition to the PDF standard).
In the absence of a ToUnicode CMap Acrobat uses undocumented heuristics to try and guess what the text is. The obvious one (and the only one we know of) is that it treats the character codes as ASCII.
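The lookup order described above can be sketched in Python as follows. This is only a model under stated assumptions: the function `extract_text` is hypothetical, a plain dict stands in for a ToUnicode CMap, and the ASCII fallback is the one heuristic mentioned here, not a description of what Acrobat actually implements.

```python
def extract_text(codes, to_unicode=None):
    """Map character codes to text.

    to_unicode: optional {code: str} dict standing in for a ToUnicode CMap.
    Without an entry there, fall back to guessing that the code is ASCII.
    """
    out = []
    for code in codes:
        if to_unicode is not None and code in to_unicode:
            out.append(to_unicode[code])   # explicit mapping wins
        elif 32 <= code < 127:
            out.append(chr(code))          # guess: printable ASCII
        else:
            out.append("\ufffd")           # nothing sensible to guess
    return "".join(out)

# Codes that match ASCII survive the guess.
print(extract_text(b"Hi"))                           # Hi

# Arbitrary codes are unreadable without a mapping ...
print(extract_text(b"\x01\x02") == "\ufffd\ufffd")   # True

# ... and readable again once a ToUnicode-style mapping is supplied.
print(extract_text(b"\x01\x02", {1: "H", 2: "i"}))   # Hi
```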
Now, if your original PostScript program has an encoding that maps the character codes as if they were ASCII, then provided you do not subset the font, the resulting PDF file should also contain ASCII character codes. If you do subset the font then the pdfwrite device will reorder the glyphs and the character codes will no longer be ASCII.
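As a rough illustration of why subsetting breaks the ASCII guess, the Python sketch below renumbers character codes in order of first use. That renumbering scheme is an assumption chosen for illustration and is not taken from the pdfwrite source; `subset_renumber` and `gs_args` are hypothetical helpers. The second helper only builds a Ghostscript argument list using the `-dSubsetFonts=false` distiller parameter to ask pdfwrite not to subset; check the option against the documentation for your Ghostscript version.

```python
def subset_renumber(codes, first_code=1):
    """Renumber character codes in order of first use (assumed scheme).

    Returns (new_codes, old_to_new).
    """
    old_to_new = {}
    new_codes = []
    for code in codes:
        if code not in old_to_new:
            old_to_new[code] = first_code + len(old_to_new)
        new_codes.append(old_to_new[code])
    return bytes(new_codes), old_to_new

def gs_args(ps_path, pdf_path):
    """Build a gs command line that converts PostScript to PDF without subsetting."""
    return ["gs", "-sDEVICE=pdfwrite", "-dSubsetFonts=false",
            "-o", pdf_path, ps_path]

original = b"Hello"
renumbered, mapping = subset_renumber(original)

print(original.decode("ascii"))  # Hello
print(list(renumbered))          # [1, 2, 3, 3, 4]
print(" ".join(gs_args("input.ps", "output.pdf")))
```

After renumbering, the page still draws "Hello", but the codes 1, 2, 3, 3, 4 mean nothing as ASCII, so only a ToUnicode CMap could recover the text.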
If your original PostScript file does not order the glyphs in the font using ASCII character codes then there is nothing you can do other than apply OCR; the information simply is not present.
But forget about altering the font type, not only is it not likely to be possible, it isn't the problem.

-
Hi @KenS, thanks for your reply. But I just want to know how to replace Type 3 fonts in a PDF using Ghostscript. – prasad Oct 14 '16 at 13:41
-
Hi @KenS, if I convert that PDF to text using the "pdftotext" command, it gives the correct content. How is Ghostscript handling that internally? Can't we do anything while converting the PS into PDF? – prasad Oct 14 '16 at 13:58
-
I'm baffled by your point. If pdftotext can produce workable text from the PDF, then your original statement 'I am unable to extract the text from pdf file' seems incorrect. I also can't see where **PostScript** fits into this. Perhaps if you could explain more clearly what you are starting from, and what you hope to achieve, we might get further. Example files would probably help, and if appropriate a Ghostscript command line (and version and OS in use). – KenS Oct 14 '16 at 18:30