Is there a way to get the list of embedded fonts in a PDF file using the PDFClown library? The aim is to check whether it is a scanned PDF or not, assuming it is a scanned document if it doesn't have embedded fonts.

Thanks in advance

  • Correct me if I am wrong, but even PDFs that are not scanned may not embed any fonts. Are you sure this is a reliable method? You may have to look for page-size images instead and distinguish them from background images. – ajeh Sep 09 '16 at 14:07
  • You can simply query each `Page` for its `Resources` and then query those for their `FontResources`. Furthermore, you should recurse into the resources' `XObjectResources` and `PatternResources`, whose entries can have their own `Resources`. This will tell you whether there are fonts *defined* (and probably embedded) for use on those pages. Whether they actually are *used* is a different question altogether. – mkl Sep 09 '16 at 14:08
  • Ajeh, you're right. I've just found a way to list the fonts and at the same time found a scanned PDF file in my collection that actually contains embedded fonts, which I guess were embedded by default by the PDF tool used to create the file. I'm trying your solution. – Not_A_SysAdmin Sep 09 '16 at 14:17

1 Answer

UPDATE

I've just found out how to list the fonts defined in a PDF's page resources, even though it didn't help much with my problem. Here's some sample code.

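    // Assumes: using System; using org.pdfclown.documents; using org.pdfclown.files;
    string pdffile = @"path\to\your\PDF\document.pdf";
    using (File pdf = new File(pdffile))
    {
        Document doc = pdf.Document;
        foreach (Page page in doc.Pages)
        {
            // Fonts declared in this page's resource dictionary
            // (declared there, not necessarily embedded or actually used).
            foreach (var a in page.Resources.Fonts.Keys)
            {
                Console.WriteLine(page.Resources.Fonts[a].Name);
            }
        }
    }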
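
The snippet above only looks at each page's own resource dictionary. As suggested in the comments, fonts can also be declared in the resources of form XObjects referenced from a page. Here is a minimal, untested sketch of that recursion. It assumes the PDF Clown .NET API exposes `Resources.XObjects` and `FormXObject.Resources` as I understand it, so verify the names against your version. `FontLister`, `CollectFontNames` and the depth limit of 16 are hypothetical names and values chosen for illustration, and pattern resources are left out for brevity.

```csharp
using System;
using System.Collections.Generic;
using org.pdfclown.documents;
using org.pdfclown.documents.contents;
using org.pdfclown.documents.contents.fonts;
using org.pdfclown.documents.contents.xObjects;
using File = org.pdfclown.files.File;

static class FontLister
{
    // Hypothetical helper: gathers font names from a resource dictionary and,
    // recursively, from the resources of any form XObjects it references.
    static void CollectFontNames(Resources resources, ISet<string> names, int depth)
    {
        // Resources may be missing; the depth cap is an arbitrary guard
        // against malformed files whose XObjects reference each other.
        if (resources == null || depth > 16) return;

        if (resources.Fonts != null)
        {
            foreach (Font font in resources.Fonts.Values)
            {
                if (font != null && font.Name != null) names.Add(font.Name);
            }
        }

        if (resources.XObjects != null)
        {
            foreach (XObject xObject in resources.XObjects.Values)
            {
                // Only form XObjects carry their own resources; image XObjects do not.
                FormXObject form = xObject as FormXObject;
                if (form != null) CollectFontNames(form.Resources, names, depth + 1);
            }
        }
    }

    static void Main(string[] args)
    {
        var names = new SortedSet<string>();
        using (File pdf = new File(args[0]))
        {
            foreach (Page page in pdf.Document.Pages)
            {
                CollectFontNames(page.Resources, names, 0);
            }
        }
        foreach (string name in names) Console.WriteLine(name);
    }
}
```

Like the original snippet, this reports fonts that are *declared*, which says nothing certain about whether they are embedded or used, so an empty result is at best a hint that the document is scanned.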