I am trying to extract images from a PDF file using PDFsharp. For my test file, the code reports the image filter as /JBIG2. How can I decode this image and save it, if that is possible at all with PDFsharp?

The code I'm using to extract the image and then save it is as follows:

const string filename = "../../../test.pdf";            
PdfDocument document = PdfReader.Open(filename);
int imageCount = 0;

foreach (PdfPage page in document.Pages) { // Iterate pages
  // Get resources dictionary
  PdfDictionary resources = page.Elements.GetDictionary("/Resources");

  if (resources != null) {
    // Get external objects dictionary
    PdfDictionary xObjects = resources.Elements.GetDictionary("/XObject");

    if (xObjects != null) {
      ICollection<PdfItem> items = xObjects.Elements.Values;

      foreach (PdfItem item in items) { // Iterate references to external objects
        PdfReference reference = item as PdfReference;

        if (reference != null) {
          PdfDictionary xObject = reference.Value as PdfDictionary;

          // Is external object an image?
          if (xObject != null && xObject.Elements.GetString("/Subtype") == "/Image") {
            ExportImage(xObject, ref imageCount);
          }
        }
      }
    }
  }
}

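static void ExportImage(PdfDictionary image, ref int count) {
  string filter = image.Elements.GetName("/Filter");

  switch (filter) {
    case "/DCTDecode":
      ExportJpegImage(image, ref count);
      break;
    case "/FlateDecode":
      ExportAsPngImage(image, ref count);
      break;
    default:
      // Any other filter (including the JBIG2 one in my file) is not handled yet.
      Console.WriteLine("Unsupported image filter: " + filter);
      break;
  }
}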

static void ExportJpegImage(PdfDictionary image, ref int count) {
  // Fortunately, JPEG has native support in PDF, so exporting an image is just writing the stream to a file.
  byte[] stream = image.Stream.Value;
  // The using blocks close the file even if the write throws.
  using (FileStream fs = new FileStream(
    String.Format("Image{0}.jpeg", count++), FileMode.Create, FileAccess.Write
  ))
  using (BinaryWriter bw = new BinaryWriter(fs)) {
    bw.Write(stream);
  }
}

In the code above, the filter comes back as /JBIG2, for which I do not have support. The code is taken from the PDFsharp "Export Images" sample.

Agi Hammerthief
  • Please post the code you're using for the extraction process and, if possible, the PDF in question (or a link thereto). – Agi Hammerthief Mar 12 '19 at 07:40
  • Edited the main summary with the code. It will be difficult to share the file but I can add that the file is a pdf generated when I scanned a document and emailed it to myself. @AgiHammerthief – bluemoonstudios Mar 12 '19 at 08:46
  • To answer your question I would have to read the PDF Reference manuals from Adobe, but I don't have time for that now. Maybe you can answer your question on your own if you check out the reference. – I liked the old Stack Overflow Mar 13 '19 at 06:53

1 Answer

JBIG2 is used mostly inside PDF; outside of PDF it is a different story. Although JBIG2 is a raster image format, very few image viewers can open a standalone .jbig2 file. Your best bet would be to decode the image and export it as a CCITT Group 4 compressed TIFF, as Acrobat does.
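Until a decoder is in place, one possible first step is to save the still-encoded JBIG2 data so that an external JBIG2 decoder can convert it. The following is only a sketch, not part of PDFsharp or its sample: `Jbig2Dump`, `TryDumpJbig2`, and the `.jb2` / `.globals.jb2` file names are hypothetical. It assumes the image uses a single filter whose PDF name is `/JBIG2Decode`, and that any shared segments are stored in a `/JBIG2Globals` stream under `/DecodeParms`. It uses only the `Elements.GetName`, `Elements.GetDictionary`, and `Stream.Value` calls that already appear in the question.

```csharp
using System.IO;
using PdfSharp.Pdf;

static class Jbig2Dump
{
    // Hypothetical helper: writes the raw (still JBIG2-encoded) image data to disk.
    // Returns false when the image is not a single-filter JBIG2 image.
    public static bool TryDumpJbig2(PdfDictionary image, ref int count)
    {
        // Assumption: /Filter is a single name, not an array of filters.
        if (image.Elements.GetName("/Filter") != "/JBIG2Decode" || image.Stream == null)
            return false;

        string baseName = string.Format("Image{0}", count++);

        // Stream.Value holds the bytes as stored in the PDF, i.e. not decoded.
        File.WriteAllBytes(baseName + ".jb2", image.Stream.Value);

        // Optional shared segments; a decoder needs them together with the page data.
        PdfDictionary parms = image.Elements.GetDictionary("/DecodeParms");
        PdfDictionary globals = parms == null
            ? null
            : parms.Elements.GetDictionary("/JBIG2Globals");
        if (globals != null && globals.Stream != null)
            File.WriteAllBytes(baseName + ".globals.jb2", globals.Stream.Value);

        return true;
    }
}
```

Note that a JBIG2 stream embedded in a PDF carries no JBIG2 file header, so the dumped file is unlikely to open in an image viewer directly. It is meant as input for a decoder that accepts embedded streams, whose output can then be saved as TIFF or PNG.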

JosephA