We want to convert images to PDF in bulk, programmatically. So far it looks like we will be using iTextSharp, but we have an issue with JPG images that contain a clipping path. We are using the following code in our tests:

using (FileStream fs = new FileStream(output, FileMode.Create, FileAccess.Write, FileShare.None))
{
    using (Document doc = new Document())
    {
        using (PdfWriter writer = PdfWriter.GetInstance(doc, fs))
        {
            doc.Open();
            iTextSharp.text.Image image = iTextSharp.text.Image.GetInstance(source);

            image.SetAbsolutePosition(0, 0);
            doc.SetPageSize(new iTextSharp.text.Rectangle(0, 0, image.Width, image.Height, 0));
            doc.NewPage();

            writer.DirectContent.AddImage(image, false);

            doc.Close();
        }
    }
}

The clipping path in the JPG images seems to be discarded. Is there a way to preserve it? Also, when calling AddImage there is an InlineImage option; does anyone know what it does?

John Saunders
Lars Thorén
  • Don't use inline images: using inline images means that the images are stored in the content stream of the PDF. This can only be used for images with a size of 4 KB or less. Larger inline images will be forbidden in PDF 2.0. Furthermore: iText copies the bytes of a JPG straight into the PDF. Not a single byte is changed. If you say that your JPGs have clipping paths (I've never heard of such a thing) and you don't see that feature in the PDF, you are being confronted with a limitation inherent to PDF, not to iText (iText doesn't even look at the JPG). – Bruno Lowagie May 29 '15 at 07:35
  • I do see one error in your code: the page size for the first image will always be wrong. It will be A4 instead of the size of the image. You need to create the `Document` object using the size of the first image you encounter. – Bruno Lowagie May 29 '15 at 07:36
  • I have edited your title. Please see, "[Should questions include “tags” in their titles?](http://meta.stackexchange.com/questions/19190/)", where the consensus is "no, they should not". – John Saunders May 29 '15 at 07:38
  • @BrunoLowagie Regarding the error you pointed out: we have not noticed it. We have run tests on hundreds of images and the size has been correct for each of them. – Lars Thorén May 29 '15 at 07:41
  • @LarsThorén OK, I've posted an answer with some more clarifications. Maybe that helps. (As you may have noticed, I'm the original author of iText.) – Bruno Lowagie May 29 '15 at 07:42

1 Answer

iText copies the bytes of a JPG straight into the PDF. Not a single byte is changed. If you say that your JPGs have clipping paths (I've never heard of such a thing) and you don't see that feature in the PDF, you are being confronted with a limitation inherent to PDF, not to iText. iText doesn't even look at the JPG bytes: it just creates a PDF stream object with the filter DCTDecode.
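
One way to check this on a generated file is to list the filters of its image streams. The sketch below is only an illustration: the class and method names are made up, and it assumes iTextSharp 5.x. For a PDF built from a JPG, the expectation based on the paragraph above is a single /DCTDecode entry.

```csharp
using System.Collections.Generic;
using iTextSharp.text.pdf;

public static class PdfImageInspector
{
    // Returns the /Filter entry (as text) of every image XObject in the PDF.
    // Hypothetical helper for inspection only; not part of iTextSharp.
    public static IList<string> ImageFilters(byte[] pdfBytes)
    {
        var filters = new List<string>();
        PdfReader reader = new PdfReader(pdfBytes);
        try
        {
            // Object number 0 is reserved, so start at 1.
            for (int i = 1; i < reader.XrefSize; i++)
            {
                PdfObject obj = reader.GetPdfObject(i);
                if (obj == null || !obj.IsStream()) continue;

                // A stream object is also a dictionary; look at its /Subtype.
                PdfDictionary dict = (PdfDictionary)obj;
                if (!PdfName.IMAGE.Equals(dict.GetAsName(PdfName.SUBTYPE))) continue;

                PdfObject filter = dict.GetDirectObject(PdfName.FILTER);
                filters.Add(filter == null ? "(none)" : filter.ToString());
            }
        }
        finally
        {
            reader.Close();
        }
        return filters;
    }
}
```

Calling `PdfImageInspector.ImageFilters(File.ReadAllBytes(output))` on the converted file shows how each image was stored.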

You will have to apply the clipping path before adding the image to the PDF. As you may know, PDF doesn't support PNGs and PNG supports transparency. When iText encounters a transparent PNG, it processes the PNG. It creates two images: one opaque image using /FlateDecode and one monochrome image using /FlateDecode. The opaque image is added with the monochrome image as its mask to obtain transparency. I guess you'll have to preprocess your JPG in a similar way.
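
A minimal sketch of that preprocessing idea, assuming iTextSharp 5.x and assuming the clipping path has already been rasterized by an external tool into a separate 8-bit grayscale image (the `maskPath` file is hypothetical, as are the class and method names). In a PDF soft mask, white samples are opaque and black samples are transparent, so the mask should be white where the picture is meant to show.

```csharp
using iTextSharp.text;

public static class ClippedImage
{
    // sourcePath: the original JPG.
    // maskPath:   hypothetical grayscale rendering of the clipping path,
    //             produced outside iTextSharp (white = keep, black = hide).
    public static Image Load(string sourcePath, string maskPath)
    {
        return Apply(Image.GetInstance(sourcePath), Image.GetInstance(maskPath));
    }

    public static Image Apply(Image image, Image mask)
    {
        // Flags the image as a mask; throws DocumentException if the
        // image is not a valid mask candidate (for example, an RGB image).
        mask.MakeMask();

        // The JPG bytes stay untouched; the mask is stored as a second
        // image that the JPG's image object refers to.
        image.ImageMask = mask;
        return image;
    }
}
```

The returned image can then be positioned and passed to `writer.DirectContent.AddImage(...)` exactly as in the question's code.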

About inline images:

Don't use inline images: using inline images means that the images are stored in the content stream of the PDF as opposed to being stored as an Image XObject (which is the optimal way of storing images in a PDF). Inline images can only be used for images with a size of 4 KB or less. Larger inline images will be forbidden in PDF 2.0.

Extra remark:

I think I see a problem in your code. You are creating a document with page size A4:

Document doc = new Document()

A4 is the default size when you don't pass a parameter to the Document constructor. Afterwards, you try changing the page size like this:

doc.SetPageSize(new iTextSharp.text.Rectangle(0, 0, image.Width, image.Height, 0));
doc.NewPage();

However, since you haven't added any content to the first page yet, the NewPage() method will be ignored and the page size will not be changed. You will still be on page 1 with size A4. Passing the image to the Document constructor sets the page size from the start:

iTextSharp.text.Image image = iTextSharp.text.Image.GetInstance(source);
using (FileStream fs = new FileStream(output, FileMode.Create, FileAccess.Write, FileShare.None))
{
    using (Document doc = new Document(image))
    {
        using (PdfWriter writer = PdfWriter.GetInstance(doc, fs))
        {
            doc.Open();
            image.SetAbsolutePosition(0, 0);
            writer.DirectContent.AddImage(image); 
            doc.Close();
        }
    }
}
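
If several images should end up in one PDF, the same idea extends to later pages. The following is a sketch under assumptions (iTextSharp 5.x; `ImagesToPdf`, `Write` and `Convert` are illustrative names, not library API): the first page is sized through the constructor, and every later page is sized with SetPageSize() before NewPage(), which takes effect because the previous page already has content.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using iTextSharp.text;
using iTextSharp.text.pdf;

public static class ImagesToPdf
{
    // Writes one page per image; each page takes the size of its image.
    public static void Write(IList<Image> images, Stream output)
    {
        if (images.Count == 0)
            throw new ArgumentException("At least one image is required.", "images");

        // Size the first page up front instead of relying on the A4 default.
        using (Document doc = new Document(images[0]))
        {
            PdfWriter writer = PdfWriter.GetInstance(doc, output);
            doc.Open();
            for (int i = 0; i < images.Count; i++)
            {
                if (i > 0)
                {
                    // The previous page is not empty, so NewPage() is honored
                    // and the new page size applies to the page it starts.
                    doc.SetPageSize(images[i]);
                    doc.NewPage();
                }
                images[i].SetAbsolutePosition(0, 0);
                writer.DirectContent.AddImage(images[i]);
            }
            doc.Close();
        }
    }

    // Convenience overload for file paths, mirroring the question's setup.
    public static void Convert(IEnumerable<string> sourcePaths, string outputPath)
    {
        Image[] images = sourcePaths.Select(p => Image.GetInstance(p)).ToArray();
        using (FileStream fs = new FileStream(outputPath, FileMode.Create, FileAccess.Write, FileShare.None))
        {
            Write(images, fs);
        }
    }
}
```

For one PDF per image, as in the question, calling `Convert` with a single path per output file gives the same result as the code above.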
Bruno Lowagie
  • OK, it makes sense that iText just copies the image, and I understand that the clipping path is discarded because PDF does not support clipping paths stored inside a JPG. There are other software alternatives, such as ImageMagick (+ Ghostscript), which handle this internally by applying the clipping path before conversion. – Lars Thorén May 29 '15 at 07:47
  • Yes, that's what iText does with transparent PNGs. I'll update my answer. – Bruno Lowagie May 29 '15 at 07:48
  • 1
    @Vlado `output` is a path to a file. – Bruno Lowagie Jan 19 '16 at 14:44