Getting exception while redacting PDF using iText

Question

I am getting the exception below while trying to redact a PDF document using iText. The issue is sporadic: sometimes the redaction works and sometimes it throws the error.

at com.itextpdf.text.pdf.parser.PdfContentStreamProcessor.access$6100(PdfContentStreamProcessor.java:60)
at com.itextpdf.text.pdf.parser.PdfContentStreamProcessor$Do.invoke(PdfContentStreamProcessor.java:991)
at com.itextpdf.text.pdf.pdfcleanup.PdfCleanUpContentOperator.invoke(PdfCleanUpContentOperator.java:140)
at com.itextpdf.text.pdf.parser.PdfContentStreamProcessor.invokeOperator(PdfContentStreamProcessor.java:286)
at com.itextpdf.text.pdf.parser.PdfContentStreamProcessor.processContent(PdfContentStreamProcessor.java:425)
at com.itextpdf.text.pdf.pdfcleanup.PdfCleanUpProcessor.cleanUpPage(PdfCleanUpProcessor.java:160)
at com.itextpdf.text.pdf.pdfcleanup.PdfCleanUpProcessor.cleanUp(PdfCleanUpProcessor.java:135)
at RedactionClass.tgestRedactJavishsInput(RedactionClass.java:56)
at RedactionClass.main(RedactionClass.java:23)

The code I am using to redact is below:

public static void testRedact() throws IOException, DocumentException {

    InputStream resource = new FileInputStream("D:/itext/edited_120192824_5 (1).pdf");
    OutputStream result = new FileOutputStream(new File(OUTPUTDIR,
            "aviteshs.pdf"));

    PdfReader reader = new PdfReader(resource);
    PdfStamper stamper = new PdfStamper(reader, result);
    int pageCount = reader.getNumberOfPages();
    Rectangle linkLocation1 = new Rectangle(440f, 700f, 470f, 710f);
    Rectangle linkLocation2 = new Rectangle(308f, 205f, 338f, 215f);
    Rectangle linkLocation3 = new Rectangle(90f, 155f, 130f, 165f);
    List<PdfCleanUpLocation> cleanUpLocations = new ArrayList<PdfCleanUpLocation>();
    for (int currentPage = 1; currentPage <= pageCount; currentPage++) {
        if (currentPage == 1) {
            cleanUpLocations.add(new PdfCleanUpLocation(currentPage,
                    linkLocation1, BaseColor.BLACK));
            cleanUpLocations.add(new PdfCleanUpLocation(currentPage,
                    linkLocation2, BaseColor.BLACK));
            cleanUpLocations.add(new PdfCleanUpLocation(currentPage,
                    linkLocation3, BaseColor.BLACK));
        } else {
            cleanUpLocations.add(new PdfCleanUpLocation(currentPage,
                    linkLocation1, BaseColor.BLACK));
        }
    }
    PdfCleanUpProcessor cleaner = new PdfCleanUpProcessor(cleanUpLocations,
            stamper);
    try {
        cleaner.cleanUp();
    } catch (Exception e) {
        e.printStackTrace();
    }
    stamper.close();
    reader.close();

}

Because it is a customer document, I am unable to share it; I am trying to find some test data that shows the same problem.

Please find the doc here:

https://drive.google.com/file/d/0B-zalNTEeIOwM1JJVWctcW8ydU0/view?usp=drivesdk

And a minimal complete verifiable example and also a question — JeremyP, Apr 04 '17 at 15:25
A known case where something like this can happen is for bitmap images in the PDF whose format the iText parsing framework does not know. Redaction, on the other hand, needs to understand image formats to apply redaction to the image content. Thus, an exception occurs which admittedly does not clearly indicate the cause. Whether this is the case for your files or not can only be determined with the PDF in question and the redaction operation you apply. — mkl, Apr 04 '17 at 16:12
Actually, because it is a customer document, I am unable to share the document here. But yes, it contains a logo which is colored and could be a bitmap. — DevAvitesh, Apr 04 '17 at 17:10
Is there any way to convert a PDF containing a bitmap to a normal PDF, so that our code can execute properly? — DevAvitesh, Apr 04 '17 at 18:25
Your profile says you work at Syntel, which is a registered client of iText. Perhaps you could check whether you have an active support license. That way you could share the document (with iText even offering to sign an NDA if needed). — Joris Schellekens, Apr 06 '17 at 14:34
@DevAvitesh Did my answer explain the issue sufficiently? Not at all reacting to the only answer to one's question is disappointing. — mkl, Apr 17 '17 at 21:19
@DevAvitesh In that case it would be nice if you accepted the answer (by clicking on the tick at its upper left). This has multiple effects: First of all, the question is marked as answered, so anyone with a similar question will be shown this question marked as answered and, therefore, may read it first. Furthermore, both you and I will receive Stack Overflow reputation points. — mkl, Apr 17 '17 at 21:32

score 0 · Accepted Answer · answered Apr 06 '17 at 21:40

In short: The cause of the NullPointerException here is that iText does not support form XObject resource inheritance from the page they are displayed on. According to the PDF specification this construct is obsolete but it can be encountered in PDFs obeying early PDF references instead of the specification.

The cause

Page 1 of the document in question contains 4 XObject resources named I1, M0, P1, and Q0:

As the screenshot in the original answer shows, Q0 in particular has no Resources dictionary of its own. But its last instructions are

q
413 0 0 125 75 3086 cm
/I1 Do
Q

That is, it references a resource named I1.

Now iText in case of form XObjects assumes that the resources their contents reference are contained in their own Resources dictionary.

The result: iText accesses a null dictionary and a NullPointerException occurs.
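
The failure mode can be illustrated with a toy model that does not use iText at all. The class and method names below (FormXObject, lookupXObject) are hypothetical stand-ins, not iText API; the sketch only assumes that the lookup dereferences the form's own resource dictionary without a null check.

```java
import java.util.HashMap;
import java.util.Map;

public class MissingResourcesDemo {

    /** Hypothetical stand-in for a form XObject; not an iText class. */
    static final class FormXObject {
        // Own resource dictionary, or null when the stream omits /Resources (like Q0).
        final Map<String, String> resources;

        FormXObject(Map<String, String> resources) {
            this.resources = resources;
        }
    }

    /** Mimics a lookup that assumes every form carries its own resources. */
    static String lookupXObject(FormXObject form, String name) {
        // Throws NullPointerException when form.resources is null.
        return form.resources.get(name);
    }

    public static void main(String[] args) {
        Map<String, String> own = new HashMap<>();
        own.put("I1", "image stream");
        System.out.println(lookupXObject(new FormXObject(own), "I1"));

        try {
            // Q0-like form: no Resources dictionary of its own.
            lookupXObject(new FormXObject(null), "I1");
        } catch (NullPointerException e) {
            System.out.println("NullPointerException for the form without resources");
        }
    }
}
```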

The specification

The PDF specification ISO 32000-1 specifies:

A resource dictionary shall be associated with a content stream in one of the following ways:

For a content stream that is the value of a page’s Contents entry (or is an element of an array that is the value of that entry), the resource dictionary shall be designated by the page dictionary’s Resources or is inherited, as described under 7.7.3.4, "Inheritance of Page Attributes," from some ancestor node of the page object.

For other content streams, a conforming writer shall include a Resources entry in the stream's dictionary specifying the resource dictionary which contains all the resources used by that content stream. This shall apply to content streams that define form XObjects, patterns, Type 3 fonts, and annotation appearances.

PDF files written obeying earlier versions of PDF may have omitted the Resources entry in all form XObjects and Type 3 fonts used on a page. All resources that are referenced from those forms and fonts shall be inherited from the resource dictionary of the page on which they are used. This construct is obsolete and should not be used by conforming writers.

(ISO 32000-1, section 7.8.3 - Resource Dictionaries)

Thus, in the case at hand we are in the situation of the obsolete option three: Q0 references the XObject I1 defined in the resource dictionary of the page Q0 is used on.

The document in question has a version header claiming PDF 1.5 conformance (in contrast to PDF 1.7 of the PDF specification). So let's look at the PDF Reference 1.5. The paragraph there corresponding to option three is:

A form XObject or a Type 3 font’s glyph description may omit the Resources entry, in which case resources will be looked up in the Resources entry of the page on which the form or font is used. This practice is not recommended.

Summarized, therefore, the PDF in question uses a construct which the PDF specification (published in 2008, in use for nine years!) calls obsolete and even the PDF Reference the file claims conformance to recommends against. iText, on the other hand, does not support this obsolete construct.

Ideas how to fix this

Essentially the PDF Cleanup code must be extended to

remember the resources of the current page in the PdfCleanUpProcessor and
use these current page resources in the PdfCleanUpContentOperator method invoke in case of a Do operator referring to a form XObject without its own resources.
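
The two steps above can be sketched in isolation. This is only a model of the idea under stated assumptions: CleanUpModel, beginPage, and resourcesFor are hypothetical names, resource dictionaries are represented as plain maps, and none of this is the actual PdfCleanUpProcessor or PdfCleanUpContentOperator code.

```java
import java.util.HashMap;
import java.util.Map;

public class PageResourceFallbackDemo {

    /** Hypothetical processor; models the idea only, not iText's classes. */
    static final class CleanUpModel {
        private Map<String, String> currentPageResources;

        // Step 1: remember the resources of the page being cleaned.
        void beginPage(Map<String, String> pageResources) {
            this.currentPageResources = pageResources;
        }

        // Step 2: when a Do operator refers to a form XObject, use the form's
        // own resources if present, otherwise fall back to the page's.
        Map<String, String> resourcesFor(Map<String, String> formResources) {
            return formResources != null ? formResources : currentPageResources;
        }
    }

    public static void main(String[] args) {
        Map<String, String> page = new HashMap<>();
        page.put("I1", "image defined on the page");

        CleanUpModel model = new CleanUpModel();
        model.beginPage(page);

        // A Q0-like form without its own resources resolves I1 via the page.
        System.out.println(model.resourcesFor(null).get("I1"));
    }
}
```

A form that does carry its own Resources dictionary is unaffected by the fallback, so conforming files would behave as before.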

Unfortunately some members used in invoke are private. Thus, one has to either copy the PdfCleanUp code or fall back on reflection.

(iText 5.5.12-SNAPSHOT)
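
For the reflection route mentioned above, a minimal sketch of reading a private member looks as follows. The Processor class and its currentResources field are hypothetical and merely stand in for library code; which iText members would actually need to be accessed is not shown here.

```java
import java.lang.reflect.Field;

public class ReflectionAccessDemo {

    /** Hypothetical class with a private member; stands in for library code. */
    static final class Processor {
        private String currentResources = "page resources";
    }

    /** Reads a private field by name via reflection. */
    static Object readPrivateField(Object target, String fieldName) {
        try {
            Field field = target.getClass().getDeclaredField(fieldName);
            field.setAccessible(true); // bypass the private modifier
            return field.get(target);
        } catch (ReflectiveOperationException e) {
            // The field does not exist or cannot be accessed.
            throw new IllegalStateException("Cannot read field " + fieldName, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(readPrivateField(new Processor(), "currentResources"));
    }
}
```

Code like this depends on private names that may change between library versions, which is the trade-off against copying the PdfCleanUp code.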

iText 7

The iText 7 PDF CleanUp tool also runs into an issue for your PDF. Here the exception is an IllegalStateException claiming "Graphics state is always deleted after event dispatching. If you want to preserve it in renderer info, use preserveGraphicsState method after receiving renderer info."

As this exception is thrown during event dispatching, this error message does not make sense. Unfortunately the PDF CleanUp tool has become closed source in iText 7, so pinpointing the issue is not easy.

(iText 7.0.3-SNAPSHOT; PDF CleanUp 1.0.2-SNAPSHOT)

Hi all, I am wondering whether there is any other option or open source Java API which can help me redact PDFs, as my client's license is about to expire. — DevAvitesh, Apr 19 '17 at 20:52
