
I am getting the exception below while trying to redact a PDF document using iText. The issue is very sporadic: sometimes it works and sometimes it throws the error.

at com.itextpdf.text.pdf.parser.PdfContentStreamProcessor.access$6100(PdfContentStreamProcessor.java:60)
at com.itextpdf.text.pdf.parser.PdfContentStreamProcessor$Do.invoke(PdfContentStreamProcessor.java:991)
at com.itextpdf.text.pdf.pdfcleanup.PdfCleanUpContentOperator.invoke(PdfCleanUpContentOperator.java:140)
at com.itextpdf.text.pdf.parser.PdfContentStreamProcessor.invokeOperator(PdfContentStreamProcessor.java:286)
at com.itextpdf.text.pdf.parser.PdfContentStreamProcessor.processContent(PdfContentStreamProcessor.java:425)
at com.itextpdf.text.pdf.pdfcleanup.PdfCleanUpProcessor.cleanUpPage(PdfCleanUpProcessor.java:160)
at com.itextpdf.text.pdf.pdfcleanup.PdfCleanUpProcessor.cleanUp(PdfCleanUpProcessor.java:135)
at RedactionClass.tgestRedactJavishsInput(RedactionClass.java:56)
at RedactionClass.main(RedactionClass.java:23)

The code I am using to redact is below:

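import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.ArrayList;
import java.util.List;

import com.itextpdf.text.BaseColor;
import com.itextpdf.text.DocumentException;
import com.itextpdf.text.Rectangle;
import com.itextpdf.text.pdf.PdfReader;
import com.itextpdf.text.pdf.PdfStamper;
import com.itextpdf.text.pdf.pdfcleanup.PdfCleanUpLocation;
import com.itextpdf.text.pdf.pdfcleanup.PdfCleanUpProcessor;

// OUTPUTDIR (the output directory) is defined elsewhere in the class
public static void testRedact() throws IOException, DocumentException {
    // Areas to black out: lower-left x, lower-left y, upper-right x, upper-right y
    Rectangle linkLocation1 = new Rectangle(440f, 700f, 470f, 710f);
    Rectangle linkLocation2 = new Rectangle(308f, 205f, 338f, 215f);
    Rectangle linkLocation3 = new Rectangle(90f, 155f, 130f, 165f);

    // try-with-resources closes the input and output streams even if something fails
    try (InputStream resource = new FileInputStream("D:/itext/edited_120192824_5 (1).pdf");
         OutputStream result = new FileOutputStream(new File(OUTPUTDIR, "aviteshs.pdf"))) {
        PdfReader reader = new PdfReader(resource);
        PdfStamper stamper = new PdfStamper(reader, result);
        int pageCount = reader.getNumberOfPages();

        List<PdfCleanUpLocation> cleanUpLocations = new ArrayList<PdfCleanUpLocation>();
        for (int currentPage = 1; currentPage <= pageCount; currentPage++) {
            // linkLocation1 is redacted on every page
            cleanUpLocations.add(new PdfCleanUpLocation(currentPage, linkLocation1, BaseColor.BLACK));
            if (currentPage == 1) {
                // page 1 has two additional areas
                cleanUpLocations.add(new PdfCleanUpLocation(currentPage, linkLocation2, BaseColor.BLACK));
                cleanUpLocations.add(new PdfCleanUpLocation(currentPage, linkLocation3, BaseColor.BLACK));
            }
        }

        PdfCleanUpProcessor cleaner = new PdfCleanUpProcessor(cleanUpLocations, stamper);
        try {
            cleaner.cleanUp();
        } catch (Exception e) {
            // The exception above surfaces here; note that the output is still
            // written below and may then be only partially redacted
            e.printStackTrace();
        } finally {
            stamper.close();
            reader.close();
        }
    }
}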

Because it is a customer document, I am unable to share it; I am trying to find some test data that reproduces the issue.

Please find the document here:

https://drive.google.com/file/d/0B-zalNTEeIOwM1JJVWctcW8ydU0/view?usp=drivesdk

Edited by mkl; asked by DevAvitesh
  • Please include the full stack trace – avojak Apr 04 '17 at 15:23
  • And a minimal complete verifiable example and also a question – JeremyP Apr 04 '17 at 15:25
  • And a PDF file that reproduces the issue. – Amedee Van Gasse Apr 04 '17 at 15:54
  • A known case where something like this can happen is for bitmap images in the PDF the format of which the iText parsing framework does not know. On the other hand, redaction needs to understand image formats to apply redaction to the image content. Thus, an exception occurs which admittedly does not clearly indicate the cause. Whether this is the case for your files or not can only be determined with the PDF in question and the redaction operation you apply. – mkl Apr 04 '17 at 16:12
  • Actually, because it is a customer document, I am unable to share it here. But yes, it contains a logo which is colored and could be a bitmap. – DevAvitesh Apr 04 '17 at 17:10
  • Is there any way to convert the bitmap contained in the PDF to normal PDF content, so that our code can execute properly? – DevAvitesh Apr 04 '17 at 18:25
  • Hard to tell without the document... – mkl Apr 04 '17 at 22:45
  • @mkl please find the document in comment – DevAvitesh Apr 05 '17 at 06:25
  • I'll look into that later this week. Too busy today... – mkl Apr 05 '17 at 08:19
  • Your profile says you work at Syntel, which is a registered client of iText. Perhaps you could check whether you have an active support license. That way you could share the document (with iText even offering to sign an NDA if needed). – Joris Schellekens Apr 06 '17 at 14:34
  • @DevAvitesh Did my answer explain the issue sufficiently? Not at all reacting to the only answer to one's question is disappointing. – mkl Apr 17 '17 at 21:19
  • @mkl it helps a lot – DevAvitesh Apr 17 '17 at 21:20
  • @DevAvitesh In that case it would be nice if you accepted the answer (by clicking on the tick at its upper left). This has multiple effects: first of all, the question is marked as answered, so anyone with a similar question will be shown this question marked as answered and, therefore, may read it first. Furthermore, both you and I will receive Stack Overflow reputation points. – mkl Apr 17 '17 at 21:32
  • @DevAvitesh Thanks. – mkl Apr 17 '17 at 21:33

1 Answer

In short: The cause of the NullPointerException here is that iText does not support form XObject resource inheritance from the page they are displayed on. According to the PDF specification this construct is obsolete but it can be encountered in PDFs obeying early PDF references instead of the specification.

The cause

Page 1 of the document in question contains 4 XObject resources named I1, M0, P1, and Q0:

[Screenshot: RUPS view of the page 1 XObject resources]

As you can see in the screenshot, Q0 in particular has no Resources dictionary of its own. But its last instructions are

q
413 0 0 125 75 3086 cm
/I1 Do
Q

That is, it references a resource I1.

For form XObjects, however, iText assumes that the resources referenced by their content are contained in their own Resources dictionary.

The result: iText accesses a null dictionary and a NullPointerException occurs.

The specification

The PDF specification ISO 32000-1 specifies:

A resource dictionary shall be associated with a content stream in one of the following ways:

  • For a content stream that is the value of a page’s Contents entry (or is an element of an array that is the value of that entry), the resource dictionary shall be designated by the page dictionary’s Resources or is inherited, as described under 7.7.3.4, "Inheritance of Page Attributes," from some ancestor node of the page object.

  • For other content streams, a conforming writer shall include a Resources entry in the stream's dictionary specifying the resource dictionary which contains all the resources used by that content stream. This shall apply to content streams that define form XObjects, patterns, Type 3 fonts, and annotation.

  • PDF files written obeying earlier versions of PDF may have omitted the Resources entry in all form XObjects and Type 3 fonts used on a page. All resources that are referenced from those forms and fonts shall be inherited from the resource dictionary of the page on which they are used. This construct is obsolete and should not be used by conforming writers.

(ISO 32000-1, section 7.8.3 - Resource Dictionaries)

Thus, the case at hand is the obsolete third option: Q0 references the XObject I1 defined in the resource dictionary of the page on which Q0 is used.

The document in question has a version header claiming PDF 1.5 conformance (whereas the PDF specification describes PDF 1.7). So let's look at the PDF Reference 1.5. The paragraph there corresponding to option three is:
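The lookup rule can be made concrete with a toy model in plain Java. The class and method names below are hypothetical, and this is not iText code. Resource dictionaries are modeled as maps from resource names to descriptions, and a missing Resources entry is modeled as null. `strictLookup` mirrors the behavior described above, where only the form's own dictionary is consulted. `lookupWithPageFallback` adds the obsolete inheritance rule from option three.

```java
import java.util.HashMap;
import java.util.Map;

public class ResourceLookup {

    // Strict lookup: only the form's own Resources dictionary is consulted.
    // A form without a Resources entry (formResources == null) causes a NullPointerException.
    public static String strictLookup(Map<String, String> formResources, String name) {
        return formResources.get(name);
    }

    // Lookup including the obsolete rule of ISO 32000-1, 7.8.3: a form without
    // its own Resources entry uses the resources of the page it is drawn on.
    public static String lookupWithPageFallback(Map<String, String> formResources,
                                                Map<String, String> pageResources,
                                                String name) {
        if (formResources != null) {
            return formResources.get(name);
        }
        return pageResources.get(name);
    }

    public static void main(String[] args) {
        // Page 1 resources (the descriptions are placeholders)
        Map<String, String> pageResources = new HashMap<>();
        pageResources.put("I1", "XObject I1 from the page resources");
        pageResources.put("Q0", "form XObject Q0");

        // Q0 has no Resources dictionary of its own
        Map<String, String> q0Resources = null;

        // "/I1 Do" inside Q0, resolved with the page fallback
        System.out.println(lookupWithPageFallback(q0Resources, pageResources, "I1"));

        // The same lookup without the fallback
        try {
            strictLookup(q0Resources, "I1");
        } catch (NullPointerException e) {
            System.out.println("strict lookup failed with NullPointerException");
        }
    }
}
```

In this model, as in the answer's diagnosis, the failure comes only from the missing fallback. The name I1 itself is perfectly resolvable via the page.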

  • A form XObject or a Type 3 font’s glyph description may omit the Resources entry, in which case resources will be looked up in the Resources entry of the page on which the form or font is used. This practice is not recommended.

In summary, the PDF in question uses a construct which the PDF specification (published in 2008, i.e. in use for nine years at the time of writing) calls obsolete, and which even the PDF Reference the file claims conformance to recommends against. iText, on the other hand, does not support this obsolete construct.

Ideas how to fix this

Essentially the PDF Cleanup code must be extended to

  • remember the resources of the current page in the PdfCleanUpProcessor and
  • use these current page resources in the PdfCleanUpContentOperator method invoke in case of a Do operator referring to a form XObject without its own resources.

Unfortunately, some members used in invoke are private. Thus, one has to either copy the PdfCleanUp code or fall back on reflection.
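The answer above does not propose this alternative to patching the cleanup code, and it is untested against the document in question. One could preprocess the PDF with iText 5 and give every form XObject that lacks a Resources entry an explicit reference to the Resources of the page it is used on. Afterwards, the file no longer relies on the obsolete construct. The sketch below makes several assumptions:

- iText 5.5.x is on the classpath.
- The file names are placeholders.
- It only inspects form XObjects referenced directly from each page's resources, not forms nested inside other forms.
- A form shared by pages with different resources simply receives the resources of the first page where it is found.

```java
import java.io.FileOutputStream;
import java.io.IOException;

import com.itextpdf.text.DocumentException;
import com.itextpdf.text.pdf.PdfDictionary;
import com.itextpdf.text.pdf.PdfName;
import com.itextpdf.text.pdf.PdfReader;
import com.itextpdf.text.pdf.PdfStamper;
import com.itextpdf.text.pdf.PdfStream;

public class AddMissingFormResources {

    public static void main(String[] args) throws IOException, DocumentException {
        // Hypothetical input/output file names; adjust to your environment
        PdfReader reader = new PdfReader("input.pdf");
        int fixed = 0;
        for (int page = 1; page <= reader.getNumberOfPages(); page++) {
            PdfDictionary pageDict = reader.getPageN(page);
            PdfDictionary pageResources = pageDict.getAsDict(PdfName.RESOURCES);
            if (pageResources == null) {
                continue;
            }
            PdfDictionary xObjects = pageResources.getAsDict(PdfName.XOBJECT);
            if (xObjects == null) {
                continue;
            }
            for (PdfName name : xObjects.getKeys()) {
                PdfStream xObject = xObjects.getAsStream(name);
                // Only form XObjects are affected; image XObjects have no Resources entry
                if (xObject == null || !PdfName.FORM.equals(xObject.getAsName(PdfName.SUBTYPE))) {
                    continue;
                }
                if (xObject.get(PdfName.RESOURCES) == null) {
                    // Make the inherited page resources explicit (ISO 32000-1, 7.8.3, option three)
                    xObject.put(PdfName.RESOURCES, pageDict.get(PdfName.RESOURCES));
                    fixed++;
                }
            }
        }
        // The stamper rewrites the document including the objects modified above
        PdfStamper stamper = new PdfStamper(reader, new FileOutputStream("input-fixed.pdf"));
        stamper.close();
        reader.close();
        System.out.println("Form XObjects given explicit Resources: " + fixed);
    }
}
```

Whether PdfCleanUpProcessor then succeeds on the fixed file would still have to be verified. The image format issue mentioned in the comments under the question is a separate potential problem.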

(iText 5.5.12-SNAPSHOT)

iText 7

The iText 7 PDF CleanUp tool also runs into an issue with your PDF. Here the exception is an IllegalStateException claiming "Graphics state is always deleted after event dispatching. If you want to preserve it in renderer info, use preserveGraphicsState method after receiving renderer info."

As this exception is thrown during event dispatching, this error message does not make sense. Unfortunately, the PDF CleanUp tool has become closed source in iText 7, so it is not easy to pinpoint the issue.

(iText 7.0.3-SNAPSHOT; PDF CleanUp 1.0.2-SNAPSHOT)

Answered by mkl
  • Hi all, I am wondering whether there is any other option or set of open source Java APIs that can help me redact PDFs, as my client's license is expiring. – DevAvitesh Apr 19 '17 at 20:52
  • In that case the license should be renewed. – mkl Apr 20 '17 at 04:09