iTextPDF stops transforming PDF correctly

Question

I have the following issue with iTextPDF.

private void generatePdf() throws Exception {
    FileOutputStream fos = null;
    try {
        PdfReader reader = new PdfReader("template.pdf");
        fos = new FileOutputStream("test.pdf");
        PdfStamper stamper = new PdfStamper(reader, fos);

        stamper.close();
    } catch (Exception e) {
        throw e;
    } finally {
        if (fos != null) {
            try {
                fos.close();
            } catch (IOException e) {
                throw new Exception(e);
            }
        }
    }
}

This method is supposed to read a template and save it as a new PDF. But when I open the resulting PDF I see only blank pages (4, the same number as the template has). Interestingly, this happens when the method is invoked in the context of a web app on a JBoss server. When I invoke the same method from a simple Java application (a class with a main() method), it works fine. I should add that the template has editable fields that are to be filled in later, but nothing edits them yet. Can anybody suggest what could be wrong here?

Best Regards, Sergey

The PDF files were uploaded to Google Drive: template - https://drive.google.com/file/d/0B3-DPMN-iMOmNjItRVJ4MHRZX3M/view?usp=sharing result - https://drive.google.com/file/d/0B3-DPMN-iMOmSDJyWUFRYzFoN3c/view?usp=sharing — Dolzhenok, Mar 06 '15 at 13:10
Ok, first of all I ran your program with your sample input using a current iText version (your 5.0.6 is ancient, I used the current 5.5.6 development snapshot) and the output looks as one would expect. You should consider updating. I'll look at your output to see what is broken in it. — mkl, Mar 06 '15 at 13:39
I've updated the lib version to 5.5.5 but the result is the same. Link to the result file - https://drive.google.com/file/d/0B3-DPMN-iMOmdlZhMWpJejNwM3c/view?usp=sharing — Dolzhenok, Mar 06 '15 at 14:04
In both cases characters with codes beyond 128 are broken. It looks like the file was somewhere treated as text, loaded as if Latin1 encoded, and stored using UTF-8, which also explains the explosion in size. The cross reference entries seem correct, though. This makes me assume that the template is already defective when it is supplied to the code. How did you supply the template PDF to the server? Can you check whether that template is not already broken there? And do you use identical code there, especially do you read the template using `new PdfReader("template.pdf")` there, too? — mkl, Mar 06 '15 at 14:32
So, template.pdf lies in the project folder. Project is assembled by maven and then it's run under Jboss on my local machine. And I don't supply it to the server. There I use `new PdfReader(W9_FORM_PDF_TEMPLATE_PATH);` where constant is `"pdf/fw9_template.pdf"`. But when I run this code as main method in class (just Run class in Idea) output file is correct. — Dolzhenok, Mar 06 '15 at 14:52
*assembled by maven* - you don't by chance filter resources? Maven resource filtering treats files as text files... *"pdf/fw9_template.pdf"* - Have you checked whether that file in the web application folder or *.WAR is unchanged? — mkl, Mar 06 '15 at 15:01
You might consider doing [this](http://maven.apache.org/plugins/maven-resources-plugin/examples/binaries-filtering.html). — mkl, Mar 06 '15 at 15:05
Thanks a lot! That was what I needed: to tell Maven not to filter PDFs. You saved my Friday :) — Dolzhenok, Mar 06 '15 at 15:17

score 1 · Accepted Answer · answered Mar 06 '15 at 15:36

The cause

In the comments it turned out that the OP builds his web application with Maven, that the template.pdf file is supplied as a Maven resource, and that filtering (i.e. text variable replacement) of the resources is activated.

Unfortunately, though, filtering resources implies that the resource files are treated as text files: they are read as text and eventually stored using UTF-8 character encoding.

This essentially destroyed all compressed stream contents (especially page contents and font programs) and some meta information strings, and also rendered the cross references incorrect (writing as UTF-8 introduced additional bytes which shifted offsets).

iText could still read the PDF after creating a cross reference table for the mangled file because outside those streams and strings the structure was still correct. The result of writing the read mangled PDF, therefore, contained the right number of pages and some form fields, but the page contents were lost.
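
The byte-level effect can be reproduced with the JDK alone. The following is a minimal sketch, assuming the filter effectively decodes the file as Latin-1 and re-encodes it as UTF-8, as the output files suggested; the encodings actually used depend on the build configuration. The class name `FilterMangleDemo` and the method `mangle` are hypothetical and only serve the illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class FilterMangleDemo {

    // Simulates a text-oriented copy (assumption: decode as Latin-1, re-encode as UTF-8).
    static byte[] mangle(byte[] original) {
        String asText = new String(original, StandardCharsets.ISO_8859_1);
        return asText.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Four ASCII bytes followed by four bytes above 127, like those in compressed streams.
        byte[] original = {'%', 'P', 'D', 'F', (byte) 0xE2, (byte) 0xE3, (byte) 0xCF, (byte) 0xD3};
        byte[] mangled = mangle(original);

        System.out.println("original length: " + original.length);
        System.out.println("mangled length:  " + mangled.length);
        System.out.println("identical: " + Arrays.equals(original, mangled));
    }
}
```

Every byte above 127 becomes two bytes, so the eight input bytes turn into twelve. Compressed data no longer decompresses, and everything after the first such byte sits at a shifted offset.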

The cure

The solution is to not filter PDF resources. This can e.g. be done as explained on the Apache Maven site (http://maven.apache.org/plugins/maven-resources-plugin/examples/binaries-filtering.html):

By default, files with extensions (jpg, jpeg, gif, bmp and png) won't be filtered anymore.

Users can add some extra file extensions to not apply filtering with the following configuration :

<project>
  ...
  <build>
    <plugins>
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-resources-plugin</artifactId>
        <version>2.7</version>
        <configuration>
          ...
          <nonFilteredFileExtensions>
            <nonFilteredFileExtension>pdf</nonFilteredFileExtension>
            <nonFilteredFileExtension>swf</nonFilteredFileExtension>
          </nonFilteredFileExtensions>
          ...
        </configuration>
      </plugin>
    </plugins>
    ...
  </build>
  ...
</project>
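
One way to confirm that the configuration took effect is to compare the source template with the copy that ends up in the build output or the unpacked WAR. The following is a minimal sketch using only the JDK; the class name `TemplateIntegrityCheck` is hypothetical, and the two paths are whatever locations apply to your project, passed on the command line.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;

class TemplateIntegrityCheck {

    // Returns true when both files hold exactly the same bytes.
    static boolean sameBytes(Path source, Path packaged) throws IOException {
        return Arrays.equals(Files.readAllBytes(source), Files.readAllBytes(packaged));
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: TemplateIntegrityCheck <source.pdf> <packaged.pdf>");
            return;
        }
        boolean same = sameBytes(Paths.get(args[0]), Paths.get(args[1]));
        System.out.println(same ? "unchanged" : "file was altered during packaging");
    }
}
```

If the packaged copy differs from the source, the resource is probably still being filtered or otherwise processed as text.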
