
I am trying to create a PDF/A file using PDFBox. The file is generated successfully, but it is very large: sometimes 500 MB or even more. Is there any way to decrease the file size during generation?

Nishith Patel
  • Depends on how you create those large files. 500MB sounds quite large, so your code can very likely be improved. As you don't show it, though, it's hard to describe how. – mkl Jan 22 '18 at 08:45
  • Possible ideas: you created the same PDFont several times and embedded it in full, instead of creating it once and subsetting it. Or you created the same PDXObjectImage (e.g. a company logo) several times instead of reusing it. – Tilman Hausherr Jan 22 '18 at 09:56
  • Thanks @TilmanHausherr, I have applied the change you suggested (no longer creating the PDFont multiple times), and the file size decreased drastically: from 200 MB to 2 MB for the same data. – Nishith Patel Jan 23 '18 at 09:28

1 Answer


As discussed in the comments: a PDFont object for a specific font should be constructed only once per document; it can then be reused on all pages of that PDF.
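
A minimal sketch of "construct once, reuse on every page", assuming PDFBox 2.0.x and a TrueType font file supplied by the caller. The class and method names (`ReuseFontAcrossPages`, `build`) are hypothetical. `PDType0Font.load(PDDocument, File)` embeds a subset by default.

```java
import java.io.File;
import java.io.IOException;

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.PDPageContentStream;
import org.apache.pdfbox.pdmodel.font.PDFont;
import org.apache.pdfbox.pdmodel.font.PDType0Font;

// Hypothetical helper: one PDFont per document, shared by all of its pages.
class ReuseFontAcrossPages {

    static PDDocument build(File fontFile, int pageCount) throws IOException {
        PDDocument document = new PDDocument();
        // Construct the PDFont ONCE for this document (not once per page)
        PDFont font = PDType0Font.load(document, fontFile);
        for (int i = 1; i <= pageCount; i++) {
            PDPage page = new PDPage();
            document.addPage(page);
            try (PDPageContentStream cs = new PDPageContentStream(document, page)) {
                cs.beginText();
                cs.setFont(font, 12); // the same PDFont instance on every page
                cs.newLineAtOffset(50, 700);
                cs.showText("Page " + i);
                cs.endText();
            }
        }
        return document; // the caller saves and closes it
    }
}
```

Creating the font inside the loop instead would add one font object (and, without subsetting, one full copy of the font program) per page. That is the kind of growth described in the question.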

Fonts should be subsetted (i.e. only the glyphs that are actually used get embedded); to do that, use PDType0Font.load().
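
The effect of subsetting can be checked by saving the same one-line page twice, once with `embedSubset = true` and once with `false`, and comparing the byte counts. This is a sketch assuming PDFBox 2.0.x (where `TTFParser.parse(File)` exists). The class `SubsetVsFullEmbedding` is hypothetical. PDF/A requires fonts to be embedded, but an embedded subset satisfies that requirement.

```java
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.IOException;

import org.apache.fontbox.ttf.TTFParser;
import org.apache.fontbox.ttf.TrueTypeFont;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.PDPageContentStream;
import org.apache.pdfbox.pdmodel.font.PDFont;
import org.apache.pdfbox.pdmodel.font.PDType0Font;

// Hypothetical helper: saves a one-line PDF and returns its size in bytes,
// so subset embedding can be compared with full embedding.
class SubsetVsFullEmbedding {

    static int savedSize(File fontFile, boolean embedSubset) throws IOException {
        TrueTypeFont ttf = new TTFParser().parse(fontFile);
        try (PDDocument document = new PDDocument()) {
            // embedSubset = true: only the glyphs used are written when saving
            // embedSubset = false: the whole font program is embedded
            PDFont font = PDType0Font.load(document, ttf, embedSubset);
            PDPage page = new PDPage();
            document.addPage(page);
            try (PDPageContentStream cs = new PDPageContentStream(document, page)) {
                cs.beginText();
                cs.setFont(font, 12);
                cs.newLineAtOffset(50, 700);
                cs.showText("Hello");
                cs.endText();
            }
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            document.save(out);
            return out.size();
        } finally {
            ttf.close(); // the caller owns a TrueTypeFont passed to this load overload
        }
    }
}
```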

The same applies to image XObjects (PDImageXObject in PDFBox 2.x, PDXObjectImage in 1.8), e.g. for a company logo: the image object should be created once and reused on every page of the PDF.
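
A sketch of the same pattern for a logo, assuming PDFBox 2.0.x. The image is passed in as a `BufferedImage` and converted with `LosslessFactory.createFromImage`. `PDImageXObject.createFromFile(path, document)` would be the file-based alternative. The class name `ReuseLogoAcrossPages` is hypothetical.

```java
import java.awt.image.BufferedImage;
import java.io.IOException;

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.PDPageContentStream;
import org.apache.pdfbox.pdmodel.graphics.image.LosslessFactory;
import org.apache.pdfbox.pdmodel.graphics.image.PDImageXObject;

// Hypothetical helper: one image XObject per document, drawn on every page.
class ReuseLogoAcrossPages {

    static PDDocument build(BufferedImage logoImage, int pageCount) throws IOException {
        PDDocument document = new PDDocument();
        // Create the image XObject ONCE; its pixel data is stored once in the file
        PDImageXObject logo = LosslessFactory.createFromImage(document, logoImage);
        for (int i = 0; i < pageCount; i++) {
            PDPage page = new PDPage();
            document.addPage(page);
            try (PDPageContentStream cs = new PDPageContentStream(document, page)) {
                // Each page only references the shared image object
                cs.drawImage(logo, 50, 720, 100, 40);
            }
        }
        return document; // the caller saves and closes it
    }
}
```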

PD objects (PDFont, PDImageXObject, etc.) are bound to the PDDocument they were created for and shouldn't be used in different PDFs.

TrueTypeFont objects, however, can be reused in several documents:

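```java
import org.apache.fontbox.ttf.TTFParser;
import org.apache.fontbox.ttf.TrueTypeFont;
import org.apache.pdfbox.pdmodel.font.PDFont;
import org.apache.pdfbox.pdmodel.font.PDType0Font;
TrueTypeFont ttf = new TTFParser().parse(file);
PDFont font1 = PDType0Font.load(document1, ttf, true); // last parameter (embedSubset) should be false if used for AcroForm fields
PDFont font2 = PDType0Font.load(document2, ttf, true);
PDFont font3 = PDType0Font.load(document3, ttf, true);
```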
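
The split between the shared TrueTypeFont and the per-document PDFont can be wrapped in a small holder. The sketch below assumes PDFBox 2.0.x. `SharedFontRenderer` and `render` are hypothetical names, and the parsed font is kept in an instance field (a static field, as suggested in the comments, would work the same way). It builds documents one at a time; whether one TrueTypeFont may safely be used by several threads at once is not covered by this answer, so treat concurrent use in a web context as an open question.

```java
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.IOException;

import org.apache.fontbox.ttf.TTFParser;
import org.apache.fontbox.ttf.TrueTypeFont;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.PDPageContentStream;
import org.apache.pdfbox.pdmodel.font.PDFont;
import org.apache.pdfbox.pdmodel.font.PDType0Font;

// Hypothetical holder: the parsed TrueTypeFont is kept and reused,
// while a fresh PDFont is created for every PDDocument.
class SharedFontRenderer {

    private final TrueTypeFont ttf;

    SharedFontRenderer(File fontFile) throws IOException {
        this.ttf = new TTFParser().parse(fontFile); // parsed once
    }

    byte[] render(String text) throws IOException {
        try (PDDocument document = new PDDocument()) {
            // The PDFont is bound to this PDDocument; don't keep it in a static field
            PDFont font = PDType0Font.load(document, ttf, true);
            PDPage page = new PDPage();
            document.addPage(page);
            try (PDPageContentStream cs = new PDPageContentStream(document, page)) {
                cs.beginText();
                cs.setFont(font, 12);
                cs.newLineAtOffset(50, 700);
                cs.showText(text);
                cs.endText();
            }
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            document.save(out); // the font subset is created during save
            return out.toByteArray();
        }
    }
}
```

Calling `render` repeatedly on one `SharedFontRenderer` reuses the parsed font. Each call still gets its own subsetted PDFont, which avoids reusing an already-subsetted PDFont in a second document.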
Tilman Hausherr
  • Could you suggest a replacement for `PDType0Font.load(document, fontFile, false);` that produces the smallest file size? – Yogesh Bombe Feb 12 '19 at 13:22
  • @YogeshBombe Set the last parameter to true, or remove it. However, subsetting should not be done for fonts used in AcroForm fields. – Tilman Hausherr Feb 12 '19 at 13:38
  • I want to use the PDFont as a static object so that I can reuse it. But when I set the embed flag to true in a web context, it works fine the first time, and the second time it gives a font-related error. That's why I set it to false, but then it creates a very large file. – Yogesh Bombe Feb 12 '19 at 13:51
  • I get the error "The TrueType font null does not contain a 'cmap' table" when I set it to true. – Yogesh Bombe Feb 12 '19 at 14:17
  • @YogeshBombe The font is bound to the PDDocument, and the subsetting is done when the document is saved. You can't reuse it with another PDDocument. What you could do is first create a TrueTypeFont and then use that one. The TrueTypeFont can be static. Create it with `TrueTypeFont ttf = new TTFParser().parse(file)`. – Tilman Hausherr Feb 12 '19 at 15:19
  • Can we cast a TrueTypeFont to a PDFont? The `contentStream.setFont(font, fontSize);` method doesn't accept a TrueTypeFont. – Yogesh Bombe Feb 14 '19 at 06:39
  • No, you pass the `TrueTypeFont` to `PDType0Font.load(document, ttf, true)`. – Tilman Hausherr Feb 14 '19 at 08:05
  • Which would you prefer in terms of performance (cost of object creation), if both ttf and fontFile are static: 1) `PDType0Font.load(document, ttf, true);` or 2) `PDType0Font.load(document, fontFile, true);`? – Yogesh Bombe Feb 14 '19 at 09:01
  • fontFile is just a pointer to a file, so it's almost nothing. A static TrueTypeFont is bigger. Reusing a TrueTypeFont object is mostly useful for speed. – Tilman Hausherr Feb 14 '19 at 09:05