The DeletePage function in the Foxit SDK deletes a page from a PDF. However, when a page is deleted and the document is saved, the output file is larger than the original, even though it has fewer pages.

This can be seen with the PDF sample app which ships with the SDK:

$ ./simple_sample/bin/rel_gcc/pdfpage_organization 
...
Success: Use <rb> mode to open the file InsertPages.pdf and associate it with a stream.
Success: Create the file object.
Success: Open file successfully .
Success: Load PDF document successfully.
Success: Get page with index 1.
Success: Delete page with index 1.
Using extension implementation to open the file.
Success: Use <wb> mode to open the file SourcePage_delete.pdf and associate it with a stream.
Success: Create the file object.
Success: The PDF document save to SourcePage_delete.pdf.

Examining the output:

$ ll simple_sample/input_files/SourcePage.pdf
-rw-r--r--@ 1 richard staff 92K Dec 17 2013 simple_sample/input_files/SourcePage.pdf

$ ll simple_sample/output_files/pdfpage_organization/SourcePage_delete.pdf
-rw-r--r--@ 1 richard staff 96K Jun 23 10:22 simple_sample/output_files/pdfpage_organization/SourcePage_delete.pdf

SourcePage_delete.pdf does have one fewer page, as expected, but it is 4 KB larger. I get the same result when deleting 99 pages from a 100-page document, i.e. the file size does not reflect the page count.

user756079
  • I'd be curious to see what the document's internals look like. Can you post before and after files somewhere? – Chris Haas Jun 23 '14 at 18:12
  • Yes, here they are: https://www.dropbox.com/sh/9nx8j6v7421cg4f/AABpI1cbZoNmE4atTH8Hdb7sa – user756079 Jun 23 '14 at 18:29
  • It is possible (I do not know the Foxit SDK) that the delete and save operations are performed in incremental update mode. In that case the page is only marked as deleted and additional structures are appended, so the final file is larger than the original. – Mihai Iancu Jun 23 '14 at 19:03
  • I can imagine that being the case. Is there another mode which would force the removal of the structures contained in the page? – user756079 Jun 23 '14 at 19:08
  • There could be a few reasons for this. Perhaps the original PDF was missing some required element and in the process of saving the modified PDF this missing element has been added to the PDF automatically "repairing" it. Alternatively some object may have been compressed prior to page deletion and during/after page deletion the object is no longer compressed (though file size increase in this scenario would most likely be more than 4k). – Rowan Nov 22 '16 at 00:33

2 Answers

Based on the sample documents you provided and @MihaiIancu's comment, you are saving an incremental PDF update, which just appends the new information to the end of the existing file.
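One way to check this on the before and after files is sketched below. It is only a heuristic, not anything from the Foxit SDK: the helper names `count_eof_markers` and `looks_incremental` are made up for illustration. The idea is that an incremental save leaves the original bytes untouched and appends a new section ending in its own `%%EOF` marker, so the saved file should begin with the original file and contain more markers than it.

```python
from pathlib import Path
import sys


def count_eof_markers(data: bytes) -> int:
    """Count %%EOF markers in the raw file bytes.

    Heuristic only: every incremental save appends one more marker, but a
    file may contain more than one for other reasons (e.g. linearization).
    """
    return data.count(b"%%EOF")


def looks_incremental(original: bytes, saved: bytes) -> bool:
    """True if `saved` is the original bytes plus extra data appended after them."""
    return len(saved) > len(original) and saved.startswith(original)


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python check_incremental.py original.pdf saved.pdf
    original = Path(sys.argv[1]).read_bytes()
    saved = Path(sys.argv[2]).read_bytes()
    print("%%EOF markers in original:", count_eof_markers(original))
    print("%%EOF markers in saved:   ", count_eof_markers(saved))
    print("saved starts with original:", looks_incremental(original, saved))
```

If the saved file starts with the original bytes, the deleted page's objects are still physically present in it, which would explain why the size grows rather than shrinks.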

According to the Foxit SDK site, the function FSPDF_Doc_StartSaveToFile takes a save flag as its third parameter, which is FSPDF_SAVEFLAG_INCREMENTAL, FSPDF_SAVEFLAG_NOORIGINAL, FSPDF_SAVEFLAG_REMOVESECURITY or FSPDF_SAVEFLAG_OBJECTSTREAM. In your case, I would think that FSPDF_SAVEFLAG_NOORIGINAL should do what you're looking for. If you're not calling this function directly, there should hopefully still be a wrapper that accepts one of these flags.

Chris Haas
  • I suspect this is correct (I have no way of proving it). I will accept this as an answer. – user756079 Jun 23 '14 at 21:50
  • Can you look into this some more? I've never used this product before, so I'm only making an informed guess. If this isn't correct or you find a better solution, the community would be better off knowing. – Chris Haas Jun 23 '14 at 23:24
  • OK. I just did some fairly extensive testing, and it appears that passing FSPDF_SAVEFLAG_NOORIGINAL does not solve the problem. – user756079 Jun 24 '14 at 00:55

In the latest Foxit PDF SDK 6.4 for Linux, the save method supports e_SaveFlagRemoveRedundantObjects. Usually, when you delete a page, only the reference to it in the PDF structure is removed, and the page's objects stay in the file. If you set this flag, the save removes any object that is no longer referenced in the PDF.
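To illustrate what removing unreferenced objects means, here is a toy sketch. It is not Foxit's implementation and does not parse a real PDF: the document is modelled as a hypothetical dictionary that maps an object number to the object numbers it references, and the function names are made up for the example. Everything reachable from the root (the catalog) is kept and the rest is dropped.

```python
from __future__ import annotations


def reachable_objects(objects: dict[int, list[int]], root: int) -> set[int]:
    """Return the ids of all objects reachable from `root` by following references."""
    seen: set[int] = set()
    stack = [root]
    while stack:
        obj_id = stack.pop()
        # Skip objects already visited (pages point back to their parent)
        # and dangling references to objects that do not exist.
        if obj_id in seen or obj_id not in objects:
            continue
        seen.add(obj_id)
        stack.extend(objects[obj_id])
    return seen


def remove_unreferenced(objects: dict[int, list[int]], root: int) -> dict[int, list[int]]:
    """Keep only the objects that are still reachable from `root`."""
    keep = reachable_objects(objects, root)
    return {obj_id: refs for obj_id, refs in objects.items() if obj_id in keep}


if __name__ == "__main__":
    # 1 = catalog, 2 = page tree, 3 and 4 = pages, 5 and 6 = their content streams.
    # Page 4 has already been "deleted": the page tree no longer lists it,
    # but objects 4 and 6 are still stored in the file.
    document = {1: [2], 2: [3], 3: [2, 5], 4: [2, 6], 5: [], 6: []}
    print(sorted(remove_unreferenced(document, root=1)))
```

In this model the deleted page and its content stream only disappear from the output once the unreachable objects are swept, which is the behaviour the flag is described as providing.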

Huy Tran