I am trying to split a PDF file (a book) into multiple files by child bookmarks, in code.

Use case: the table of contents of a book is shown to the user. The user can select up to n sections (not necessarily sequential) to preview. The application needs to extract these sections and merge them into a single preview PDF.

While searching the internet for a solution, I found a few tools: Aspose, Spire (E-IceBlue), etc. All of them can split a PDF by pages (top-level bookmarks), but I need to split the PDF by child bookmarks. This means that the area to extract can start and/or end in the middle of a page.

Ideally I would be able to do this in Java code, but if someone knows a solution in any other programming language or as a CLI program, that would also be great.

  • The requirement seems quite complex and we are afraid that Aspose.PDF does not offer any direct method to achieve it. However, we request you please create a post in our free support forum (https://forum.aspose.com/c/pdf) along with some more details like sample input/output files. We will try to assist you accordingly there. This is Asad Ali and I am Developer Evangelist at Aspose. – Asad Ali Dec 05 '22 at 22:29

1 Answer

It depends whether you insist that the non-chosen content on a page be redacted or not. For example, if section 6.3.2 takes up the middle half of a page, do you care if the end of 6.3.1 and the beginning of 6.3.3 are shown in the output on the same page?

If you don't care, cpdf can do this easily. Just output the bookmark data as JSON:

cpdf -list-bookmarks-json -utf8 in.pdf > marks.json

Then you can parse this JSON to show the list of bookmarks, and choose which pages to extract based on child bookmark page numbers.
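As a rough illustration of that parsing step, here is a minimal Python sketch. It assumes each entry in marks.json carries at least "level", "text" and "page" fields (check the output of your cpdf version), and the sample data, function names and page count are made up for the example. A section is taken to run from its own page to the page where the next bookmark of the same or a shallower level starts, inclusive, since that next bookmark may begin mid-page.

```python
import json

def section_ranges(marks, last_page):
    """Return (text, first_page, end_page) for every bookmark, in order."""
    ranges = []
    for i, mark in enumerate(marks):
        end = last_page  # the final section runs to the end of the book
        for later in marks[i + 1:]:
            # The section ends where the next sibling or ancestor-level
            # bookmark begins; deeper levels are children and are skipped.
            if later["level"] <= mark["level"]:
                end = max(later["page"], mark["page"])
                break
        ranges.append((mark["text"], mark["page"], end))
    return ranges

def cpdf_range(ranges, chosen):
    """Build a comma-separated page range for the chosen section titles."""
    parts = []
    for text, first, end in ranges:
        if text in chosen:
            parts.append(str(first) if first == end else f"{first}-{end}")
    return ",".join(parts)

# Hypothetical sample standing in for the contents of marks.json.
sample = json.loads("""
[
  {"level": 0, "text": "6 Layout",  "page": 40},
  {"level": 1, "text": "6.1 Boxes", "page": 40},
  {"level": 1, "text": "6.2 Glue",  "page": 43},
  {"level": 0, "text": "7 Fonts",   "page": 47}
]
""")

ranges = section_ranges(sample, last_page=60)
print(cpdf_range(ranges, {"6.1 Boxes", "7 Fonts"}))
```

The resulting string can then be passed to cpdf as a page range, along the lines of `cpdf in.pdf 40-43,47-60 -o preview.pdf`, and the page count needed for `last_page` can be read with `cpdf -pages in.pdf`.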

As for redaction, you could use -add-rectangle or -hard-box to clean up the output based on the coordinates from the JSON bookmarks file, but that's not real redaction -- it just removes the content from view.
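To find where on the page a section starts, the bookmark's destination has to be read. The Python sketch below assumes the JSON stores it under a "target" key as an array mirroring a PDF destination, with names as {"N": ...}, numbers as {"I": ...} or {"F": ...}, and null for unspecified values; verify this against your own marks.json. The function name is hypothetical. The destination layouts themselves ([page /XYZ left top zoom], [page /FitH top], [page /FitR left bottom right top]) come from the PDF specification.

```python
def bookmark_top(mark):
    """Return the vertical coordinate a bookmark jumps to, or None if unknown."""
    target = mark.get("target") or []

    def num(item):
        # Assumed number encoding: {"F": float} or {"I": int}; null means unspecified.
        if isinstance(item, dict):
            for key in ("F", "I"):
                if key in item:
                    return float(item[key])
        return None

    if len(target) < 2 or not isinstance(target[1], dict):
        return None
    kind = target[1].get("N")
    if kind == "/XYZ" and len(target) > 3:
        return num(target[3])   # [page /XYZ left top zoom]
    if kind in ("/FitH", "/FitBH") and len(target) > 2:
        return num(target[2])   # [page /FitH top]
    if kind == "/FitR" and len(target) > 5:
        return num(target[5])   # [page /FitR left bottom right top]
    return None                 # e.g. /Fit carries no position at all

# Hypothetical bookmark entry in the assumed format.
example = {
    "level": 2,
    "text": "6.3.2",
    "page": 17,
    "target": [{"I": 17}, {"N": "/XYZ"}, {"F": 85.039}, {"F": 609.307}, None],
}
print(bookmark_top(example))
```

In default PDF user space the y axis points upward from the bottom of the page, so on the first page of a section everything above this coordinate usually belongs to the previous section. Bookmarks that use /Fit, or that leave the value unspecified, give no position, and such sections can only be cut at page boundaries.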

johnwhitington