
I need to extract the title of each PDF, plus one specific section and its pages. For example, I have a folder full of PDFs, and for each one I need to check whether the Table of Contents lists a heading called Enhancements. If it does, I want to copy the title of the PDF (usually on the first page) and the Enhancements section, and place them in another PDF that serves as a chronology of enhancements.

Gary Seven

1 Answer


You will first need to extract the text chunks, together with their coordinates, from those PDFs. You can use any PDF processing software of your choice for this.
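As one possible illustration of this extraction step, here is a minimal sketch that assumes the third-party pypdf library and its `visitor_text` callback. The `Chunk` type and the `make_visitor` and `extract_chunks` helpers are hypothetical names introduced only for this example, and the coordinates are taken straight from the text matrix, which may need further transformation for some documents.

```python
from typing import List, NamedTuple


class Chunk(NamedTuple):
    """One extracted text chunk (hypothetical type for this sketch)."""
    page: int
    x: float
    y: float
    text: str


def make_visitor(page_no: int, out: List[Chunk]):
    """Build a callback with the argument list pypdf passes to visitor_text."""
    def visitor(text, cm, tm, font_dict, font_size):
        # tm is the text matrix; tm[4] and tm[5] hold its x and y offsets.
        # Assumption: these offsets are good enough as chunk coordinates.
        if text.strip():
            out.append(Chunk(page_no, float(tm[4]), float(tm[5]), text))
    return visitor


def extract_chunks(pdf_path: str) -> List[Chunk]:
    """Collect every non-blank chunk of every page, with its position."""
    from pypdf import PdfReader  # third-party library, imported lazily

    chunks: List[Chunk] = []
    reader = PdfReader(pdf_path)
    for page_no, page in enumerate(reader.pages, start=1):
        page.extract_text(visitor_text=make_visitor(page_no, chunks))
    return chunks
```

Calling `extract_chunks` on each file in the folder would give a flat list of positioned chunks to analyze in the next step.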

Then you will need to analyze the extracted chunks and detect which of them belong to the Enhancements section. This is the hardest part, and I doubt there is any software that will do such analysis for you out of the box. Sorry.
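To give a rough idea of what such analysis could look like, here is a sketch under strong assumptions: the chunks have already been merged into plain text lines, headings appear alone on a line, and the other heading names are known (for example, read from the Table of Contents). The `extract_section` function is a hypothetical helper, not part of any library, and real documents will likely need more robust heuristics.

```python
from typing import List, Optional, Sequence


def _norm(s: str) -> str:
    """Collapse whitespace and ignore case when comparing headings."""
    return " ".join(s.split()).casefold()


def extract_section(
    lines: Sequence[str],
    heading: str,
    other_headings: Sequence[str],
) -> Optional[List[str]]:
    """Return the lines between `heading` and the next known heading.

    Returns None when the heading is not found at all.
    """
    target = _norm(heading)
    stops = {_norm(h) for h in other_headings} - {target}

    # Assumption: the last exact match is the real section heading,
    # since earlier matches are likely Table of Contents entries.
    starts = [i for i, line in enumerate(lines) if _norm(line) == target]
    if not starts:
        return None

    body: List[str] = []
    for line in lines[starts[-1] + 1:]:
        if _norm(line) in stops:
            break  # reached the next section
        body.append(line)
    return body
```

A `None` result lets the caller skip PDFs that have no Enhancements section.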

Please note that text in PDFs is usually stored in chunks, not words or sentences. Each chunk is one or more characters: it might be a single letter or a word and a half. There are no guarantees about what constitutes a chunk.
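Because of this, chunks usually have to be stitched back into lines before any heading can be recognized. The sketch below shows one simple way to do that: group chunks by page, treat chunks with nearly the same y coordinate as one line, and order them by x. The `(page, x, y, text)` tuple layout, the `chunks_to_lines` name, and the `y_tolerance` default are assumptions made for illustration. It also assumes chunks carry their own spaces, which is not always true.

```python
from collections import defaultdict
from typing import Iterable, List, Tuple


def chunks_to_lines(
    chunks: Iterable[Tuple[int, float, float, str]],
    y_tolerance: float = 2.0,
) -> List[Tuple[int, str]]:
    """Merge (page, x, y, text) chunks into (page, line_text) pairs."""
    by_page = defaultdict(list)
    for page, x, y, text in chunks:
        by_page[page].append((x, y, text))

    lines: List[Tuple[int, str]] = []
    for page in sorted(by_page):
        # PDF y coordinates grow upward, so descending y reads top to bottom.
        items = sorted(by_page[page], key=lambda c: (-c[1], c[0]))
        current: List[Tuple[float, str]] = []
        current_y = 0.0
        for x, y, text in items:
            if current and abs(y - current_y) > y_tolerance:
                # y moved too far: finish the current line, ordered left to right.
                current.sort(key=lambda c: c[0])
                lines.append((page, "".join(t for _, t in current)))
                current = []
            if not current:
                current_y = y  # first chunk fixes the line's baseline
            current.append((x, text))
        if current:
            current.sort(key=lambda c: c[0])
            lines.append((page, "".join(t for _, t in current)))
    return lines
```

For example, the chunks "Enha" and "ncements" at almost the same height on one page would be joined into the single line "Enhancements".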

Bobrovsky