Split a PDF file that contains several scanned documents

Asked Apr 05 '23 at 09:52

Active Apr 05 '23 at 10:03

I have a large 100-page PDF file that contains several scanned documents concatenated together. I would like to split it into smaller PDF files, each containing a single document.

Is there a way to detect where each document starts and ends within this big PDF and split it automatically with R?

I imported the PDF with pdftools::pdf_text, which gives me the text of all 100 pages, but I don't know how to tell where each document starts and ends other than by checking manually.
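One possible approach, sketched here under assumptions rather than as a tested solution: if the first page of every document carries a recognizable marker in its extracted text (a title, a form header, a reference number), flag those pages with a regular expression, turn the flags into page ranges, and write each range to its own file with `qpdf::pdf_subset()` from the qpdf package, which pdftools already depends on. The helper names `find_doc_starts()`, `doc_page_ranges()` and `split_pdf_by_starts()`, the file name `"big.pdf"` and the pattern `"^\\s*INVOICE"` are hypothetical placeholders. Note that `pdf_text()` only returns text if the scans have an OCR text layer; if the pages come back as empty strings, `pdftools::pdf_ocr_text()` (which requires the tesseract package) can be used to produce the text first.

```r
# Sketch: split a multi-document PDF at pages whose extracted text matches
# a "start of document" pattern. Helper names and the example pattern are
# hypothetical; adapt them to the markers in your own scans.

# Indices of pages whose text matches `pattern`. `page_text` is a character
# vector with one element per page, as returned by pdftools::pdf_text().
find_doc_starts <- function(page_text, pattern) {
  starts <- which(grepl(pattern, page_text, perl = TRUE))
  # Page 1 always begins a document, even if it lacks the marker.
  sort(unique(c(1L, starts)))
}

# Convert start pages into a list of page ranges, one element per document:
# each document runs until the page before the next start (or the last page).
doc_page_ranges <- function(starts, n_pages) {
  ends <- c(starts[-1] - 1L, as.integer(n_pages))
  Map(function(s, e) s:e, starts, ends)
}

# Read the PDF, find the boundaries and write one file per document with
# qpdf::pdf_subset(). Returns the output file names invisibly.
split_pdf_by_starts <- function(input, pattern, out_prefix = "document") {
  page_text <- pdftools::pdf_text(input)
  ranges <- doc_page_ranges(find_doc_starts(page_text, pattern),
                            length(page_text))
  outputs <- sprintf("%s_%02d.pdf", out_prefix, seq_along(ranges))
  for (i in seq_along(ranges)) {
    qpdf::pdf_subset(input, pages = ranges[[i]], output = outputs[i])
  }
  invisible(outputs)
}

# Example call (file name and pattern are placeholders):
# split_pdf_by_starts("big.pdf", pattern = "^\\s*INVOICE")
```

Page 1 is always treated as a start, so no leading pages are dropped even if the marker is not detected on the very first page. The quality of the split depends entirely on how reliably the chosen pattern survives OCR.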

edited Apr 05 '23 at 10:03 by zx8754

asked by raph

It sounds like you will need some form of document scraping. Does each document have page numbers or titles? Perhaps you can use a document scraper to look for these markers in the overall PDF to help you separate the documents. – Spooked Apr 05 '23 at 13:02
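Following the page-number idea in this comment, here is a minimal sketch that assumes each scanned document prints a footer such as "Page 2 of 5" and that the footer survives OCR. It parses the footer on every page, treats each "Page 1 of N" as the start of a new document, and optionally checks that each document really spans N pages. The function names and the footer pattern are hypothetical.

```r
# Hypothetical sketch: assumes each page's text contains a footer such as
# "Page 2 of 5". Returns a data frame with the page number and the total
# for every page, NA where no footer was found.
parse_page_footer <- function(page_text,
    pattern = "page[[:space:]]+([0-9]+)[[:space:]]+of[[:space:]]+([0-9]+)") {
  m <- regmatches(page_text,
                  regexec(pattern, page_text, ignore.case = TRUE))
  # Each match is c(full_match, group1, group2); character(0) if no match.
  grab <- function(k) {
    vapply(m, function(x) if (length(x) == 3L) as.integer(x[k]) else NA_integer_,
           integer(1), USE.NAMES = FALSE)
  }
  data.frame(page = grab(2L), total = grab(3L))
}

# A page whose footer reads "Page 1 of N" starts a new document;
# page 1 of the big PDF always does.
starts_from_footer <- function(footer) {
  starts <- which(!is.na(footer$page) & footer$page == 1L)
  sort(unique(c(1L, starts)))
}

# Sanity check: TRUE where a document spans exactly the N pages its
# first footer announces, FALSE otherwise (including missing footers).
check_doc_lengths <- function(footer, starts) {
  ends <- c(starts[-1] - 1L, nrow(footer))
  expected <- footer$total[starts]
  !is.na(expected) & (ends - starts + 1L) == expected
}
```

The resulting start pages can then be turned into page ranges and written out with `qpdf::pdf_subset()`. Since OCR can misread footers, any document flagged FALSE by the length check is worth reviewing by hand.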

0 Answers