I have a PDF that is over 6,000 pages long. I would like to split it into separate PDFs that are each 50 pages long (or any other length I choose) and save them to an output folder. I wrote the following code, but it is extremely slow: it took an hour to get through the first 1,000 pages. Also, it doesn't split the last few pages left over when the page count isn't divisible by 50 (although I guess I could just split those manually). What's the best way to make this faster?

library(pdftools)

filepath = "./data/really_long_pdf.pdf"

total_pages = pdf_info(filepath)$pages
pages_per_output = 50
output_folder = "./data/output"

# Step through the document one chunk at a time
for (i in seq(1, total_pages, pages_per_output)) {
  start_page = i
  print(start_page)  # progress indicator
  # Cap the last chunk at the final page of the document
  end_page = min(i + pages_per_output - 1, total_pages)

  output_file = paste0(output_folder, "/output_pages_", start_page, "_", end_page, ".pdf")

  # Write pages start_page..end_page to their own file
  pdf_subset(filepath, output_file, pages = start_page:end_page)
}
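
The range arithmetic in the loop can be checked on its own, without opening the PDF. This is a minimal sketch, assuming a hypothetical helper `page_chunks()` (not part of pdftools) and an illustrative page count of 6003. It returns the start and end page of each output file, with the final chunk shortened when the total is not a multiple of the chunk size.

```r
# Hypothetical helper (not part of pdftools): start/end page for each chunk.
page_chunks <- function(total_pages, pages_per_output) {
  starts <- seq(1, total_pages, by = pages_per_output)
  # pmin caps every end page at the last page of the document
  ends <- pmin(starts + pages_per_output - 1, total_pages)
  data.frame(start = starts, end = ends)
}

# 6003 is an illustrative page count, not the real document's length
chunks <- page_chunks(6003, 50)
tail(chunks, 2)
```

Inspecting the last rows of `chunks` shows whether the leftover pages get a range of their own.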
user3710004
  • 3
    I'd think using [`pdftk`](https://www.pdflabs.com/tools/pdftk-server/) (command-line or gui, neither "R"-based) would be much faster than all of that. R can do something like this, as can python, but since your need has already been generalized and implemented in a bespoke compiled utility, I don't know that you'll find much (if any) that is faster and/or more efficient. – r2evans Aug 16 '23 at 19:05
  • 2
    You're right. I ended up using QPDF, took less than a minute and all I needed was one line of code: `qpdf --split-pages=50 readly-long-pdf.pdf output-pages.pdf` – user3710004 Aug 16 '23 at 19:25
  • 3
    I like R: it can do many things, but so many of those "many things" have no business being done in R ;-) – r2evans Aug 16 '23 at 21:07
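
For anyone who wants to stay inside an R script, the qpdf command from the comments can be launched with base R's `system2()`. This is a rough sketch, not a tested recipe: `qpdf_split_args()` is a hypothetical helper, the paths are the ones from the question, and it assumes the `qpdf` executable is on the PATH and the output folder already exists.

```r
# Hypothetical helper: build the argument vector for the qpdf command-line tool.
qpdf_split_args <- function(input, output, pages_per_output = 50) {
  c(paste0("--split-pages=", pages_per_output), shQuote(input), shQuote(output))
}

input_pdf <- "./data/really_long_pdf.pdf"        # path from the question
output_pdf <- "./data/output/output-pages.pdf"   # assumed output name

args <- qpdf_split_args(input_pdf, output_pdf)

# Only call qpdf if the executable is found and the input file exists.
if (nzchar(Sys.which("qpdf")) && file.exists(input_pdf)) {
  system2("qpdf", args)
}
```

The splitting itself is still done by the command-line tool; R only assembles the arguments and starts the process.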
