I wrote a small PDF merging application in Swift for macOS, using Apple's PDFKit framework. The basic strategy is to take a list of PDF files, then iterate over the pages of the 2nd through nth files, appending each page to the end of the first file. The core functionality is the following code:
import PDFKit

// PDFMergeError is defined elsewhere in the project.

func openPDF(_ file: URL) throws -> PDFDocument {
    // Read the raw bytes first so that an unreadable file and an
    // invalid PDF raise different errors.
    guard let pdata = try? Data(contentsOf: file) else {
        throw PDFMergeError.cannotOpenFile(filename: file.path)
    }
    guard let pdf = PDFDocument(data: pdata) else {
        throw PDFMergeError.fileNotValidPDF(filename: file.path)
    }
    return pdf
}

public func mergePDFs(files: [URL]) throws -> PDFDocument {
    // Fewer than two inputs means there is nothing to merge; this also
    // guards the files[0] access below against an empty array.
    guard files.count >= 2 else {
        throw PDFMergeError.justOneInputFile
    }
    let pdf = try openPDF(files[0])
    for p2add in files[1...] {
        let cur2add = try openPDF(p2add)
        for i in 0..<cur2add.pageCount {
            // page(at:) returns an optional; skip a missing page
            // instead of force-unwrapping it.
            guard let curpage = cur2add.page(at: i) else { continue }
            // Append at the current end of the first document.
            pdf.insert(curpage, at: pdf.pageCount)
        }
    }
    return pdf
}
This mostly works fine, and I use it myself fairly regularly (every once in a while, I get a mysterious crash that I haven't bothered to figure out how to fix yet, but I'm not asking about that today).
But sometimes it seems to hyper-inflate the size of the resulting merged file. For example, this morning I used it to merge the chapters of an e-book that I'd downloaded. The total size of the individual chapters was ~165 MB. After using my application to merge them, however, I ended up with a file of over 500 MB (!!!).
When I tried one of the numerous PDF merging websites floating around out there, I ended up with a much more reasonable sub-200 MB file.
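For anyone who wants to reproduce the comparison, here is a minimal sketch of how the sizes could be measured. The helper names `fileSize(at:)` and `inflationRatio(inputs:output:)` are made up for illustration; they are not part of my application or of PDFKit, and the sketch only relies on Foundation's `FileManager`.

```swift
import Foundation

/// Size in bytes of the file at `url` (hypothetical helper).
func fileSize(at url: URL) throws -> Int64 {
    let attrs = try FileManager.default.attributesOfItem(atPath: url.path)
    return (attrs[.size] as? NSNumber)?.int64Value ?? 0
}

/// Ratio of the merged file's size to the combined size of the inputs
/// (hypothetical helper). Returns 0 if the inputs are empty.
func inflationRatio(inputs: [URL], output: URL) throws -> Double {
    let total = try inputs.reduce(Int64(0)) { try $0 + fileSize(at: $1) }
    guard total > 0 else { return 0 }
    return Double(try fileSize(at: output)) / Double(total)
}
```

A ratio close to 1 would mean the merge added little overhead; for the e-book above, my application's output would come out at roughly 3.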
So, can someone help me figure out what I'm doing wrong?
UPDATE
After a bit more work, I'm coming to think that the problem here is internal to Apple. In particular, I made a different merged PDF using the same PDF merger service as before (if anyone's curious, it's smallpdf.com). It came out at just under 40 MB. Then I edited that PDF in the current version of Preview (built into macOS 10.15.6), deleting a few pages, and resaved it. The resaved file, from which I'd only deleted data, came out at almost 80 MB.
So, for whatever reason, the way Apple writes PDFs seems to at least double the file size compared with what it could be, both in my code and in Apple's own.