I'm trying to save and reload a PDF document using PDFKit's PDFDocument.write(to:) method in Swift, but the text content of the saved document is consistently corrupted compared to the original. For instance, "*" becomes "ú" and "ff" is replaced with "\0". I've attached a comparison of the first pages of both documents here.

I've read that this might be a character encoding issue, but I can't figure out how it could happen. The PDF is a paper downloaded directly from arXiv, and PDFDocument.write(to:) doesn't provide any way to specify an encoding.

Is there a solution or workaround for saving a PDFDocument object without running into this issue?

Here is the code to reproduce the issue:

import PDFKit

// Load the original paper directly from arXiv.
let originalDocument = PDFDocument(url: URL(string: "https://arxiv.org/pdf/1910.10683.pdf")!)!

// Write it to a temporary file, then load the saved copy back.
let savedURL = FileManager.default.temporaryDirectory.appendingPathComponent("saved.pdf")
originalDocument.write(to: savedURL)

let savedDocument = PDFDocument(url: savedURL)!

// Compare the text extracted from the two documents.
if originalDocument.string != savedDocument.string {
    print("Textual content has changed!")
}
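
To narrow down where the extracted text diverges, a small helper along these lines might be useful. This is only a sketch: firstMismatch and its window parameter are hypothetical names of my own, not part of PDFKit, and it compares Unicode scalars rather than anything PDF-specific.

```swift
/// Hypothetical helper (not a PDFKit API): finds the first Unicode scalar at
/// which two strings diverge and returns the code points starting there.
func firstMismatch(_ original: String, _ saved: String, window: Int = 4)
    -> (offset: Int, original: [String], saved: [String])? {
    let a = Array(original.unicodeScalars)
    let b = Array(saved.unicodeScalars)

    // Length of the common prefix, counted in Unicode scalars.
    let offset = zip(a, b).prefix { $0.0 == $0.1 }.count
    if offset == a.count && offset == b.count { return nil }  // identical strings

    // Render up to `window` scalars from the mismatch onward as "U+XXXX".
    func codePoints(_ scalars: [Unicode.Scalar]) -> [String] {
        scalars.dropFirst(offset).prefix(window).map {
            "U+" + String($0.value, radix: 16, uppercase: true)
        }
    }
    return (offset, codePoints(a), codePoints(b))
}

// Made-up input mimicking the "*" to "ú" substitution described above.
if let diff = firstMismatch("T5*", "T5ú", window: 1) {
    print(diff.offset, diff.original, diff.saved)
}
```

With the two documents from the snippet above, it could be called as firstMismatch(originalDocument.string ?? "", savedDocument.string ?? ""), since PDFDocument.string is optional.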
Nguyễn Khắc Hào