I'm trying to save and reload a PDF document using PDFKit's PDFDocument.write(to:) method in Swift, but the text content of the saved document is consistently corrupted compared to the original. For instance, "*" becomes "ú" and "ff" is replaced with "\0". I've attached a comparison of the first pages of both documents here.
I've read that this might be a character encoding issue, but I can't figure out how it would happen. The PDF is a paper downloaded directly from arXiv, and the PDFDocument.write(to:) method doesn't provide any way to specify an encoding.
Is there any solution or workaround to save a PDFDocument object without running into this issue?
Here is the code to reproduce the issue:
import PDFKit

// Load the original PDF straight from arXiv.
let originalDocument = PDFDocument(url: URL(string: "https://arxiv.org/pdf/1910.10683.pdf")!)!

// Write it to a temporary file, then read the saved copy back.
let saveURL = FileManager.default.temporaryDirectory.appendingPathComponent("saved.pdf")
originalDocument.write(to: saveURL)
let savedDocument = PDFDocument(url: saveURL)!

// Compare the extracted text of both documents.
if originalDocument.string != savedDocument.string {
    print("Textual content has changed!")
}
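To narrow down which characters actually change, a small helper along these lines might be useful. It is only a diagnostic sketch: `ScalarMismatch` and `firstMismatches` are hypothetical names, not PDFKit API, and the comparison is position by position, so it assumes the two strings stay aligned (a substitution that changes the length would shift everything after it).

```swift
// Hypothetical diagnostic helper (not part of PDFKit): records the first few
// positions at which two strings differ, compared Unicode scalar by scalar.
struct ScalarMismatch: Equatable {
    let offset: Int
    let expected: Unicode.Scalar
    let actual: Unicode.Scalar
}

func firstMismatches(expected: String, actual: String, limit: Int = 10) -> [ScalarMismatch] {
    var result: [ScalarMismatch] = []
    // zip stops at the end of the shorter string, so a pure length
    // difference is not reported by this helper.
    for (offset, pair) in zip(expected.unicodeScalars, actual.unicodeScalars).enumerated() {
        if result.count >= limit { break }
        if pair.0 != pair.1 {
            result.append(ScalarMismatch(offset: offset, expected: pair.0, actual: pair.1))
        }
    }
    return result
}

// Formats a scalar as a code point, e.g. U+2A.
func codePoint(_ scalar: Unicode.Scalar) -> String {
    return "U+" + String(scalar.value, radix: 16, uppercase: true)
}

// Example using the "*" to "ú" substitution described in the question.
for mismatch in firstMismatches(expected: "a*b", actual: "a\u{FA}b") {
    print("offset \(mismatch.offset): \(codePoint(mismatch.expected)) -> \(codePoint(mismatch.actual))")
}
// prints "offset 1: U+2A -> U+FA"
```

With the documents from the reproduction code, the two arguments would be `originalDocument.string ?? ""` and `savedDocument.string ?? ""`. Seeing the exact code points that differ should at least show whether the damage is limited to particular glyphs such as ligatures and symbols.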