Why does extracting text with Apple PDFKit fail for some PDFs?

Asked Aug 16 '23 at 10:09

Active Aug 16 '23 at 10:53


I am using Apple's PDFKit to extract text from PDFs in a SwiftUI project. It works well for most PDFs, but for some it returns no text at all: I just get an empty string. I am using the following code:

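```swift
import PDFKit

// `yourDocumentURL` is the file URL of the PDF to read.
if let pdf = PDFDocument(url: yourDocumentURL) {
    let pageCount = pdf.pageCount
    let documentContent = NSMutableAttributedString()

    for i in 0 ..< pageCount {
        guard let page = pdf.page(at: i) else { continue }
        // May be nil or empty when PDFKit finds no text on the page.
        guard let pageContent = page.attributedString else { continue }
        documentContent.append(pageContent)
    }
    // For the problematic PDFs, documentContent.string ends up empty.
}
```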
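One way to narrow this down is to check, page by page, whether PDFKit returns any plain text (`PDFPage.string`). Pages that come back nil or whitespace-only most likely have no text layer at all. The sketch below is a diagnostic aid under that assumption, not a fix; `pagesWithoutText(_:)` and `pageTexts(of:)` are hypothetical helper names, and the PDFKit part sits behind `#if canImport(PDFKit)` so the pure logic also compiles on platforms without PDFKit.

```swift
import Foundation
#if canImport(PDFKit)
import PDFKit
#endif

/// Returns the zero-based indices of pages whose extracted text is missing
/// or whitespace-only. Such pages most likely have no text layer
/// (for example, scanned images), so PDFKit has nothing to return.
func pagesWithoutText(_ pageTexts: [String?]) -> [Int] {
    var empty: [Int] = []
    for (index, text) in pageTexts.enumerated() {
        let trimmed = text?.trimmingCharacters(in: .whitespacesAndNewlines) ?? ""
        if trimmed.isEmpty { empty.append(index) }
    }
    return empty
}

#if canImport(PDFKit)
/// Hypothetical helper: reads the plain text of every page. A nil entry means
/// the page could not be loaded or PDFKit found no text on it.
func pageTexts(of url: URL) -> [String?]? {
    guard let pdf = PDFDocument(url: url) else { return nil }
    return (0..<pdf.pageCount).map { pdf.page(at: $0)?.string }
}
#endif
```

For example, `pagesWithoutText(pageTexts(of: yourDocumentURL) ?? [])` lists the pages that need a closer look; if it lists every page, the whole file probably lacks a text layer.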

What are the limitations of PDFKit here, and how can I extract text from all PDFs?
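Encryption is one case worth ruling out. A PDF that requires a user password opens with `PDFDocument.isLocked == true`, and its page content is generally not readable until `unlock(withPassword:)` returns true, so extraction may come back empty. A minimal sketch, assuming the candidate passwords come from the user; `unlockIfNeeded` and `openReadableDocument` are hypothetical helpers, and the unlock call is passed in as a closure so the logic can be exercised without a real PDF.

```swift
import Foundation
#if canImport(PDFKit)
import PDFKit
#endif

/// Returns true when the document is readable: either it was never locked,
/// or one of the candidate passwords unlocked it.
func unlockIfNeeded(isLocked: Bool,
                    candidatePasswords: [String],
                    tryUnlock: (String) -> Bool) -> Bool {
    guard isLocked else { return true }
    // contains(where:) stops at the first password that succeeds.
    return candidatePasswords.contains(where: tryUnlock)
}

#if canImport(PDFKit)
/// Hypothetical wrapper: opens the PDF and unlocks it if needed.
/// Returns nil when the file cannot be opened or stays locked.
func openReadableDocument(at url: URL, passwords: [String]) -> PDFDocument? {
    guard let pdf = PDFDocument(url: url) else { return nil }
    let readable = unlockIfNeeded(isLocked: pdf.isLocked,
                                  candidatePasswords: passwords,
                                  tryUnlock: { pdf.unlock(withPassword: $0) })
    return readable ? pdf : nil
}
#endif
```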

Edited Aug 16 '23 at 10:53 by Dávid Pásztor

Asked by Tanvirgeek

Without seeing some actual PDFs that don't work, it's impossible to tell for sure what the issue is. Are those actual text PDFs or is the content just images? – Dávid Pásztor Aug 16 '23 at 10:53
They are actual text PDFs. I tried an online PDF-to-text converter and it extracted their text fine. – Tanvirgeek Aug 16 '23 at 11:03
Can you manually select text in the PDFs that don't work? Some PDFs look like text PDFs but are actually scans of text pages and contain only images. – Ptit Xav Aug 16 '23 at 12:07
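If the failing files turn out to be scans, as Ptit Xav suggests, PDFKit has no text to return and OCR is the usual fallback. The sketch below swaps in Apple's Vision framework (`VNRecognizeTextRequest`) for pages where PDFKit finds no text; that is an assumption about an acceptable workaround, not something PDFKit does itself. `mergedPageTexts`, `renderPage`, `recognizeText`, and `extractTextWithOCRFallback` are hypothetical names, the render scale of 2 is arbitrary, and the code assumes an SDK where `VNRecognizeTextRequest.results` is typed as `[VNRecognizedTextObservation]?` (Xcode 13 or later).

```swift
import Foundation
#if canImport(PDFKit) && canImport(Vision)
import PDFKit
import Vision
#if os(macOS)
import AppKit
#else
import UIKit
#endif
#endif

/// Keeps PDFKit's text for pages that have a text layer and falls back to
/// `ocr(pageIndex)` for pages whose text is missing or whitespace-only.
func mergedPageTexts(extracted: [String?], ocr: (Int) -> String?) -> [String] {
    var result: [String] = []
    for (index, text) in extracted.enumerated() {
        if let text = text, !text.trimmingCharacters(in: .whitespacesAndNewlines).isEmpty {
            result.append(text)
        } else {
            result.append(ocr(index) ?? "")
        }
    }
    return result
}

#if canImport(PDFKit) && canImport(Vision)
/// Renders a page to a CGImage. The scale is an assumed value; higher scales
/// usually help OCR accuracy at the cost of memory.
func renderPage(_ page: PDFPage, scale: CGFloat = 2) -> CGImage? {
    let bounds = page.bounds(for: .mediaBox)
    let size = CGSize(width: bounds.width * scale, height: bounds.height * scale)
    let image = page.thumbnail(of: size, for: .mediaBox)
    #if os(macOS)
    return image.cgImage(forProposedRect: nil, context: nil, hints: nil)
    #else
    return image.cgImage
    #endif
}

/// Runs Vision text recognition synchronously on one image.
func recognizeText(in image: CGImage) -> String? {
    let request = VNRecognizeTextRequest()
    request.recognitionLevel = .accurate
    let handler = VNImageRequestHandler(cgImage: image, options: [:])
    do {
        try handler.perform([request])
    } catch {
        return nil
    }
    // Keep the best candidate for each recognized line.
    let lines = (request.results ?? []).compactMap { $0.topCandidates(1).first?.string }
    return lines.joined(separator: "\n")
}

/// Extracts text from every page, using OCR only where PDFKit finds none.
func extractTextWithOCRFallback(from url: URL) -> String? {
    guard let pdf = PDFDocument(url: url) else { return nil }
    let extracted = (0..<pdf.pageCount).map { pdf.page(at: $0)?.string }
    let pages = mergedPageTexts(extracted: extracted) { index in
        guard let page = pdf.page(at: index), let image = renderPage(page) else { return nil }
        return recognizeText(in: image)
    }
    return pages.joined(separator: "\n")
}
#endif
```

Vision's `perform(_:)` call is synchronous, so in a SwiftUI app this would normally run off the main thread.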

