Running out of memory in a for loop swift (4)

Question

I'm looping through all pages in a PDFDocument (200+ pages) but the app crashes with

Message from debugger: Terminated due to memory issue

The PDF is approx. 4 MB in size, yet each iteration of the loop raises memory use by approx. 30 MB, which doesn't seem right to me. I have managed to locate where in my code the memory is being used, but I'm not sure how to reclaim it. I tried setting variables to nil, with no effect. I also tried wrapping the code inside the for loop in an autoreleasepool {}, again with no effect.

@objc func scrapePDF(){

    let documentURL = self.documentDisplayWebView!.url!
    let document = PDFDocument(url: documentURL)
    let numberOfPages = document!.pageCount

    DispatchQueue.global().async {

        for pageNumber in 0..<numberOfPages { // page(at:) uses zero-based indexes

           print(document?.page(at: pageNumber)!.string!)

        }
    }
}

UPDATE: solved ... kind of

Playing around a bit, I found that if I create a new PDFDocument instance on each iteration, rather than referencing a single PDFDocument from inside the loop, this strangely solves the memory issue. I don't quite understand why, though. PDFDocument is a class, not a struct, so it is passed by reference, meaning it is created only once and then referenced inside my loop. So why would it cause a memory issue?

@objc func scrapePDF(){

    let documentURL = self.documentDisplayWebView!.url!
    let document = PDFDocument(url: documentURL)
    let numberOfPages = document!.pageCount

    DispatchQueue.global().async {

        for pageNumber in 0..<numberOfPages { // page(at:) uses zero-based indexes
           let doc = PDFDocument(url: documentURL)
           print(doc?.page(at: pageNumber)!.string!)

        }
    }
}

Though the above code clears the memory issue, the problem is that it's too slow. Each iteration takes 0.5 seconds, and with 300+ pages I can't accept that. Any tips on speeding it up? Or on why the memory isn't given back when the PDFDocument is referenced from outside the loop?
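One possible middle ground between the two versions above is to reopen the document once per batch of pages instead of once per page. This is only a sketch: it assumes, as the second version suggests, that the memory is tied to the lifetime of the PDFDocument instance, and it has not been measured against the crash described here. `pageBatches`, `scrapeText`, the `handle` callback and the default batch size of 20 are hypothetical names and values, not part of PDFKit.

```swift
import Foundation
#if canImport(PDFKit)
import PDFKit
#endif

/// Splits `pageCount` zero-based page indexes into consecutive ranges of at
/// most `batchSize` pages. Hypothetical helper, not a PDFKit API.
func pageBatches(pageCount: Int, batchSize: Int) -> [Range<Int>] {
    guard pageCount > 0, batchSize > 0 else { return [] }
    return stride(from: 0, to: pageCount, by: batchSize).map { start in
        start..<min(start + batchSize, pageCount)
    }
}

#if canImport(PDFKit)
/// Reads the text of every page, creating a new PDFDocument for each batch so
/// that whatever one instance holds on to can be released along with it.
func scrapeText(at url: URL, batchSize: Int = 20, handle: (Int, String) -> Void) {
    // Open the document once just to learn how many pages there are.
    guard let pageCount = PDFDocument(url: url)?.pageCount else { return }

    for batch in pageBatches(pageCount: pageCount, batchSize: batchSize) {
        // Drain autoreleased objects at the end of every batch.
        autoreleasepool {
            guard let document = PDFDocument(url: url) else { return }
            for index in batch {
                if let text = document.page(at: index)?.string {
                    handle(index, text)
                }
            }
        }
    }
}
#endif
```

As in the original code, `scrapeText` would be called from a background queue. A smaller batch size should lower peak memory at the cost of more document reopens, so the value is something to tune by measurement.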

Further UPDATE: it seems that calling the .string property of PDFPage is what increases the memory to the point of crashing.

Hi - this question refers to creating PDFs, not reading them. However, the solution may be relevant: https://stackoverflow.com/questions/14699194/memory-warning-and-crash-when-creating-pdf <- It refers to running one page at a time. — benjiiiii, Dec 08 '17 at 12:46
Have you tried fetching the first 20 pages to see if the memory is released when the loop completes? — Laffen, Dec 08 '17 at 14:19
As per the Apple docs for `string` (https://developer.apple.com/documentation/pdfkit/pdfdocument/1436036-string): `This is a convenience method, equivalent to creating a selection object for the entire document and then invoking the PDFSelection class’s string method.` It looks like it creates a String representation of the entire document and then uses a PDFSelection convenience init to get that one page, so that is what affects memory here — Prashant Tukadiya, Dec 08 '17 at 14:25
@Laffen yeah, if I do that I get significant portions of the memory back. I could write some logic to read n pages at a time, but I’d rather not if avoidable — RyanTCB, Dec 08 '17 at 14:30
@PrashantTukadiya but that’s if I call string on the PDFDocument. When calling it on the page, it should return just the text of that page. https://developer.apple.com/documentation/pdfkit/pdfpage/1503949-string — RyanTCB, Dec 08 '17 at 14:37
@RyanTCB oh, I missed that. Did you try with different PDFs? — Prashant Tukadiya, Dec 08 '17 at 14:41
@PrashantTukadiya I’ve tried numerous PDFs, and as long as the document is less than 200 pages it can complete the loop and return the memory. However, I can’t be sure of the size of the PDF, so I need a solution to reclaim memory. My option so far is to follow Laffen's suggestion and fetch parts at a time. I’d rather understand why Swift keeps it all in memory — RyanTCB, Dec 08 '17 at 14:45
What happens if you remove the line `DispatchQueue.global().async {`? — meggar, Dec 08 '17 at 15:12
@meggar it freezes the UI. Didn’t think it was good practice to do that — RyanTCB, Dec 08 '17 at 15:13
It almost looks like `PDFDocument` caches the fetched pages, resulting in a memory warning when the cache gets too big. This would explain why instantiating a new `PDFDocument` on every iteration works, because then only one page is cached at any given time. I'm curious why you're scraping the PDF in the first place? — Laffen, Dec 11 '17 at 08:15
I'm scraping so I can enter details into a calendar rather than having the user enter them manually. I also conclude that it's caching the fetched pages, but why? Why is it not just using the instance passed in? If `PDFDocument` were a struct I'd get that it sends a copy, but it's a class, so it's passed by reference. — RyanTCB, Dec 11 '17 at 08:19
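
The caching hypothesis in the last two comments can be illustrated without PDFKit. The sketch below uses a hypothetical `CachingDocument` class; it is not PDFKit's implementation and only shows the general point that passing a class by reference avoids copies but says nothing about what the single shared instance keeps alive internally.

```swift
// Hypothetical stand-in for a document type that keeps per-page state.
// This is NOT how PDFDocument is known to work, only an illustration.
final class CachingDocument {
    private var cache: [Int: String] = [:]

    /// Number of page texts this instance is currently holding on to.
    var cachedPageCount: Int { cache.count }

    func text(forPage index: Int) -> String {
        if let hit = cache[index] { return hit }
        let text = "text of page \(index)"  // placeholder for expensive extraction
        cache[index] = text                 // retained for the instance's lifetime
        return text
    }
}

// One shared instance: its cache grows with every page visited.
let shared = CachingDocument()
for index in 0..<200 {
    _ = shared.text(forPage: index)
}
print(shared.cachedPageCount)  // 200

// A fresh instance per iteration: each one holds at most a single page and
// is deallocated when the loop body ends.
var largestCache = 0
for index in 0..<200 {
    let fresh = CachingDocument()
    _ = fresh.text(forPage: index)
    largestCache = max(largestCache, fresh.cachedPageCount)
}
print(largestCache)  // 1
```

If `PDFDocument` behaves anything like this, the reference itself is not the problem: the one long-lived instance accumulates state, and that state goes away only when the instance does.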
