
I need to run text recognition on hundreds of images, one at a time, but memory grows by about 25 MB or more with each image. I've searched the internet but can't find what is causing the memory retention or how to release it. By setting breakpoints, I can see that the jump in size occurs with each call to imageRequestHandler.perform(). Below is the relevant code. Any suggestions?

    func textRecognition(image:CGImage) {
                
        let textRecognitionRequest = VNRecognizeTextRequest(
            completionHandler: self.handleDetectedText)
        textRecognitionRequest.recognitionLevel = .accurate
        textRecognitionRequest.recognitionLanguages = ["en_US"]
        textRecognitionRequest.usesLanguageCorrection = false
        
        // request handler
        let textRequest = [textRecognitionRequest]
        let imageRequestHandler = VNImageRequestHandler(cgImage: image, orientation: .up, options: [:])
        DispatchQueue.global(qos: .userInitiated).async {
            do {
                // perform request
                try imageRequestHandler.perform(textRequest)
            } catch let error {
                print("Error \(error)")
            }
        }
    }
    
    func handleDetectedText(request: VNRequest?, error:Error?){
        if let error = error {  print("ERROR: \(error)"); return  }
        guard let results = request?.results, results.count > 0 else {
            DispatchQueue.main.async {
                self.result_field.isEnabled=false
                self.result_field.text = "Scan failed - Retry"
                let desc = NSMutableAttributedString(string: "Retake Photo", attributes: [NSAttributedString.Key.font: UIFont.systemFont(ofSize: 22, weight: .regular)])
                self.take_button.setAttributedTitle(desc, for: UIControl.State.normal)
                self.take_button.isHidden = false
                self.take_button.isEnabled = true
            }
            return // code to process the text replaced by the 'return' statement
        }
    }
    } // closes the enclosing type (declaration not shown)
timbre timbre
Dave

1 Answer


I would recommend profiling to see what takes up the memory. In general, though, I see two areas that could be explored for potential improvements:

  1. The image itself may be causing the higher memory usage. As developers, we tend to capture "the best" image. However, text recognition may not need a huge, high-quality image and may give the same results for a smaller one. So try to make your image smaller when capturing and/or processing it:

    • experiment with AVCapturePhotoSettings, namely photoQualityPrioritization and the photo format,
    • try to reduce the CGImage size, or maybe convert it to grayscale. Those operations are not free either, though. Or try to pass a CVPixelBuffer directly to VNImageRequestHandler (without converting to CGImage; that applies if you are taking the image from the camera).
  2. See if an autorelease pool helps your memory usage. The most obvious location is imageRequestHandler.perform: try to perform it on a dedicated queue, and set autoreleaseFrequency: .workItem for that queue:

private static let performQueue = DispatchQueue(
    label: "your label",
    qos: .userInitiated,
    autoreleaseFrequency: .workItem, // <-- instead of the default .inherit (global queues use .never)
    target: .global(qos: .userInitiated) // <-- same priority as the one you have
)

// ...

Self.performQueue.async {
    do {
        // perform(_:) throws, so the error has to be handled inside the block
        try imageRequestHandler.perform(textRequest)
    } catch {
        print("Error \(error)")
    }
}

What it does:

The queue configures an autorelease pool before the execution of a block, and releases the objects in that pool after the block finishes executing.

Since the recognition request may create a lot of temporary objects, you may benefit from this setting, because those objects will be released as soon as the work item completes (instead of "eventually"). There's no guarantee that it will help, but it's worth a try.
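The same idea can also be tried without a custom queue by wrapping each recognition in an explicit autoreleasepool block. The following is only a minimal sketch: the helper name recognizeText(in:) and its synchronous, string-returning shape are assumptions made for illustration and are not taken from the question's code. As with the queue setting, there is no guarantee it reduces the growth.

```swift
import Vision
import CoreGraphics

/// Hypothetical helper (not from the question): recognizes text in one image
/// synchronously and returns the top candidate string for each observation.
func recognizeText(in image: CGImage) -> [String] {
    // Everything autoreleased inside the block is drained when the block exits,
    // i.e. once per image rather than at some later point.
    return autoreleasepool { () -> [String] in
        var lines: [String] = []

        // perform(_:) is synchronous, so this handler runs before perform returns.
        let request = VNRecognizeTextRequest { request, error in
            guard error == nil,
                  let observations = request.results as? [VNRecognizedTextObservation]
            else { return }
            lines = observations.compactMap { $0.topCandidates(1).first?.string }
        }
        request.recognitionLevel = .accurate
        request.recognitionLanguages = ["en_US"]
        request.usesLanguageCorrection = false

        let handler = VNImageRequestHandler(cgImage: image, orientation: .up, options: [:])
        do {
            try handler.perform([request])
        } catch {
            print("Error \(error)")
        }
        return lines
    }
}
```

Call it once per image from a background queue, for example inside a loop over the images, and compare the memory graph with and without the autoreleasepool wrapper.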

Again, those are only suggestions; a proper performance evaluation is needed if reducing memory is that important for you. Although I think 25 MB is quite reasonable, to be honest.
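For the first suggestion (a smaller input image), here is a minimal sketch of downscaling a CGImage with Core Graphics before handing it to Vision. The function name downscaled(_:maxDimension:) and the choice of an 8-bit RGBA context are assumptions for illustration; whether a smaller image keeps the same recognition quality has to be checked on your own images.

```swift
import CoreGraphics

/// Hypothetical helper: returns a copy of `image` scaled so that its longer
/// side is at most `maxDimension` pixels. Images that already fit are returned as-is.
func downscaled(_ image: CGImage, maxDimension: Int) -> CGImage? {
    let longest = max(image.width, image.height)
    guard maxDimension > 0, longest > maxDimension else { return image }

    let scale = CGFloat(maxDimension) / CGFloat(longest)
    let width = max(1, Int((CGFloat(image.width) * scale).rounded()))
    let height = max(1, Int((CGFloat(image.height) * scale).rounded()))

    // 8-bit RGBA bitmap context; bytesPerRow 0 lets Core Graphics choose the stride.
    guard let context = CGContext(
        data: nil,
        width: width,
        height: height,
        bitsPerComponent: 8,
        bytesPerRow: 0,
        space: CGColorSpaceCreateDeviceRGB(),
        bitmapInfo: CGImageAlphaInfo.premultipliedLast.rawValue
    ) else { return nil }

    context.interpolationQuality = .high
    context.draw(image, in: CGRect(x: 0, y: 0, width: width, height: height))
    return context.makeImage()
}
```

The scaled image can then be passed to VNImageRequestHandler(cgImage:orientation:options:) in place of the original. The redraw itself costs CPU time and a temporary buffer, so measure before and after.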

timbre timbre
  • Thanks for the explicit response! I tried suggestion #2 first because that's in line with my own thinking, but it had no effect -- still growing memory 25mb+ every iteration -- and I want to do 100's of iterations. As for #1, I already crop the source to a 400x600px image prior to text recognition. Any other ideas? – Dave Mar 20 '23 at 17:50
  • Also, I looked at Instruments and (if I'm reading it correctly), CMPhoto called by CMPhotoCreatImageSurface, is adding 17.45MiB per iteration. – Dave Mar 20 '23 at 18:08
  • @Dave did you try using `CVPixelBuffer` directly instead? Also `CMPhoto` is a private framework - not much you can do with it directly, but see what calls it, and see if you can wrap whatever is calling it in `autoreleasepool` block. – timbre timbre Mar 20 '23 at 21:36
  • Oh, and you are not converting CIImage / CGImage back and forth by any chance? cos that can also cost you memory and CPU... – timbre timbre Mar 20 '23 at 21:37
  • I'm using imagePickerController() and I don't know how to get a pixel buffer from the camera, so I didn't go down that path. I get a UIImage from imagePickerController() and convert that to CGImage for cropping and then back to a UIImage . – Dave Mar 20 '23 at 23:04
  • By eliminating the cropping of the original image and submitting the original image for text recognition, I circumvented the memory issue, so it appears that analyzing the cropped image is the problem. The conversion must be causing something to not release. Thanks for your help! – Dave Mar 20 '23 at 23:19
  • Ok, found the real problem: I was adding an extra sublayer to the camera view that was apparently not being released. Thanks again for setting me down the right road. – Dave Mar 21 '23 at 13:52
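For anyone landing here with the same symptom, a minimal sketch of guarding against the leak described in the last comment: keep a reference to the overlay layer and remove the previous one before adding a new one. The type OverlayManager and its method names are hypothetical, and the sketch assumes the extra sublayer was being added again on every capture; the actual code was not posted.

```swift
import QuartzCore

/// Hypothetical helper: owns at most one overlay sublayer on a host layer
/// (for example a camera preview's layer), so repeated calls do not pile up layers.
final class OverlayManager {
    private let hostLayer: CALayer
    private var overlay: CALayer?

    init(hostLayer: CALayer) {
        self.hostLayer = hostLayer
    }

    /// Replaces any existing overlay with a fresh one covering `frame`.
    func showOverlay(frame: CGRect) {
        removeOverlay()
        let layer = CALayer()
        layer.frame = frame
        hostLayer.addSublayer(layer)
        overlay = layer
    }

    /// Detaches the overlay from the host layer and drops the reference to it.
    func removeOverlay() {
        overlay?.removeFromSuperlayer()
        overlay = nil
    }
}
```

Calling showOverlay(frame:) once per capture then leaves a single overlay attached, instead of one more sublayer per iteration.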