Positional errors handling the detected text of SwiftUI's Vision Framework

Question

I am new to SwiftUI, and I am trying to write a feature that takes into an image, detects the text from the image, translates the text through Google's API, and displays the translated text under or to the right of the original text. I use SwiftUI's Vision library to detect the text, which returns both the detected text and the bounding box. Fortunately, every function works and the text is detected and translated. However, when I want to put the translated text label to the right of the original text, the positioning of the labels just messes up. All of them stack up at the top-left corner of the screen... Can anyone help me to change the code so that the translated labels are organized under each of the orginal text? Thanks!

This is the code for extracting the text:

let requestHandler = VNImageRequestHandler(cgImage: cgImage, options: [:])
let request = VNRecognizeTextRequest(completionHandler: { (request, error) in
    if let error = error {
        print("OCR error: \(error)")
    } else {
        self.processResults(from: request)
    }
})
request.recognitionLevel = .accurate
request.usesLanguageCorrection = true

do {
    try requestHandler.perform([request])
} catch {
    print("Failed to perform OCR: \(error)")
}

And this:

guard let observations = request.results as? [VNRecognizedTextObservation] else { return }

for (index, observation) in observations.enumerated() {
    guard let topCandidate = observation.topCandidates(1).first else { continue }

    DispatchQueue.main.async {
        self.recognizedText.append((topCandidate.string, observation.boundingBox))
        self.translateText(text: topCandidate.string, at: index)
    }
}

And this:

struct ImagePicker: UIViewControllerRepresentable {
    @Environment(\.presentationMode) var presentationMode
    @Binding var image: UIImage?
    let sourceType: UIImagePickerController.SourceType
    
    func makeUIViewController(context: UIViewControllerRepresentableContext<ImagePicker>) -> UIImagePickerController {
        let picker = UIImagePickerController()
        picker.delegate = context.coordinator
        picker.sourceType = self.sourceType
        return picker
    }
    
    func updateUIViewController(_ uiViewController: UIImagePickerController, context: UIViewControllerRepresentableContext<ImagePicker>) {
    }
    
    func makeCoordinator() -> Coordinator {
        Coordinator(self)
    }
    
    class Coordinator: NSObject, UINavigationControllerDelegate, UIImagePickerControllerDelegate {
        let parent: ImagePicker

        init(_ parent: ImagePicker) {
            self.parent = parent
        }
        
        func imagePickerController(_ picker: UIImagePickerController, didFinishPickingMediaWithInfo info: [UIImagePickerController.InfoKey : Any]) {
            if let uiImage = info[.originalImage] as? UIImage {
                parent.image = uiImage
            }
            
            parent.presentationMode.wrappedValue.dismiss()
        }
    }
}

Additionally, this is the code which I used to display the labels:

                GeometryReader { geometry in
                    ZStack {
                        Image(uiImage: image ?? UIImage())
                            .resizable()
                            .scaledToFit()
                        
                        ForEach(self.recognizedText.indices, id: \.self) { index in
                            let box = self.recognizedText[index].1
                            
                            let imageSize = geometry.size
                            let scale = min(imageSize.width / (image?.size.width ?? 1), imageSize.height / (image?.size.height ?? 1))
                            
                            let origin = CGPoint(x: box.origin.x * imageSize.width, y: (1 - box.origin.y - box.height) * imageSize.height)
                            let size = CGSize(width: box.width * imageSize.width, height: box.height * imageSize.height)
                            
                            if let translation = self.translatedText[index] {
                                Text(translation)
                                    .background(Color.yellow)
                                    .position(x: (origin.x + size.width) * scale, y: (origin.y + size.height / 2) * scale)
                            }
                        }
                    }
                }
                .padding()

Positional errors handling the detected text of SwiftUI's Vision Framework

0 Answers0