
I am currently trying to draw boxes around the text that Firebase ML Kit recognized, on top of the image. So far I have not had any success: I can't see any boxes at all, because they are all drawn offscreen. I was looking at this article for reference: https://medium.com/swlh/how-to-draw-bounding-boxes-with-swiftui-d93d1414eb00 and also at this project: https://github.com/firebase/quickstart-ios/blob/master/mlvision/MLVisionExample/ViewController.swift

This is the view where the boxes should be shown:

struct ImageScanned: View {
    var image: UIImage
    @Binding var rectangles: [CGRect]
    @State var viewSize: CGSize = .zero

    var body: some View {
        // TODO: fix scaling
        ZStack {
            Image(uiImage: image)
                .resizable()
                .scaledToFit()
                .overlay(
                    GeometryReader { geometry in
                        ZStack {
                            ForEach(self.transformRectangles(geometry: geometry)) { rect in
                                Rectangle()
                                    .path(in: CGRect(
                                        x: rect.x,
                                        y: rect.y,
                                        width: rect.width,
                                        height: rect.height))
                                    .stroke(Color.red, lineWidth: 2.0)
                            }
                        }
                    }
                )
        }
    }

    private func transformRectangles(geometry: GeometryProxy) -> [DetectedRectangle] {
        var rectangles: [DetectedRectangle] = []

        let imageViewWidth = geometry.frame(in: .global).size.width
        let imageViewHeight = geometry.frame(in: .global).size.height
        let imageWidth = image.size.width
        let imageHeight = image.size.height

        let imageViewAspectRatio = imageViewWidth / imageViewHeight
        let imageAspectRatio = imageWidth / imageHeight
        let scale = (imageViewAspectRatio > imageAspectRatio)
            ? imageViewHeight / imageHeight
            : imageViewWidth / imageWidth

        let scaledImageWidth = imageWidth * scale
        let scaledImageHeight = imageHeight * scale
        let xValue = (imageViewWidth - scaledImageWidth) / CGFloat(2.0)
        let yValue = (imageViewHeight - scaledImageHeight) / CGFloat(2.0)

        var transform = CGAffineTransform.identity.translatedBy(x: xValue, y: yValue)
        transform = transform.scaledBy(x: scale, y: scale)

        for rect in self.rectangles {
            let rectangle = rect.applying(transform)
            rectangles.append(DetectedRectangle(
                width: rectangle.width,
                height: rectangle.height,
                x: rectangle.minX,
                y: rectangle.minY))
        }
        return rectangles
    }
}

struct DetectedRectangle: Identifiable {
    var id = UUID()
    var width: CGFloat = 0
    var height: CGFloat = 0
    var x: CGFloat = 0
    var y: CGFloat = 0
}

This is the view it is nested in:

struct StartScanView: View {
    @State var showCaptureImageView: Bool = false
    @State var image: UIImage? = nil
    @State var rectangles: [CGRect] = []

    var body: some View {
        ZStack {
            if showCaptureImageView {
                CaptureImageView(isShown: $showCaptureImageView, image: $image)
            } else {
                VStack {
                    Button(action: {
                        self.showCaptureImageView.toggle()
                    }) {
                        Text("Start Scanning")
                    }

                    // show here the view with rectangles on top of the image
                    if self.image != nil {
                        ImageScanned(image: self.image ?? UIImage(), rectangles: $rectangles)
                    }

                    Button(action: {
                        self.processImage()
                    }) {
                        Text("Process Image")
                    }
                }
            }
        }
    }

    func processImage() {
        let scaledImageProcessor = ScaledElementProcessor()
        if image != nil {
            scaledImageProcessor.process(in: image!) { text in
                for block in text.blocks {
                    for line in block.lines {
                        for element in line.elements {
                            self.rectangles.append(element.frame)
                        }
                    }
                }
            }
        }
    }
}

The calculation from the tutorial made the rectangles too big, and the one from the sample project made them too small (similarly for the height). Unfortunately, I can't find out which size Firebase uses to determine the elements' frames. This is what it looks like: [screenshot of misplaced rectangles]. Without scaling the width and height at all, the rectangles seem to have roughly (though not exactly) the size they are supposed to have, which leads me to assume that ML Kit's frames are not computed in proportion to image.size.width/height.
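For reference, the aspect-fit mapping both sources implement boils down to a small pure function that is easy to check with sample numbers (the sizes below are made up for illustration; this assumes ML Kit reports `element.frame` in the input image's own coordinate space, which is what the quickstart's conversion implies):

```swift
import Foundation
import CoreGraphics

// Sketch of the aspect-fit mapping, extracted into a pure function
// so the math can be checked in isolation.
func aspectFitTransform(viewSize: CGSize, imageSize: CGSize) -> CGAffineTransform {
    // Width-limited or height-limited, depending on which aspect ratio is larger.
    let scale = (viewSize.width / viewSize.height > imageSize.width / imageSize.height)
        ? viewSize.height / imageSize.height
        : viewSize.width / imageSize.width
    // Center the scaled image inside the view.
    let xOffset = (viewSize.width - imageSize.width * scale) / 2
    let yOffset = (viewSize.height - imageSize.height * scale) / 2
    return CGAffineTransform.identity
        .translatedBy(x: xOffset, y: yOffset)
        .scaledBy(x: scale, y: scale)
}

// Made-up example: a 1000×2000 image shown aspect-fit in a 300×500 view.
// scale = 500/2000 = 0.25, xOffset = 25, yOffset = 0.
let t = aspectFitTransform(viewSize: CGSize(width: 300, height: 500),
                           imageSize: CGSize(width: 1000, height: 2000))
let mapped = CGRect(x: 100, y: 400, width: 200, height: 80).applying(t)
assert(mapped == CGRect(x: 50, y: 100, width: 50, height: 20))
```

If the frames come out at roughly 1/`scale` of the expected size, the frames were already in view coordinates; if they come out far offscreen, the translation is being applied in the wrong coordinate space.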

Mystic Muffin

2 Answers


This is how I changed the ForEach loop:

Image(uiImage: uiimage!)
    .resizable()
    .scaledToFit()
    .overlay(
        GeometryReader { (geometry: GeometryProxy) in
            ForEach(self.blocks, id: \.self) { (block: VisionTextBlock) in
                Rectangle()
                    .path(in: block.frame.applying(self.transformMatrix(geometry: geometry, image: self.uiimage!)))
                    .stroke(Color.purple, lineWidth: 2.0)
            }
        }
    )

Instead of passing the x, y, width, and height separately, I pass the return value of the transformMatrix function to the path function.

My transformMatrix function is

private func transformMatrix(geometry: GeometryProxy, image: UIImage) -> CGAffineTransform {
    let imageViewWidth = geometry.size.width
    let imageViewHeight = geometry.size.height
    let imageWidth = image.size.width
    let imageHeight = image.size.height

    let imageViewAspectRatio = imageViewWidth / imageViewHeight
    let imageAspectRatio = imageWidth / imageHeight
    let scale = (imageViewAspectRatio > imageAspectRatio)
        ? imageViewHeight / imageHeight
        : imageViewWidth / imageWidth

    // The image view's `contentMode` is `scaleAspectFit`, which scales the image
    // to fit the image view while maintaining the aspect ratio. Multiply by
    // `scale` to get the image's displayed size.
    let scaledImageWidth = imageWidth * scale
    let scaledImageHeight = imageHeight * scale
    let xValue = (imageViewWidth - scaledImageWidth) / CGFloat(2.0)
    let yValue = (imageViewHeight - scaledImageHeight) / CGFloat(2.0)

    var transform = CGAffineTransform.identity.translatedBy(x: xValue, y: yValue)
    transform = transform.scaledBy(x: scale, y: scale)
    return transform
}
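To sanity-check the branch logic with made-up numbers, here is the width-limited case (a landscape image letterboxed into a portrait view) as a standalone snippet; the sizes are hypothetical:

```swift
import Foundation
import CoreGraphics

// Made-up sizes: a landscape 2000×1000 image letterboxed into a 300×500 view.
let viewSize = CGSize(width: 300, height: 500)
let imageSize = CGSize(width: 2000, height: 1000)

// View aspect ratio (0.6) < image aspect ratio (2.0), so the
// width-limited branch is taken.
let scale = (viewSize.width / viewSize.height > imageSize.width / imageSize.height)
    ? viewSize.height / imageSize.height
    : viewSize.width / imageSize.width              // 300 / 2000 = 0.15
let xValue = (viewSize.width - imageSize.width * scale) / 2   // 0
let yValue = (viewSize.height - imageSize.height * scale) / 2 // 175

var transform = CGAffineTransform.identity.translatedBy(x: xValue, y: yValue)
transform = transform.scaledBy(x: scale, y: scale)

// The full image rect should map exactly onto the letterboxed area.
let mapped = CGRect(x: 0, y: 0, width: 2000, height: 1000).applying(transform)
assert(mapped == CGRect(x: 0, y: 175, width: 300, height: 150))
```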

and the output is:

[screenshot of the working code in the simulator]

Arun

ML Kit has a QuickStart app showing exactly what you are trying to do: recognizing text and drawing a rectangle around it. Here is the Swift code:

https://github.com/firebase/quickstart-ios/tree/master/mlvision/MLVisionExample

Dong Chen
  • If I try their approach, I get the same result as with the modified version of mine: the rectangles become way too small (about 1/6 of their intended size). Also, they are using UIKit and not SwiftUI... – Mystic Muffin Feb 20 '20 at 21:13
  • I just edited my question to show how I modified it. – Mystic Muffin Feb 20 '20 at 21:20