I'm using the following sample app that Apple provides to do some object detection.

https://developer.apple.com/documentation/vision/tracking_multiple_objects_or_rectangles_in_video

I'm trying to paste an image of a face on top of the green rectangle in the video. (Video Download Link: https://drive.google.com/file/d/1aw5L-6uBMTxeuq378Y98dZcTh6N_Y2Pf/view?usp=sharing)

So far, I'm able to detect the green rectangle from the video very consistently, but whenever I try to overlay an image, the frame just does not appear in the view.

Here's what I've tried so far:

In TrackingImageView.swift, I've added an instance variable called faceImage and I've tried adding it to the screen by adding the following code to the bottom of the draw function.

    UIGraphicsBeginImageContextWithOptions(self.imageAreaRect.size, false, 0.0)

    // self.faceImage.draw(in: CGRect(origin: CGPoint.init(x: rect.minX, y: rect.minY), size: rect.size))
    self.faceImage.draw(in: CGRect(x: previous.x, y: previous.y, width: polyRect.boundingBox.width, height: polyRect.boundingBox.height))
    // self.faceImage.draw(in: rect)
    let newImage = UIGraphicsGetImageFromCurrentImageContext()
    UIGraphicsEndImageContext()

    self.image = newImage

Then in TrackingViewController, in the function called func displayFrame(_ frame: CVPixelBuffer?, withAffineTransform transform: CGAffineTransform, rects: [TrackedPolyRect]?), I've added the following lines.

    self.trackingView.faceImage = UIImage(named: "dwight1")
    self.trackingView.displayImage(rect: self.trackingView.polyRects[0].boundingBox)

UPDATE: Here's another approach I tried.

This is what it says in the documentation: "Use the observation’s boundingBox to determine its location, so you can update your app or UI with the tracked object’s new location. Also use it to seed the next round of tracking."

So in the function func performTracking(type: TrackedObjectType) in VisionTrackerProcessor, I added this:

    delegate?.updateImage(observation.boundingBox)

And in TrackingViewController I added this:

    func updateImage(_ rect: CGRect) {
        print(rect)
        self.faceImage.frame = rect
    }

And faceImage is this:

    @IBOutlet weak var faceImage: UIImageView!

When I print the rect (x, y, width, height) where I want to place the image, I get the following output:

(0.45066666666666666, 0.5595238095238095, 0.09599999999999997, 0.16666666666666663)
(0.4521519184112549, 0.5643428802490235, 0.09600000381469731, 0.16666666666666663)
(0.4546553611755371, 0.5875609927707248, 0.09555779099464418, 0.16589893764919705)
(0.4543778896331787, 0.5984047359890408, 0.09505770206451414, 0.1650307231479221)
(0.454343843460083, 0.6052030351426866, 0.09476101398468023, 0.16451564364963112)
(0.45296874046325686, 0.6065650092230903, 0.09457258582115169, 0.16418851216634112)
(0.4510493755340576, 0.6057157728407118, 0.09507998228073117, 0.1650694105360243)
(0.4481017589569092, 0.5987161000569662, 0.09499880075454714, 0.16492846806844075)
(0.44568862915039065, 0.5735456678602431, 0.09511266946792607, 0.16512615415785048)
(0.4434205532073975, 0.5485235426161025, 0.09506692290306096, 0.16504673428005645)
(0.4413131237030029, 0.5238201141357421, 0.09566491246223452, 0.1660849147372776)
(0.4388014316558838, 0.5072469923231336, 0.09601176977157588, 0.1666870964898003)
(0.4374812602996826, 0.4967741224500868, 0.09586981534957884, 0.16644064585367835)
(0.43827009201049805, 0.48819330003526473, 0.09551617503166199, 0.1658266809251574)
(0.44115781784057617, 0.4852377573649089, 0.09499365091323853, 0.1649195247226291)
(0.4417849540710449, 0.4845396253797743, 0.0949023962020874, 0.1647610982259115)
(0.4476351737976074, 0.49016346401638455, 0.09391363859176638, 0.16304450564914275)
(0.4497058391571045, 0.49209620157877604, 0.09434010386466984, 0.16378489600287544)
(0.4514862060546875, 0.49223976135253905, 0.09459822773933413, 0.16423302756415475)
(0.454580020904541, 0.4904879252115885, 0.0949873864650726, 0.16490865283542205)
(0.4566154479980469, 0.48613760206434464, 0.09480695724487309, 0.16459540261162653)
(0.45992450714111327, 0.47563196818033854, 0.09525291323661805, 0.1653696378072103)
(0.464534330368042, 0.46896955702039933, 0.09566755294799806, 0.1660895029703776)
(0.4682444095611572, 0.4513437059190538, 0.09700422883033755, 0.16841011047363275)
(0.4709425926208496, 0.438845952351888, 0.09843692183494568, 0.17089743084377712)
(0.47597203254699705, 0.4264893849690755, 0.10058027505874634, 0.17461851967705622)
(0.48175721168518065, 0.42467672559950087, 0.10141149759292606, 0.1760616196526421)
(0.483599328994751, 0.44046991136338975, 0.10279589891433716, 0.17846510145399308)
(0.4847916603088379, 0.44517923990885416, 0.10338790416717525, 0.17949288686116532)
(0.4889643669128418, 0.45437651740180124, 0.09983686804771424, 0.17332788043551978)
(0.49118928909301757, 0.4580091264512804, 0.09644789695739747, 0.16744425031873916)
(0.4905869483947754, 0.45951224433051213, 0.09397981166839603, 0.16315938101874455)
(0.4874621868133545, 0.45792486402723526, 0.09055853486061094, 0.15721967485215932)
(0.48279714584350586, 0.4531046549479167, 0.08872739672660823, 0.1540406121148004)
(0.4783169269561768, 0.4456812964545356, 0.0860174298286438, 0.1493358188205295)
(0.4728221893310547, 0.44693773057725694, 0.084199583530426, 0.14617982440524635)
(0.471103572845459, 0.4579927232530382, 0.08219499588012691, 0.14269964430067272)
(0.4676462173461914, 0.47325596279568144, 0.08054903745651243, 0.1398420651753744)
(0.463164234161377, 0.4803483327229818, 0.07916470766067507, 0.13743872112698025)
(0.4597337245941162, 0.4865601857503255, 0.07723031044006345, 0.1340803888108995)
(0.4575923442840576, 0.4861404842800564, 0.07577759623527525, 0.13155832290649416)
(0.456453275680542, 0.48211678398980035, 0.0741972386837006, 0.12881464428371853)
(0.45630569458007814, 0.47852266099717883, 0.0741972386837006, 0.12881464428371853)
(0.45930023193359376, 0.4749870724148221, 0.0741972386837006, 0.12881464428371847)
(0.4619853973388672, 0.460075675116645, 0.0741972386837006, 0.12881464428371853)
(0.4647641658782959, 0.44653006659613714, 0.0741972386837006, 0.12881464428371858)
(0.46242194175720214, 0.43739403618706596, 0.07220322489738468, 0.1253528171115451)
(0.4625579357147217, 0.41982913547092016, 0.07062785029411311, 0.12261778513590493)
(0.46608676910400393, 0.4134985182020399, 0.06866733431816097, 0.11921412150065108)
(0.46996197700500486, 0.41352043151855467, 0.0672459602355957, 0.11674645741780598)
(0.4733128547668457, 0.42267172071668835, 0.06592562794685364, 0.11445420583089194)
(0.4805797576904297, 0.4420909881591797, 0.06590123176574703, 0.11441185209486215)
(0.48854408264160154, 0.46238810221354165, 0.06529000997543333, 0.11335069868299696)
(0.4921866416931152, 0.47235264248318143, 0.06412824392318728, 0.11133375167846682)
(0.4948731899261475, 0.481452645195855, 0.06294543147087095, 0.10928025775485567)
(0.49323139190673826, 0.48434698316786023, 0.06219365000724797, 0.10797508027818464)
(0.4935962200164795, 0.47917471991644967, 0.061773008108139016, 0.10724479887220595)
(0.49112601280212403, 0.4626174502902561, 0.06177300810813907, 0.107244798872206)
(0.48893303871154786, 0.4498925950792101, 0.06069326996803287, 0.10537025663587785)
(0.4902684688568115, 0.45128373040093317, 0.06060827970504756, 0.10522270202636719)
(0.4870577812194824, 0.45470954047309026, 0.06060827970504756, 0.10522270202636724)
(0.45066666666666666, 0.5595238095238095, 0.09599999999999997, 0.16666666666666663)
(0.45066666666666666, 0.5595238095238095, 0.09599999999999997, 0.16666666666666663)

Any help with overlaying the image on top of my detected object would be amazing. Thanks!

Andy Jazz
Shalin Shah

1 Answer

Are you aware that the coordinates you get from the Vision framework are normalised (between 0 and 1)? You will have to scale them to fit the size of your view.

In addition, as far as I remember, Vision coordinates start from the bottom-left corner (unlike UIKit, which starts from the top-left), so you might have to flip them vertically as well (not 100% sure here).
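
To illustrate the scaling and the vertical flip described above, here is a minimal sketch. The helper name viewRect(forNormalized:in:) is hypothetical and not part of Apple's sample; it assumes the normalised rect is relative to the area where the video frame is actually drawn (for example, something like imageAreaRect in TrackingImageView), and the numbers in the example are made up.

```swift
import Foundation  // provides CGRect; in an app, UIKit already brings this in

/// Hypothetical helper (not part of the sample app).
/// Converts a normalised rect with a bottom-left origin (values in 0...1)
/// into a rect inside `container`, using a top-left origin as UIKit does.
func viewRect(forNormalized normalized: CGRect, in container: CGRect) -> CGRect {
    // Scale the normalised size up to the container's size.
    let width = normalized.width * container.width
    let height = normalized.height * container.height

    // x only needs scaling plus the container's own offset.
    let x = container.minX + normalized.minX * container.width

    // Flip vertically: the normalised y is measured from the bottom edge,
    // so the top edge of the box sits at (1 - maxY) from the top.
    let y = container.minY + (1 - normalized.maxY) * container.height

    return CGRect(x: x, y: y, width: width, height: height)
}

// Example with made-up values: a 1000 x 500 drawing area offset 100 points down.
let container = CGRect(x: 0, y: 100, width: 1000, height: 500)
let normalized = CGRect(x: 0.25, y: 0.5, width: 0.5, height: 0.25)
let converted = viewRect(forNormalized: normalized, in: container)
print(converted)
```

In updateImage(_:) you could then assign the converted rect to faceImage.frame instead of the raw boundingBox. If the delegate callback is delivered off the main thread, the frame update would presumably need to be dispatched to the main queue as well.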

Edit: I see you have videoReader.affineTransform available; you could try applying that transform to your CGRects.

crom87