
In the WWDC session "Image Editing with Depth" they mention normalizedDisparity and normalizedDisparityImage a few times:

"The basic idea is that we're going to map our normalized disparity values into values between 0 and 1"

"So once you know the min and max you can normalize the depth or disparity between 0 and 1."

I tried to first get the disparity image like this:

let disparityImage = depthImage.applyingFilter(
    "CIDepthToDisparity", withInputParameters: nil)

Then I tried to get the depthDataMap and do the normalization, but it didn't work. Am I on the right track? I would appreciate some hints on what to do.

Edit:

This is my test code (sorry for the quality). I get the min and max, then I loop over the data to normalize it (let normalizedPoint = (point - min) / (max - min)):

let depthDataMap = depthData!.depthDataMap
let width = CVPixelBufferGetWidth(depthDataMap) //768 on an iPhone 7+
let height = CVPixelBufferGetHeight(depthDataMap) //576 on an iPhone 7+
CVPixelBufferLockBaseAddress(depthDataMap, CVPixelBufferLockFlags(rawValue: 0))
// Convert the base address to a safe pointer of the appropriate type
let floatBuffer = unsafeBitCast(CVPixelBufferGetBaseAddress(depthDataMap), 
    to: UnsafeMutablePointer<Float32>.self)
var min = floatBuffer[0]
var max = floatBuffer[0]
for y in 0..<height{
    for x in 0..<width{
        // Row-major indexing: the value at (x, y) lives at index y * width + x
        let distanceAtXYPoint = floatBuffer[y * width + x]
        if(distanceAtXYPoint < min){
            min = distanceAtXYPoint
        }
        if(distanceAtXYPoint > max){
            max = distanceAtXYPoint
        }
    }
}
CVPixelBufferUnlockBaseAddress(depthDataMap, CVPixelBufferLockFlags(rawValue: 0))

What I expected is that the data would reflect the disparity where the user tapped on the image, but it didn't match. The code to find the disparity where the user tapped is here:

// Apply the filter with the sampleRect from the user’s tap. Don’t forget to clamp!
let minMaxImage = normalized?.clampingToExtent().applyingFilter(
    "CIAreaMinMaxRed", withInputParameters: 
        [kCIInputExtentKey : CIVector(cgRect:rect2)])
// A four-byte buffer to store a single pixel value
var pixel = [UInt8](repeating: 0, count: 4)
// Render the image to a 1x1 rect. Be sure to use a nil color space.
context.render(minMaxImage!, toBitmap: &pixel, rowBytes: 4,
    bounds: CGRect(x:0, y:0, width:1, height:1), 
    format:  kCIFormatRGBA8, colorSpace: nil)
// The max is stored in the green channel. Min is in the red.
let disparity = Float(pixel[1]) / 255.0
Jimmy
  • “It didn’t work”... what’s your code, what did you expect, what did you get? – jcaron Nov 05 '17 at 00:49
  • @jcaron please take a look I added some of my test code. – Jimmy Nov 05 '17 at 01:50
  • Hey Jimmy, digging in depthData as well. Any chance you know how to get the actual background and foreground images as UIImages? – Roi Mulia Nov 08 '17 at 16:28
  • Hey @RoiMulia, using depth data you could create a mask and differentiate between foreground and background. The WWDC "Image Editing with Depth" session has a lot of info related to that; I've probably watched it 100 times by now. – Jimmy Nov 10 '17 at 03:31
  • Hey @Jimmy, I've tried for a few days now to accomplish this, but it just doesn't come together. I know you're probably super busy, but if you have done this before, can you share the code? I'm willing to pay for it if needed; it's just frustrating, as I've spent so many days on it already. – Roi Mulia Nov 12 '17 at 10:02

3 Answers


There's a new blog post on raywenderlich.com called "Image Depth Maps Tutorial for iOS" that contains a sample app and details related to working with depth data. The sample code shows how to normalize the depth data using a CVPixelBuffer extension:

extension CVPixelBuffer {

  func normalize() {

    let width = CVPixelBufferGetWidth(self)
    let height = CVPixelBufferGetHeight(self)

    CVPixelBufferLockBaseAddress(self, CVPixelBufferLockFlags(rawValue: 0))
    let floatBuffer = unsafeBitCast(CVPixelBufferGetBaseAddress(self), to: UnsafeMutablePointer<Float>.self)

    // Starting values assume disparity falls within 0...1; see the comments
    // below this answer for a more robust choice of initial min/max.
    var minPixel: Float = 1.0
    var maxPixel: Float = 0.0

    for y in 0 ..< height {
      for x in 0 ..< width {
        let pixel = floatBuffer[y * width + x]
        minPixel = min(pixel, minPixel)
        maxPixel = max(pixel, maxPixel)
      }
    }

    let range = maxPixel - minPixel

    for y in 0 ..< height {
      for x in 0 ..< width {
        let pixel = floatBuffer[y * width + x]
        floatBuffer[y * width + x] = (pixel - minPixel) / range
      }
    }

    CVPixelBufferUnlockBaseAddress(self, CVPixelBufferLockFlags(rawValue: 0))
  }
}  
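For completeness, here is a hedged usage sketch (my own, not from the tutorial). It assumes you already have an AVDepthData instance and converts it to 32-bit disparity first, so the Float layout the extension expects actually holds:

import AVFoundation

// Hypothetical helper: convert to Float32 disparity, then normalize in place.
func normalizedDisparityMap(from depthData: AVDepthData) -> CVPixelBuffer {
    let disparityData = depthData.converting(toDepthDataType: kCVPixelFormatType_DisparityFloat32)
    let map = disparityData.depthDataMap
    map.normalize() // the CVPixelBuffer extension above
    return map
}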

Something to keep in mind when working with depth data: it is lower resolution than the actual image, so you need to scale it up (more info in the blog post and in the WWDC video).
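A minimal sketch of that scaling step (again my own; `depthPixelBuffer` and `mainImage` are placeholder names for the normalized map and the full-resolution CIImage):

import CoreImage

let depthCIImage = CIImage(cvPixelBuffer: depthPixelBuffer)
// Stretch the low-resolution depth map up to the size of the main image.
let scaleX = mainImage.extent.width / depthCIImage.extent.width
let scaleY = mainImage.extent.height / depthCIImage.extent.height
let scaledDepth = depthCIImage.transformed(by: CGAffineTransform(scaleX: scaleX, y: scaleY))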

Feras Alnatsheh
  • I've used this successfully in the past, but suddenly I'm having the issue that it fails about halfway through, complaining it can't access data in the float array. – Ash Oct 09 '18 at 07:34
  • OK, keep in mind that this function does not take into account the pixel format of the pixel buffer. If your components are not 16 bits per pixel, this isn't going to work. I had a 32-bit, single-component buffer. I'd also recommend using `Float.greatestFiniteMagnitude` for the initial minimum, in case all of your values are above 1, and the negative of that value for the initial maximum for similar reasons. – Ash Nov 03 '18 at 09:02

Will's vDSP answer is very good, but it can be improved as follows. I'm using it with depth data from a photo; it's possible that it won't work if the depth data isn't in the pixel format the code assumes, as mentioned in the comments above, but I haven't found such a photo yet. I'm surprised there isn't a filter to handle this in Core Image.

import Accelerate

extension CVPixelBuffer {

    func normalize() {
        CVPixelBufferLockBaseAddress(self, CVPixelBufferLockFlags(rawValue: 0))

        let width = CVPixelBufferGetWidthOfPlane(self, 0)
        let height = CVPixelBufferGetHeightOfPlane(self, 0)
        let count = width * height

        // Work directly on the Float32 values in the buffer; no copy is needed.
        let pixelBufferBase = unsafeBitCast(CVPixelBufferGetBaseAddressOfPlane(self, 0), to: UnsafeMutablePointer<Float>.self)
        let pixelBufferPointer = UnsafeMutableBufferPointer<Float>(start: pixelBufferBase, count: count)

        // Min-max normalization with vDSP: (value - min) / (max - min)
        let maxValue = vDSP.maximum(pixelBufferPointer)
        let minValue = vDSP.minimum(pixelBufferPointer)
        let range = maxValue - minValue
        let negMinValue = -minValue

        let subtractVector = vDSP.add(negMinValue, pixelBufferPointer)
        let normalizedDisparity = vDSP.divide(subtractVector, range)

        // Copy the normalized values back into the pixel buffer.
        pixelBufferBase.initialize(from: normalizedDisparity, count: count)

        CVPixelBufferUnlockBaseAddress(self, CVPixelBufferLockFlags(rawValue: 0))
    }
}

RopeySim

Try using the Accelerate framework's vDSP vector functions. Here is the normalization implemented as two functions.

To rescale the CVPixelBuffer to a normalized 0...1 range, call:

myCVPixelBuffer.setUpNormalize()


import Accelerate

extension CVPixelBuffer {
    func vectorNormalize( targetVector: UnsafeMutableBufferPointer<Float>) -> [Float] {
        // range = max - min
        // normalized to 0..1 is (pixel - minPixel) / range

        // See "Using vDSP for Vector-based Arithmetic" and the
        // "Vector extrema calculation" section of the Accelerate documentation:
        //   static func maximum<U>(U) -> Float  // returns the maximum element of a single-precision vector
        //   static func minimum<U>(U) -> Float  // returns the minimum element of a single-precision vector

        let maxValue = vDSP.maximum(targetVector)
        let minValue = vDSP.minimum(targetVector)

        let range = maxValue - minValue
        let negMinValue = -minValue

        let subtractVector = vDSP.add(negMinValue, targetVector)
            // adding negative value is subtracting
        let result = vDSP.divide(subtractVector, range)

        return result
    }

    @discardableResult
    func setUpNormalize() -> CVPixelBuffer {
        // Normalizes a grayscale Float32 buffer in place and returns it
        // (marked @discardableResult so callers can ignore the return value).

        CVPixelBufferLockBaseAddress(self,
                                     CVPixelBufferLockFlags(rawValue: 0))
        let width = CVPixelBufferGetWidthOfPlane(self, 0)
        let height = CVPixelBufferGetHeightOfPlane(self, 0)
        let count = width * height

        let bufferBaseAddress = CVPixelBufferGetBaseAddressOfPlane(self, 0)
            // UnsafeMutableRawPointer

        let pixelBufferBase  = unsafeBitCast(bufferBaseAddress, to: UnsafeMutablePointer<Float>.self)

        let depthCopy  =   UnsafeMutablePointer<Float>.allocate(capacity: count)
        depthCopy.initialize(from: pixelBufferBase, count: count)
        let depthCopyBuffer = UnsafeMutableBufferPointer<Float>(start: depthCopy, count: count)

        let normalizedDisparity = vectorNormalize(targetVector: depthCopyBuffer)

        pixelBufferBase.initialize(from: normalizedDisparity, count: count)
            // copy back the normalized map into the CVPixelBuffer

        depthCopy.deallocate()

        CVPixelBufferUnlockBaseAddress(self, CVPixelBufferLockFlags(rawValue: 0))

        return self

    }

}
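And a short follow-up sketch of my own (not from the sample project) showing one way to inspect the result, assuming the normalized Float32 buffer can be wrapped as a CIImage:

import CoreImage

let normalizedBuffer = myCVPixelBuffer.setUpNormalize()
let depthCIImage = CIImage(cvPixelBuffer: normalizedBuffer)
// Render with a CIContext (or wrap in a UIImage) to look at the normalized map.
let ciContext = CIContext()
let cgImage = ciContext.createCGImage(depthCIImage, from: depthCIImage.extent)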

You can see it in action in a modified version of the Apple sample 'PhotoBrowse' app at

https://github.com/racewalkWill/PhotoBrowseModified