
Core Image's CIAreaAverage filter can easily be used to compute the average RGB color of a whole CIImage. For example:

import CoreImage

let options = [CIContextOption.workingColorSpace: kCFNull as Any]
let context = CIContext(options: options)

let parameters: [String: Any] = [
    kCIInputImageKey: inputImage, // assume this exists
    kCIInputExtentKey: CIVector(cgRect: inputImage.extent)
]

let filter = CIFilter(name: "CIAreaAverage", parameters: parameters)!

var bitmap = [Float32](repeating: 0, count: 4)
context.render(filter.outputImage!,
               toBitmap: &bitmap,
               rowBytes: 16, // 4 channels x 4 bytes per Float32
               bounds: CGRect(x: 0, y: 0, width: 1, height: 1),
               format: .RGBAf,
               colorSpace: nil)

let rAverage = bitmap[0]
let gAverage = bitmap[1]
let bAverage = bitmap[2] // blue is the third component; index 3 is alpha
...

However, suppose one does not want a whole-image color average. Breaking the image up into regions of interest (ROIs) by varying the input extent (see kCIInputExtentKey above) and performing one CIAreaAverage filter operation per ROI introduces many sequential steps, which decreases performance drastically. The filters cannot be chained, of course, since each output is a single 4-component color average (see bitmap above). Another way of describing this might be "average downsampling".

For example, say you have a 1080p image (1920x1080) and you want a 10x10 color-average matrix from it. You would be performing 100 CIAreaAverage operations, one per input extent--each corresponding to a 192x108-pixel ROI for which you want the R, G, B, and perhaps A, averages. But that is 100 sequential CIAreaAverage operations--not performant.

Perhaps the next thing one might think to do is some sort of parallel for loop, e.g., DispatchQueue.concurrentPerform(iterations:execute:) with one iteration per ROI. However, I am not seeing a performance gain. (Note that CIContext is thread-safe; CIFilter is not.)
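
For reference, that attempt looked roughly like this (a sketch; inputImage and context are the ones from the example above, with a hard-coded 10x10 grid):

import CoreImage
import Dispatch

// Sketch of the parallel attempt: one CIAreaAverage per ROI, fanned out
// with DispatchQueue.concurrentPerform. Each iteration builds its own
// CIFilter, since only CIContext is safe to share across threads.
let rows = 10, cols = 10
let roiWidth = inputImage.extent.width / CGFloat(cols)
let roiHeight = inputImage.extent.height / CGFloat(rows)

var averages = [SIMD4<Float>](repeating: .zero, count: rows * cols)
averages.withUnsafeMutableBufferPointer { results in
    DispatchQueue.concurrentPerform(iterations: rows * cols) { i in
        let roi = CGRect(x: CGFloat(i % cols) * roiWidth,
                         y: CGFloat(i / cols) * roiHeight,
                         width: roiWidth,
                         height: roiHeight)
        let filter = CIFilter(name: "CIAreaAverage", parameters: [
            kCIInputImageKey: inputImage,
            kCIInputExtentKey: CIVector(cgRect: roi)
        ])!
        var bitmap = [Float32](repeating: 0, count: 4)
        context.render(filter.outputImage!, toBitmap: &bitmap, rowBytes: 16,
                       bounds: CGRect(x: 0, y: 0, width: 1, height: 1),
                       format: .RGBAf, colorSpace: nil)
        results[i] = SIMD4(bitmap[0], bitmap[1], bitmap[2], bitmap[3])
    }
}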

Logically, the next idea might be to create a custom CIFilter--let's call it CIMultiAreaAverage. However, it's not obvious how to write a CIKernel that can examine a source pixel's location and map it to a particular destination pixel. You would need some buffer of information, such as a running per-ROI color sum, or you would need to treat the destination pixel as an accumulator. The simplest approach might be to sum each ROI per channel into a destination with an integer type, render that to a bitmap, and then turn each sum into an average by casting to float and dividing by the number of pixels in the ROI.

I wish I had access to the source code for CIAreaAverage. To encapsulate the full functionality in the CIFilter, you might have to go further and write what's really a custom Metal shader. So perhaps someone with some expertise can assist with how to accomplish this with a Metal shader.

Another option might be to use vDSP/vImage to perform these ROI operations. It seems easy to create the necessary vImage_Buffers per ROI, but I'd want to make sure that creation is an in-place operation (probably) for performance. Then I'm not sure which vDSP mean function to apply to the vImage_Buffer, or how, treating it like an array, if that's even possible. It sounds like this might be the most performant option.
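
For example, once a ROI has been rendered into a contiguous interleaved RGBA Float32 array (that extraction step is omitted here), vDSP_meanv's stride parameter would give each channel's mean in one pass per channel--a minimal sketch of that idea:

import Accelerate

// Sketch: per-channel means of one ROI, assuming its pixels are already
// in a contiguous interleaved RGBA Float32 array. The stride of 4 walks
// a single channel; one vDSP_meanv call per channel.
func channelMeans(ofInterleavedRGBA rgba: [Float]) -> [Float] {
    let pixelCount = vDSP_Length(rgba.count / 4)
    var means = [Float](repeating: 0, count: 4)
    rgba.withUnsafeBufferPointer { buffer in
        for channel in 0..<4 {
            vDSP_meanv(buffer.baseAddress! + channel, 4, &means[channel], pixelCount)
        }
    }
    return means // [r, g, b, a]
}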

What does SO think?

CosmicVarion
  • I'd recommend writing a shader. It would be pretty trivial to find and adapt a Kernel example. If you have more specific questions I would be happy to help if I can. – Jeshua Lacock Mar 18 '22 at 18:38

1 Answer


Here is what Apple is doing in CIAreaAverage:

[Image: filter graph for CIAreaAverage]

I don't know why they follow two different paths, but this is what I think is happening:

The path on the left is a stepwise reduction of the input pixels into a smaller output. The kernel _areaAvg8 reduces a group of (up to) 8x8 pixels into one output pixel by calculating their average value; _areaAvg2 does the same for 2x2 pixels and _horizAvg2 for 2x1. The image is thus reduced in multiple steps, each step further reducing the result of the previous one, until the last step produces a single pixel that contains the average of all pixels of the input.

For the right side, I assume that CIAreaAverageProcessor is a CIImageProcessorKernel that uses Metal Performance Shaders--specifically, I assume, MPSImageReduceRowMean and MPSImageReduceColumnMean--to do the same. Why they have those two paths with the switch on top I do not know.

For your use case, I suggest you implement something similar to the left path, but stop somewhere in the middle, depending on the size of your desired output.

To improve performance, you can make use of the bilinear sampling that is provided by the graphics hardware basically for free: When you sample the input image at a coordinate in the middle of 4 pixels, you already get an average of these 4 color values. That means for an 8x8 reduction, you only need 4 x 4 = 16 sample operations (instead of 64). This kernel could look something like this:

extern "C" float4 areaAvg8(coreimage::sampler src, coreimage::destination dest) {
    float2 center = dest.coord() * 8.0; // assuming that src is 8x larger than dest
    float4 sum = src.sample(src.transform(center + float2(-3.0, -3.0)))
               + src.sample(src.transform(center + float2(-1.0, -3.0)))
               + src.sample(src.transform(center + float2( 1.0, -3.0)))
               + src.sample(src.transform(center + float2( 3.0, -3.0)))
               + src.sample(src.transform(center + float2(-3.0, -1.0)))
               + src.sample(src.transform(center + float2(-1.0, -1.0)))
               + src.sample(src.transform(center + float2( 1.0, -1.0)))
               + src.sample(src.transform(center + float2( 3.0, -1.0)))
               + src.sample(src.transform(center + float2(-3.0,  1.0)))
               + src.sample(src.transform(center + float2(-1.0,  1.0)))
               + src.sample(src.transform(center + float2( 1.0,  1.0)))
               + src.sample(src.transform(center + float2( 3.0,  1.0)))
               + src.sample(src.transform(center + float2(-3.0,  3.0)))
               + src.sample(src.transform(center + float2(-1.0,  3.0)))
               + src.sample(src.transform(center + float2( 1.0,  3.0)))
               + src.sample(src.transform(center + float2( 3.0,  3.0)));
    return sum / 16.0;
}
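
To chain such reduction kernels, you can subclass CIFilter and apply the kernel repeatedly in outputImage. A minimal sketch, assuming the kernel above was compiled into a library named "MyKernels.ci.metallib" (the resource name is hypothetical) and ignoring extents that are not multiples of 8:

import CoreImage

// Minimal sketch of chaining the reduction: repeatedly apply the 8x
// kernel until the image is close to the target grid size.
class MultiAreaAverage: CIFilter {
    var inputImage: CIImage?
    var targetWidth: CGFloat = 10
    var targetHeight: CGFloat = 10

    private static let areaAvg8: CIKernel = {
        // Assumes the Metal kernel above was compiled (with the Core
        // Image flags) into "MyKernels.ci.metallib" in the app bundle.
        let url = Bundle.main.url(forResource: "MyKernels", withExtension: "ci.metallib")!
        let data = try! Data(contentsOf: url)
        return try! CIKernel(functionName: "areaAvg8", fromMetalLibraryData: data)
    }()

    override var outputImage: CIImage? {
        guard var image = inputImage else { return nil }
        // Keep reducing by 8x while the result stays at least as large
        // as the desired grid. (A real implementation would also need
        // 2x and 2x1 steps, like Apple's, to handle arbitrary sizes.)
        while image.extent.width >= targetWidth * 8,
              image.extent.height >= targetHeight * 8 {
            let outputExtent = CGRect(x: 0, y: 0,
                                      width: image.extent.width / 8,
                                      height: image.extent.height / 8)
            guard let reduced = Self.areaAvg8.apply(
                extent: outputExtent,
                // Each output pixel reads the corresponding 8x8 source region.
                roiCallback: { _, rect in
                    CGRect(x: rect.origin.x * 8, y: rect.origin.y * 8,
                           width: rect.width * 8, height: rect.height * 8)
                },
                arguments: [image]) else { return nil }
            image = reduced
        }
        return image
    }
}

From there you can render the final (e.g., 10x10) image into a Float32 bitmap just like in your question, using rowBytes = 10 * 16 and a 10x10 bounds, and read the 100 averages directly.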
Frank Rupprecht
  • Thanks so much for your expertise, Frank! I didn't think to examine `CI_PRINT_TREE`. That bilinear sampling performance advice will be extremely helpful, too. Sorry for my ignorance and maybe it's easier to point me to some learning resource... but where would one chain the various kernels together (`_areaAvg8`, `_areaAvg2`)? In a `CIFilter` class? – CosmicVarion Mar 21 '22 at 13:58
  • Correct, in your own subclass of `CIFilter`. You override `var outputImage: CIImage` and perform all the calculations and chaining in there. – Frank Rupprecht Mar 21 '22 at 14:04
  • One thing you might want to explore is using `vDSP_desamp` or its Swift analogue `vDSP.downsample`. You'd need to deinterleave the interleaved buffer to planar buffers, run a horizontal pass, transpose that result, and run a second decimation pass (see the sketch after these comments). – Flex Monkey Mar 21 '22 at 17:24
  • @FrankSchlegel not finding any good resource for understanding the coordinate spaces, unfortunately. A couple of things I've run into: Firstly, how is sampling + or - [1-3] around the center pixel sampling in between pixels in `src`? It seems like this would be getting the bilinear interpolation of a 9x9 area by accessing every other pixel in a 7x7 area (-3:3 offset around `center` in x and y is a 7x7 area). – CosmicVarion Mar 24 '22 at 15:16
  • Secondly, presumably the first coordinate in dest is (0,0) so any sampling with negative x or y offset [1-3] would go out of `src`'s bounds, I think, and also for any other edges. So, I believe it would be bilinearly interpolating including black pixels around the edges. This would I think be easily remedied by adding an offset to `center`. – CosmicVarion Mar 24 '22 at 15:17
  • Lastly, it seems like the shader is running once per `dest` pixel, but I'm not sure how the shader knows that... maybe it's a property of `CIWarpKernel` vs. `CIColorKernel`? – CosmicVarion Mar 24 '22 at 15:17
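
Following up on Flex Monkey's comment, here is a minimal sketch of the horizontal pass of that idea for a single planar row, using vDSP.downsample with an averaging FIR filter (the deinterleaving, transpose, and vertical pass are left out):

import Accelerate

// Sketch: decimate one planar row of 1920 samples down to 10 block
// means. With a filter of 192 coefficients of 1/192 and a decimation
// factor of 192, each output sample is the mean of 192 consecutive
// input samples, i.e., one 192-wide block average.
let row: [Float] = (0..<1920).map { _ in Float.random(in: 0...1) } // stand-in data
let factor = 192
let averagingFilter = [Float](repeating: 1 / Float(factor), count: factor)

let rowMeans = vDSP.downsample(row,
                               decimationFactor: factor,
                               filter: averagingFilter)
// rowMeans now holds 10 values: the horizontal block averages of the row.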