
I've seen a lot of other people's online tutorials that manage to hit the 0.0X-second mark when filtering an image. Meanwhile, my code here took 1.09 seconds to filter an image (just to reduce its brightness by half).

Edit after first comment: time was measured with 2 methods:

  • Date() time interval, taken when the “apply filter” button is tapped and again after the apply-filter function is done running (a sketch of a more precise, GPU-side measurement follows this list)
  • building it on an iPhone and counting manually with the timer on my watch
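
For reference, a more precise approach than Date() or a wristwatch is to ask the command buffer itself how long the GPU spent, as one of the comments below suggests. A minimal sketch, assuming an MTLCommandBuffer named buffer (gpuStartTime/gpuEndTime need iOS 10.3+):

import Metal

// Report pure GPU execution time, excluding all the CPU-side
// conversion work around it. Must be registered before commit().
buffer.addCompletedHandler { completed in
    let gpuSeconds = completed.gpuEndTime - completed.gpuStartTime
    print("GPU time: \(gpuSeconds) s")   // CFTimeInterval, in seconds
}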

Since I'm new to Metal and kernel stuff, I don't really know the difference between my code and those tutorials that achieve faster results. Which part of my code can be improved, or done with a different approach, to make it a lot faster?

Here's my kernel code:

#include <metal_stdlib>
using namespace metal;

// Scales r and g by 1/4 and b by 1/2, writing an opaque result
kernel void black(texture2d<float, access::write> outTexture [[texture(0)]],
                  texture2d<float, access::read> inTexture [[texture(1)]],
                  uint2 id [[thread_position_in_grid]]) {
    // Skip threads that fall outside the texture when the grid is
    // rounded up to whole threadgroups
    if (id.x >= outTexture.get_width() || id.y >= outTexture.get_height()) {
        return;
    }
    float3 val = inTexture.read(id).rgb;
    float r = val.r / 4;
    float g = val.g / 4;
    float b = val.b / 2;
    float4 out = float4(r, g, b, 1.0);
    outTexture.write(out, id);
}

This is my Swift code:

import UIKit
import Metal
import MetalKit

// UIImage -> CGImage -> MTLTexture -> COMPUTE HAPPENS |
//                 UIImage <- CGImage <- MTLTexture <--
class Filter {

    var device: MTLDevice
    var defaultLib: MTLLibrary?
    var grayscaleShader: MTLFunction?
    var commandQueue: MTLCommandQueue?
    var pipelineState: MTLComputePipelineState?

    var inputImage: UIImage
    var height, width: Int

    // 32 x 32 = 1,024 threads per group; some devices only allow 512, so
    // check pipelineState.maxTotalThreadsPerThreadgroup before relying on this
    let threadsPerBlock = MTLSize(width: 32, height: 32, depth: 1)

    init() {
        print("initialized")
        self.device = MTLCreateSystemDefaultDevice()!
        print(device)

        // Changes: I tried do/try/catch, and using the bundle parameter
        // when making the default library
        let frameworkBundle = Bundle(for: type(of: self))
        print(frameworkBundle)

        self.defaultLib = device.makeDefaultLibrary()
        self.grayscaleShader = defaultLib?.makeFunction(name: "black")
        self.commandQueue = self.device.makeCommandQueue()

        // ERROR HERE
        if let shader = grayscaleShader {
            print("in")
            self.pipelineState = try? self.device.makeComputePipelineState(function: shader)
        } else { fatalError("unable to make compute pipeline") }

        self.inputImage = UIImage(named: "stockImage")!
        self.height = Int(self.inputImage.size.height)
        self.width = Int(self.inputImage.size.width)
    }

    func getCGImage(from uiimg: UIImage) -> CGImage? {
        UIGraphicsBeginImageContext(uiimg.size)
        uiimg.draw(in: CGRect(origin: .zero, size: uiimg.size))
        let contextImage = UIGraphicsGetImageFromCurrentImageContext()
        UIGraphicsEndImageContext()
        return contextImage?.cgImage
    }

    func getMTLTexture(from cgimg: CGImage) -> MTLTexture {
        let textureLoader = MTKTextureLoader(device: self.device)
        do {
            // Ask the loader for a read/write texture directly; a separate
            // MTLTextureDescriptor built after loading has no effect on the
            // texture the loader returns
            let options: [MTKTextureLoader.Option: Any] = [
                .textureUsage: NSNumber(value: MTLTextureUsage([.shaderRead, .shaderWrite]).rawValue)
            ]
            return try textureLoader.newTexture(cgImage: cgimg, options: options)
        } catch {
            fatalError("Couldn't convert CGImage to MTLTexture")
        }
    }

    func getCGImage(from mtlTexture: MTLTexture) -> CGImage? {
        var data = [UInt8](repeating: 0, count: 4 * width * height)

        mtlTexture.getBytes(&data,
                            bytesPerRow: 4 * width,
                            from: MTLRegionMake2D(0, 0, width, height),
                            mipmapLevel: 0)

        let bitmapInfo = CGBitmapInfo(rawValue: CGBitmapInfo.byteOrder32Big.rawValue | CGImageAlphaInfo.premultipliedLast.rawValue)
        let colorSpace = CGColorSpaceCreateDeviceRGB()

        let context = CGContext(data: &data,
                                width: width,
                                height: height,
                                bitsPerComponent: 8,
                                bytesPerRow: 4 * width,
                                space: colorSpace,
                                bitmapInfo: bitmapInfo.rawValue)

        return context?.makeImage()
    }

    func getUIImage(from cgimg: CGImage) -> UIImage? {
        return UIImage(cgImage: cgimg)
    }

    func getEmptyMTLTexture() -> MTLTexture? {
        let textureDescriptor = MTLTextureDescriptor.texture2DDescriptor(
            pixelFormat: .rgba8Unorm,
            width: width,
            height: height,
            mipmapped: false)
        textureDescriptor.usage = [.shaderRead, .shaderWrite]
        return self.device.makeTexture(descriptor: textureDescriptor)
    }

    func getInputMTLTexture() -> MTLTexture? {
        if let inputImage = getCGImage(from: self.inputImage) {
            return getMTLTexture(from: inputImage)
        } else { fatalError("Unable to convert input image to MTLTexture") }
    }

    func getBlockDimensions() -> MTLSize {
        // Round up so the grid covers the whole image even when its
        // dimensions aren't multiples of 32; the kernel's bounds check
        // discards the extra threads
        let blockWidth = (width + threadsPerBlock.width - 1) / threadsPerBlock.width
        let blockHeight = (height + threadsPerBlock.height - 1) / threadsPerBlock.height
        return MTLSizeMake(blockWidth, blockHeight, 1)
    }

    func applyFilter() -> UIImage? {
        print("start")
        let date = Date()
        print(date)

        // A command buffer can only be committed once, so make a fresh
        // buffer and encoder per pass rather than reusing ones from init
        if let buffer = self.commandQueue?.makeCommandBuffer(),
           let encoder = buffer.makeComputeCommandEncoder(),
           let outputTexture = getEmptyMTLTexture(), let inputTexture = getInputMTLTexture() {

            encoder.setComputePipelineState(self.pipelineState!)
            encoder.setTextures([outputTexture, inputTexture], range: 0..<2)
            encoder.dispatchThreadgroups(self.getBlockDimensions(), threadsPerThreadgroup: threadsPerBlock)
            encoder.endEncoding()

            buffer.commit()
            buffer.waitUntilCompleted()

            guard let outputImage = getCGImage(from: outputTexture) else { fatalError("Couldn't obtain CGImage from MTLTexture") }

            print("stop")

            let date2 = Date()
            print(date2.timeIntervalSince(date))
            return getUIImage(from: outputImage)
        } else { fatalError("optional unwrapping failed") }
    }
}
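
If the target devices support non-uniform threadgroup sizes (Apple A11 / MTLGPUFamily.apple4 and newer), here's a sketch of a simpler dispatch that could replace the dispatchThreadgroups call in applyFilter; Metal trims the edge threadgroups itself, so the manual rounding in getBlockDimensions and the bounds check in the kernel become unnecessary:

// Sketch only: dispatchThreads requires non-uniform threadgroup support
let threadsPerGrid = MTLSize(width: width, height: height, depth: 1)
let w = pipelineState!.threadExecutionWidth
let h = pipelineState!.maxTotalThreadsPerThreadgroup / w
encoder.dispatchThreads(threadsPerGrid,
                        threadsPerThreadgroup: MTLSize(width: w, height: h, depth: 1))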
  • You should watch this session: https://developer.apple.com/videos/play/wwdc2021/10153/. Also, how are you measuring the time it takes to run the filter? – JustSomeGuy Aug 23 '21 at 15:32
  • manually with my watch, and also with Date() time interval between when I tap the button and when the filter function finishes and the image is changed – Abigail Aryaputra Aug 23 '21 at 23:22
  • I don't use an M1 Mac. I'm not sure the WWDC session you referred to applies to me... – Abigail Aryaputra Aug 23 '21 at 23:32
  • You are using `UIImage`, you are using an iPhone, and there's a high chance that it has an A11 or greater, which means that almost everything in the talk applies. I'd say that the problem is that you are copying too many bytes around. First, you create a `CGImage` from a `UIImage`, then blit the `MTLTexture` out to that `CGImage` and make a `UIImage` back out of it. That sounds like a bottleneck. You should try to do those modifications in place, something like in this article: https://medium.com/@s1ddok/combine-the-power-of-coregraphics-and-metal-by-sharing-resource-memory-eabb4c1be615 – JustSomeGuy Aug 24 '21 at 00:21
  • But also, measuring performance with a watch is bad. It doesn't tell you anything about what actually takes up the time. Try using Instruments, and also query the `MTLCommandBuffer` for its GPU time. See how long each of the pieces of the puzzle takes and iterate on that. – JustSomeGuy Aug 24 '21 at 00:26
  • And yes, you should measure before optimizing and each time you change something, if you think it will affect the performance. Otherwise, you won't be able to tell which of the changes made the runtime better. – JustSomeGuy Aug 24 '21 at 00:26
  • Maybe you do two changes and one of them makes it run 3x faster, and the other one makes it 2x slower. But in the end, it will look like it's running 1.5x faster, so you might think that both those changes were positive. Anyway, you get the idea. – JustSomeGuy Aug 24 '21 at 00:27
  • If profiling seems like too much, you can start with using `os_log` and signposts to output a log of your app and measure time that way (see the sketch after these comments), but I would go straight for Instruments, because it should be pretty straightforward to use – JustSomeGuy Aug 24 '21 at 02:12
  • I actually found a different approach, which is to make it a custom CIFilter. It works pretty fast and is easy to understand. However, I think there's a downside to it, though I haven't found it yet. – Abigail Aryaputra Aug 24 '21 at 03:30
  • Thank you for the advice, I will definitely take it into consideration. Also, I've now learned how to time correctly! – Abigail Aryaputra Aug 24 '21 at 03:31
  • Yeah, I totally forgot about `CIFilter` – JustSomeGuy Aug 24 '21 at 16:52
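
For reference, a minimal sketch of the signpost idea mentioned in the comments (the subsystem string and the filter instance are made up for illustration; os_signpost needs iOS 12+):

import os.signpost

let log = OSLog(subsystem: "com.example.filterapp", category: .pointsOfInterest)
let spid = OSSignpostID(log: log)

os_signpost(.begin, log: log, name: "applyFilter", signpostID: spid)
_ = filter.applyFilter()   // interval shows up in Instruments' Points of Interest track
os_signpost(.end, log: log, name: "applyFilter", signpostID: spid)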

2 Answers


In case someone still needs the answer: I found a different approach, which is to build the filter as a custom CIFilter. It works pretty fast and is super easy to understand!
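
Not the custom-kernel version itself, but a sketch of the same channel scaling done with the built-in CIColorMatrix filter (the function name dim is made up; the scales match the Metal kernel in the question):

import CoreImage
import UIKit

func dim(_ input: UIImage) -> UIImage? {
    guard let ciInput = CIImage(image: input),
          let filter = CIFilter(name: "CIColorMatrix") else { return nil }
    filter.setValue(ciInput, forKey: kCIInputImageKey)
    // Scale r and g by 1/4 and b by 1/2, like the compute kernel
    filter.setValue(CIVector(x: 0.25, y: 0, z: 0, w: 0), forKey: "inputRVector")
    filter.setValue(CIVector(x: 0, y: 0.25, z: 0, w: 0), forKey: "inputGVector")
    filter.setValue(CIVector(x: 0, y: 0, z: 0.5, w: 0), forKey: "inputBVector")
    guard let output = filter.outputImage else { return nil }
    let context = CIContext()   // cache this in real code; contexts are expensive to create
    guard let cg = context.createCGImage(output, from: output.extent) else { return nil }
    return UIImage(cgImage: cg)
}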


You are using UIImage and CGImage. These objects are stored in CPU memory.

You need to implement the code using just CIImage or MTLTexture. These objects live in GPU memory and give the best performance.
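
A minimal sketch of what a GPU-resident version could look like, assuming inputTexture and outputTexture are MTLTextures like the ones the question already creates (outputTexture needs .shaderWrite usage) and device/commandQueue are the usual Metal objects:

import CoreImage
import Metal

let ciContext = CIContext(mtlDevice: device)   // a CIContext that renders on the GPU

// Wrap the Metal texture without copying it back to the CPU
let ciImage = CIImage(mtlTexture: inputTexture, options: nil)!
    .oriented(.downMirrored)   // Metal is top-left origin, Core Image bottom-left

// Halve the brightness multiplicatively
let dimmed = ciImage.applyingFilter("CIColorMatrix", parameters: [
    "inputRVector": CIVector(x: 0.5, y: 0, z: 0, w: 0),
    "inputGVector": CIVector(x: 0, y: 0.5, z: 0, w: 0),
    "inputBVector": CIVector(x: 0, y: 0, z: 0.5, w: 0)
])

// Render straight into another texture; nothing round-trips through UIImage/CGImage
if let commandBuffer = commandQueue.makeCommandBuffer() {
    ciContext.render(dimmed,
                     to: outputTexture,
                     commandBuffer: commandBuffer,
                     bounds: dimmed.extent,
                     colorSpace: CGColorSpaceCreateDeviceRGB())
    commandBuffer.commit()
}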