
I am presenting an MTLTexture to the display with the code below. It is a large (~12k) texture, and drawing it takes about 10 ms. Since I need to render at least 30 FPS (33 ms/frame), nearly a third of my frame time is spent just displaying the texture.

Are there any tricks to increase performance and draw the texture faster? GPU Frame Capture shows a lot of time spent on culling, and enabling or disabling culling makes no difference to that time.

#include <metal_stdlib>
using namespace metal;

struct Uniforms {
    float4x4 mvp_matrix;
};

struct TextureMappingVertex {
    float4 renderedCoordinate [[ position ]];
    float2 textureCoordinate;
};

vertex TextureMappingVertex mapTexture(unsigned int vertex_id [[ vertex_id ]],
                                       constant Uniforms& uniforms [[ buffer(0) ]]) {
    // Full-screen quad in clip space, ordered for a 4-vertex triangle strip.
    float4x4 renderedCoordinates = float4x4(float4( -1.f, -1.f, 0.f, 1.f ),
                                            float4(  1.f, -1.f, 0.f, 1.f ),
                                            float4( -1.f,  1.f, 0.f, 1.f ),
                                            float4(  1.f,  1.f, 0.f, 1.f ));

    // Matching texture coordinates, with v flipped so the image is upright.
    float4x2 textureCoordinates = float4x2(float2( 0.f, 1.f ),
                                           float2( 1.f, 1.f ),
                                           float2( 0.f, 0.f ),
                                           float2( 1.f, 0.f ));

    TextureMappingVertex outVertex;
    outVertex.renderedCoordinate = uniforms.mvp_matrix * renderedCoordinates[vertex_id];
    outVertex.textureCoordinate = textureCoordinates[vertex_id];
    return outVertex;
}

fragment half4 displayTexture(TextureMappingVertex mappingVertex [[ stage_in ]],
                              texture2d<half, access::sample> texture [[ texture(0) ]]) {
    // Nearest-neighbor sample of the canvas texture at the interpolated coordinate.
    constexpr sampler s(address::clamp_to_edge, filter::nearest);
    return texture.sample(s, mappingVertex.textureCoordinate);
}

func draw(in view: MTKView) {

    guard let commandBuffer = commandQueue.makeCommandBuffer(),
          let descriptor = view.currentRenderPassDescriptor,
          let render = commandBuffer.makeRenderCommandEncoder(descriptor: descriptor) else { return }

    render.pushDebugGroup("Render to Screen")
    render.setRenderPipelineState(renderCanvasPipelineState)
    render.setFragmentTexture(canvasTexture, index: 0)
    render.setVertexBuffer(uniformBuffer, offset: 0, index: 0)
    render.setCullMode(.none)

    // Full-screen quad as a 4-vertex triangle strip; positions come from the vertex shader.
    render.drawPrimitives(type: .triangleStrip, vertexStart: 0, vertexCount: 4, instanceCount: 1)
    render.popDebugGroup()
    render.endEncoding()

    guard let drawable = view.currentDrawable else { return }
    commandBuffer.present(drawable)

    commandBuffer.commit()
}

func initializeCanvasRenderPipelineState() {
    let library = Renderer.device.makeDefaultLibrary()
    let pipelineDescriptor = MTLRenderPipelineDescriptor()
    // sampleCount is the deprecated spelling of rasterSampleCount; setting one is enough.
    pipelineDescriptor.rasterSampleCount = 1
    pipelineDescriptor.colorAttachments[0].pixelFormat = .rgba8Unorm
    pipelineDescriptor.depthAttachmentPixelFormat = .invalid
    pipelineDescriptor.vertexFunction = library?.makeFunction(name: "mapTexture")
    pipelineDescriptor.fragmentFunction = library?.makeFunction(name: "displayTexture")

    pipelineDescriptor.colorAttachments[0].isBlendingEnabled = false

    do {
        renderCanvasPipelineState = try Renderer.device.makeRenderPipelineState(descriptor: pipelineDescriptor)
    }
    catch {
        assertionFailure("Failed creating a render pipeline state. Can't render the texture without one.")
    }
}
– Jeshua Lacock
  • How is `uniforms.mvp_matrix` set up? Are you always drawing the quad head-on and just scaling and translating x and y (and rotating around z?) to get the part you want? Or are you potentially rotating around the other axes? I suspect you could avoid the culling by drawing a quad that didn't extend outside the window and instead adjusting the texture coordinates to select the subsection of the texture you want to sample (see the sketch after these comments). – Ken Thomases Jan 28 '20 at 23:03
  • Correct; I am just scaling and translating, no rotating. I can add the code if helpful. Interesting. So keep the quad stationary but translate/scale the texture coordinates? – Jeshua Lacock Jan 28 '20 at 23:11
  • So I just did some experiments hardwiring the texture coords for now, and if the full texture is displayed there isn't really any increase in performance. But when I zoom in the texture to 2x, it is about twice as fast. Not sure if it's worth implementing since I think most of the time, and for most cases, the full texture will be displayed. Anything else I can try? – Jeshua Lacock Jan 29 '20 at 00:03
  • Do you really need 3D for your use case? Or do you "just" need to display a texture (transformed) on the screen? – Frank Rupprecht Jan 29 '20 at 07:46
  • Strictly 2D; it's essentially just displaying an image. – Jeshua Lacock Jan 29 '20 at 07:56
  • Using mipmaps will make a huge difference to performance. You might see a very small gain by rendering a [single triangle](https://rauwendaal.net/2014/06/14/rendering-a-screen-covering-triangle-in-opengl/) (both sketched below). Also, be careful how you measure on mobile devices: if you're hitting your frame rate target easily, the hardware will throttle back CPU/GPU clock speed to save on power/heat, so absolute times are not always useful. It might be taking 10ms, but the device could do it in 1ms if necessary. – Columbo Jan 29 '20 at 08:05
  • Problem is it's not hitting the frame rate (unless I scale down rendering operations more than I would like). I've thought of a mipmap, but wouldn't I need to render at two resolutions - one half res and the other full res? Or can the low res be automatically created somehow? – Jeshua Lacock Jan 29 '20 at 08:10
  • As I feared, generating a mipmap takes longer than it takes to just display the texture. Using only 1 lower res mip level, it is taking 15ms to generate the mipmap using a blit encoder. Oh well, definitely worth a shot, thanks! – Jeshua Lacock Jan 29 '20 at 09:18
  • If there is nothing wrong with your GPU logic in terms of execution time, then the time it takes to render an image to the screen is memory read and then memory write time. There is not going to be that much you can do to optimize this, since it is a very low level function on the GPU. If your image data is very simple, it might be possible to create a 256-color table that represents all the image pixels after quantization and then render the whole image as 8-bit indexes (sketched below), as that would be 1 byte per pixel instead of 4 bytes per pixel. – MoDJ Jan 29 '20 at 17:13
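
A minimal sketch of Ken Thomases' suggestion above, assuming the strictly-2D case confirmed in the comments: the quad stays fixed at the full viewport (so no geometry ever crosses the screen edge and there is nothing to cull or clip), and the pan/zoom moves into the texture coordinates instead. The `uv_scale`/`uv_offset` fields are hypothetical additions replacing `mvp_matrix`; they are not in the original code.

struct UniformsUV {   // hypothetical: replaces mvp_matrix for the strictly-2D case
    float2 uv_scale;
    float2 uv_offset;
};

vertex TextureMappingVertex mapTextureCropped(unsigned int vertex_id [[ vertex_id ]],
                                              constant UniformsUV& uniforms [[ buffer(0) ]]) {
    // Quad is pinned to the full viewport; no clip-space transform at all.
    float4x4 renderedCoordinates = float4x4(float4( -1.f, -1.f, 0.f, 1.f ),
                                            float4(  1.f, -1.f, 0.f, 1.f ),
                                            float4( -1.f,  1.f, 0.f, 1.f ),
                                            float4(  1.f,  1.f, 0.f, 1.f ));
    float4x2 textureCoordinates = float4x2(float2( 0.f, 1.f ),
                                           float2( 1.f, 1.f ),
                                           float2( 0.f, 0.f ),
                                           float2( 1.f, 0.f ));
    TextureMappingVertex outVertex;
    outVertex.renderedCoordinate = renderedCoordinates[vertex_id];
    // Zoom/pan selects the visible sub-rectangle of the texture instead of moving the quad.
    outVertex.textureCoordinate = textureCoordinates[vertex_id] * uniforms.uv_scale + uniforms.uv_offset;
    return outVertex;
}

This matches the experiment reported above: at 1x zoom the same texels are read either way, so nothing changes, while at 2x zoom only a quarter of the texture is sampled and the draw was roughly twice as fast.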
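Columbo's single-triangle variant, as a hedged sketch: one oversized triangle covers the viewport with no vertex buffer and no shared diagonal edge, so there are three vertices to process instead of four. The `mvp_matrix` transform is omitted here for brevity.

vertex TextureMappingVertex mapTextureTriangle(unsigned int vertex_id [[ vertex_id ]]) {
    // vertex_id 0,1,2 -> uv (0,0), (2,0), (0,2) -> clip positions (-1,-1), (3,-1), (-1,3).
    // The triangle overshoots the viewport; the rasterizer clips it to the visible region.
    float2 uv = float2((vertex_id << 1) & 2, vertex_id & 2);
    TextureMappingVertex outVertex;
    outVertex.renderedCoordinate = float4(uv * 2.0f - 1.0f, 0.0f, 1.0f);
    outVertex.textureCoordinate = float2(uv.x, 1.0f - uv.y);  // flip v, matching the original quad
    return outVertex;
}

Drawn with `render.drawPrimitives(type: .triangle, vertexStart: 0, vertexCount: 3, instanceCount: 1)` in place of the strip.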
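For the mipmap route, the levels do not need to be rendered by hand; a blit encoder can generate them all from the base level. A sketch, assuming `canvasTexture` was created from a descriptor with `mipmapLevelCount` greater than 1, and that the fragment sampler gains a mip filter (e.g. `sampler s(address::clamp_to_edge, filter::nearest, mip_filter::nearest)`) so the smaller levels are actually used:

func generateCanvasMipmaps() {
    guard canvasTexture.mipmapLevelCount > 1,
          let commandBuffer = commandQueue.makeCommandBuffer(),
          let blit = commandBuffer.makeBlitCommandEncoder() else { return }
    blit.generateMipmaps(for: canvasTexture)  // fills every level below the base from level 0
    blit.endEncoding()
    commandBuffer.commit()
}

Given the 15 ms generation cost reported in the comments, this only pays off if the mipmaps are regenerated when the canvas content changes rather than every frame.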
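Finally, a sketch of MoDJ's palette idea, assuming the canvas survives quantization to 256 colors: the image is stored as an `r8Unorm` index texture (1 byte per pixel) alongside a 256×1 `rgba8Unorm` palette texture, cutting memory traffic to roughly a quarter. The texture names and bindings here are illustrative, not from the original code.

fragment half4 displayIndexedTexture(TextureMappingVertex mappingVertex [[ stage_in ]],
                                     texture2d<half, access::sample> indexTexture [[ texture(0) ]],
                                     texture2d<half, access::read> palette [[ texture(1) ]]) {
    constexpr sampler s(address::clamp_to_edge, filter::nearest);
    // r8Unorm stores index/255 in the red channel; recover the integer palette index.
    uint index = uint(indexTexture.sample(s, mappingVertex.textureCoordinate).r * 255.0h + 0.5h);
    return palette.read(uint2(index, 0));
}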

0 Answers