
I'm trying to better understand the synchronization requirements when working with Metal Performance Shaders and an MTLBlitCommandEncoder.

I have an MTLCommandBuffer that is set up as follows:

  • Use MTLBlitCommandEncoder to copy a region of Texture A into Texture B. Texture A is larger than Texture B. I'm extracting a "tile" from Texture A and copying it into Texture B.

  • Use an MPSImageBilinearScale metal performance shader with Texture B as the source texture and a third texture, Texture C, as the destination. This metal performance shader will scale and potentially translate the contents of Texture B into Texture C.

How do I ensure that the blit encoder completely finishes copying the data from Texture A to Texture B before the metal performance shader starts trying to scale Texture B? Do I even have to worry about this or does the serial nature of a command buffer take care of this for me already?

Metal has the concept of fences (MTLFence) for synchronizing access to resources, but I don't see any way to have a Metal Performance Shader wait on a fence, whereas waitForFence: is present on the command encoders.

If I can't use fences and I do need to synchronize, is the recommended practice to just enqueue the blit encoder, call waitUntilCompleted on the command buffer, then encode the shader and call waitUntilCompleted a second time? For example:

id<MTLCommandBuffer> commandBuffer;

// Enqueue blit encoder to copy Texture A -> Texture B
id<MTLBlitCommandEncoder> blitEncoder = [commandBuffer blitCommandEncoder];
[blitEncoder copyFromTexture:...];
[blitEncoder endEncoding];

// Wait for blit encoder to complete.
[commandBuffer commit];
[commandBuffer waitUntilCompleted];

// Scale Texture B -> Texture C
MPSImageBilinearScale *imageScaleShader = [[MPSImageBilinearScale alloc] initWithDevice:...];  
[imageScaleShader encodeToCommandBuffer:commandBuffer...];

// Wait for scaling shader to complete.
[commandBuffer commit];
[commandBuffer waitUntilCompleted];

The reason I think I need to do the intermediate copy into Texture B is that MPSImageBilinearScale appears to scale its entire source texture. The clipOffset is useful for output, but it doesn't apply to the actual scaling or transform. So the tile needs to be extracted from Texture A into a Texture B that is the same size as the tile itself. Then the scaling and transform will "make sense". Disregard this footnote because I had forgotten some basic math principles and have since figured out how to make the scale transform's translate properties work with the clipRect.

Ian Ollmann
kennyc

2 Answers


Metal takes care of this for you. The driver and GPU execute commands in a command buffer as though in serial fashion. (The "as though" allows for running things in parallel or out of order for efficiency, but only if the result would be the same as when done serially.)

Synchronization issues arise when both the CPU and GPU are working with the same objects, and also when presenting textures on-screen. (You shouldn't be rendering to a texture while it's being presented on screen.)
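For the CPU/GPU case, the usual pattern is to wait, or register a completion handler, before the CPU touches shared memory. A minimal sketch (the `sharedBuffer` name is an assumption, not from your code):

```objc
// Hedged sketch: the CPU must not read sharedBuffer.contents until the
// GPU has finished executing this command buffer.
[commandBuffer addCompletedHandler:^(id<MTLCommandBuffer> cb) {
    // GPU work is done; it is now safe to read sharedBuffer.contents here.
}];
[commandBuffer commit];
// Alternatively, block with [commandBuffer waitUntilCompleted] before reading.
```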

There's a section of the Metal Programming Guide which deals with read-write access to resources by shaders, which is not exactly the same, but should reassure you:

Memory Barriers

Between Command Encoders

All resource writes performed in a given command encoder are visible in the next command encoder. This is true for both render and compute command encoders.

Within a Render Command Encoder

For buffers, atomic writes are visible to subsequent atomic reads across multiple threads.

For textures, the textureBarrier method ensures that writes performed in a given draw call are visible to subsequent reads in the next draw call.

Within a Compute Command Encoder

All resource writes performed in a given kernel function are visible in the next kernel function.
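So for your case, encoding both passes into the same command buffer and committing once is enough; no fences and no intermediate waitUntilCompleted. A sketch under the assumption that `queue`, `device`, the three textures, and the tile geometry already exist:

```objc
// Hedged sketch: one command buffer, one commit. The blit's writes to
// textureB are visible to the compute encoder that the MPS kernel
// creates internally, per the "Between Command Encoders" rule above.
id<MTLCommandBuffer> commandBuffer = [queue commandBuffer];

// Copy the tile from Texture A into Texture B.
id<MTLBlitCommandEncoder> blit = [commandBuffer blitCommandEncoder];
[blit copyFromTexture:textureA
          sourceSlice:0
          sourceLevel:0
         sourceOrigin:tileOrigin
           sourceSize:tileSize
            toTexture:textureB
     destinationSlice:0
     destinationLevel:0
    destinationOrigin:MTLOriginMake(0, 0, 0)];
[blit endEncoding];

// Scale Texture B into Texture C; MPS encodes its own compute pass.
MPSImageBilinearScale *scale = [[MPSImageBilinearScale alloc] initWithDevice:device];
[scale encodeToCommandBuffer:commandBuffer
               sourceTexture:textureB
          destinationTexture:textureC];

[commandBuffer commit];  // single commit; no waiting in between
```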

Ken Thomases
  • Thanks Ken. I was reasonably confident that sequential Blit, Compute and Render command encoders would be safe to use, but I wasn't sure if that conclusion applied to Metal Performance Shaders as well since they don't inherit from `MTLCommandEncoder` but rather `MPSKernel`. I guess the `encodeToCommandBuffer:` methods on `MPSUnaryImageKernel` fall under one of the Memory Barrier sections you quote above. – kennyc Aug 24 '18 at 19:07
  • 2
    `encodeToCommandBuffer:` internally creates its own compute command encoder, so the first point above applies to any subsequent work. – warrenm Aug 24 '18 at 20:56
  • is this still true when using `MPSTemporaryImage` or `MTLHeap`-based resources? I'm trying to blit results from an `MPSTemporaryImage` that acts as the destination of an `MPSGaussianBlur`, with no luck. It works when I replace that `MPSTemporaryImage` with a regular `MTLTexture`. @warrenm @ken thomases – s1ddok Jan 10 '19 at 10:27
  • MPSTemporaryImages are heap-based resources, so the answer provided here won't work. You will need to use MTLEvents, I think. Possibly inserting a dummy compute command encoder to add the fence you need might work -- haven't tried. A bug report with Apple asking for fence support in MPS would be timely. Vote early and often! – Ian Ollmann Apr 16 '19 at 21:16

MPS sits on top of Metal (mostly). It doesn’t replace it (mostly). You may assume that it is using the usual command encoders that you are using.

There are a few areas where MTLFences are required, particularly when interoperating with render encoders and MTLHeaps. When available, make use of the synchronize methods on the MPSImages and buffer types rather than rolling your own.
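For CPU readback on macOS, for instance, MPSImage provides -synchronizeOnCommandBuffer:, which encodes the synchronization needed to make the GPU's writes visible to the CPU. A hedged sketch (the `resultImage` and `queue` names are assumptions):

```objc
// Hedged sketch: making an MPSImage's contents readable on the CPU
// (macOS, managed storage) using the built-in synchronize method.
id<MTLCommandBuffer> commandBuffer = [queue commandBuffer];
// ... encode the MPS work that writes into resultImage ...
[resultImage synchronizeOnCommandBuffer:commandBuffer];
[commandBuffer commit];
[commandBuffer waitUntilCompleted];
// resultImage.texture can now be read on the CPU, e.g. via -getBytes:.
```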

Ian Ollmann