
MTLTexture deep copy artefacts

I am receiving YUV 420 CMSampleBuffers of the screen in my System Broadcast Extension; however, when I attempt to access the underlying bytes, I get inconsistent results: artefacts that appear to be a mixture of past and future frames. I am accessing the bytes in order to rotate portrait frames a quarter turn to landscape, but the problem reduces to not being able to correctly copy the texture.

The pattern of artefacts can change quite a lot. They can be all over the place and seem to have a fundamental "brush shape" that is a square tile, sometimes small, sometimes large, which seems to depend on the failing workaround at hand. They can occur in both the luminance and chroma channels, which results in interesting effects. The "grain" of the artefacts sometimes appears to be horizontal, which I guess is vertical in the original frame.

I do have two functioning workarounds:

  • rotate the buffers using Metal
  • rotate the buffers using CoreImage (even a "software" CIContext works; see the sketch after this list)
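
For concreteness, here is a minimal sketch of the CoreImage workaround. It assumes the caller preallocates an IOSurface-backed destination pixel buffer with the transposed dimensions; the function name and shape are illustrative, not my production code:

```swift
import CoreImage
import CoreVideo

// A software context keeps the GPU out of the picture entirely.
let softwareContext = CIContext(options: [.useSoftwareRenderer: true])

/// Rotate a portrait pixel buffer a quarter turn into a preallocated
/// landscape buffer. `destination` must be IOSurface-backed and have
/// the transposed dimensions of `source`.
func rotate(_ source: CVPixelBuffer, into destination: CVPixelBuffer) {
    let rotated = CIImage(cvPixelBuffer: source)
        .oriented(.right) // quarter turn: portrait -> landscape
    softwareContext.render(rotated, to: destination)
}
```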

The reason that I can't yet ship these workarounds is that System Broadcast Extensions have a very low memory limit of 50MB and memory usage can spike with these two solutions, and there seem to be interactions with other parts of the system (e.g. the AVAssetWriter or the daemon that dumps frames into my address space). I'm still working to understand memory usage here.

The artefacts seem like a synchronisation problem. However, I have a feeling that this is not so much a new frame being written into the buffer that I'm looking at, but rather some sort of stale cache. CPU or GPU? Do GPUs have caches? The tiled nature of the artefacts reminds me of iOS GPUs, but take that with a grain of salt (not a hardware person).

This brings me around to the question title. If this is a caching problem, and Metal / CoreImage has a consistent view of the pixels, maybe I can get Metal to flush the data I want for me, because a BGRA screen capture being converted to a YUV IOSurface has "Metal shader" written all over it.
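
In that spirit, one way to ask Metal to do the flushing would be a blit into a CPU-visible buffer. This is a sketch under assumptions: `texture` is an `.r8Unorm` texture already wrapped around the luma plane of the IOSurface (the wrapping itself is shown in the next snippet), and `queue` is a long-lived MTLCommandQueue:

```swift
import Metal

// Blit the texture into a shared (CPU-visible) buffer and block until
// the GPU has finished, forcing any pending writes to become visible.
func blitToCPU(_ texture: MTLTexture, queue: MTLCommandQueue) -> MTLBuffer? {
    let bytesPerRow = texture.width // 1 byte per pixel for .r8Unorm
    guard let buffer = queue.device.makeBuffer(length: bytesPerRow * texture.height,
                                               options: .storageModeShared),
          let commandBuffer = queue.makeCommandBuffer(),
          let blit = commandBuffer.makeBlitCommandEncoder() else { return nil }

    blit.copy(from: texture,
              sourceSlice: 0,
              sourceLevel: 0,
              sourceOrigin: MTLOrigin(x: 0, y: 0, z: 0),
              sourceSize: MTLSize(width: texture.width, height: texture.height, depth: 1),
              to: buffer,
              destinationOffset: 0,
              destinationBytesPerRow: bytesPerRow,
              destinationBytesPerImage: bytesPerRow * texture.height)
    blit.endEncoding()
    commandBuffer.commit()
    commandBuffer.waitUntilCompleted() // safe to read buffer.contents() after this
    return buffer
}
```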

So I took the incoming CMSampleBuffer's CVPixelBuffer's IOSurface and created an MTLTexture from it (with all sorts of cacheModes and storageModes; I haven't tried hazardTrackingModes yet) and then copied the bytes out with MTLTexture.getBytes(bytesPerRow:from:mipmapLevel:).
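
Roughly like this, for the luma plane (a sketch of the failing approach; `device` is assumed to be a long-lived MTLDevice):

```swift
import Metal
import CoreVideo

// Wrap the luma plane of an IOSurface-backed pixel buffer in an MTLTexture
// and copy its bytes out on the CPU.
func copyLumaPlane(of pixelBuffer: CVPixelBuffer, device: MTLDevice) -> [UInt8]? {
    guard let surface = CVPixelBufferGetIOSurface(pixelBuffer)?.takeUnretainedValue() else {
        return nil
    }
    let width = CVPixelBufferGetWidthOfPlane(pixelBuffer, 0)
    let height = CVPixelBufferGetHeightOfPlane(pixelBuffer, 0)

    let descriptor = MTLTextureDescriptor.texture2DDescriptor(
        pixelFormat: .r8Unorm, // one byte per luma sample
        width: width,
        height: height,
        mipmapped: false)
    descriptor.storageMode = .shared // one of the several modes tried

    guard let texture = device.makeTexture(descriptor: descriptor,
                                           iosurface: surface,
                                           plane: 0) else { return nil }

    var bytes = [UInt8](repeating: 0, count: width * height)
    bytes.withUnsafeMutableBytes { raw in
        texture.getBytes(raw.baseAddress!,
                         bytesPerRow: width,
                         from: MTLRegionMake2D(0, 0, width, height),
                         mipmapLevel: 0)
    }
    return bytes
}
```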

Yet the problem persists. I would really like to make the CPU deep copy approach work, for memory reasons.

To head off some questions:

  • it's not a bytes-per-row issue; that would slant the images
  • in the CPU case I do lock the CVPixelBuffer's base address
  • I even lock the underlying IOSurface
  • I have tried discarding IOSurfaces whose lock seed changes under lock (this locking discipline is sketched after this list)
  • I do discard frames when necessary
  • I have tried putting random memory fences and mutexes all over the place (not a hardware person)
  • I have not disassembled CoreImage yet
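
To make the locking and seed-check points concrete, here is a sketch of the discipline described above (the function name and shape are illustrative); it returns false when the frame should be discarded:

```swift
import CoreVideo
import IOSurface
import Darwin

// Lock the pixel buffer, lock the underlying IOSurface read-only, hand the
// luma plane to `body`, and report whether the surface's seed moved while
// we held the lock.
func withLockedBytes(of pixelBuffer: CVPixelBuffer,
                     _ body: (UnsafeRawPointer, Int) -> Void) -> Bool {
    guard let surface = CVPixelBufferGetIOSurface(pixelBuffer)?.takeUnretainedValue() else {
        return false
    }

    CVPixelBufferLockBaseAddress(pixelBuffer, .readOnly)
    defer { CVPixelBufferUnlockBaseAddress(pixelBuffer, .readOnly) }

    var seed: UInt32 = 0
    guard IOSurfaceLock(surface, .readOnly, &seed) == KERN_SUCCESS else { return false }
    defer { IOSurfaceUnlock(surface, .readOnly, &seed) }

    guard let base = CVPixelBufferGetBaseAddressOfPlane(pixelBuffer, 0) else { return false }
    body(base, CVPixelBufferGetBytesPerRowOfPlane(pixelBuffer, 0))

    // If the seed changed while we held the lock, the producer wrote into
    // the surface behind our back; treat the frame as torn and discard it.
    return IOSurfaceGetSeed(surface) == seed
}
```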

This question is a continuation of one I posted on the Apple Developer Forums.

[Image: smaller tile artefact. Art by https://twitter.com/artofzara]

Rhythmic Fistman
  • Could you please post the code you are using to rotate the buffer with some context? – Frank Rupprecht Jan 24 '20 at 08:37
  • I admittedly don't know the background here, but one usually doesn't rotate frames in memory but rather specify the orientation in metadata. The player will then use the metadata to display the video correctly. – Frank Rupprecht Jan 24 '20 at 08:39
  • I'll see if I can post some code (I'm working on a bug report), and I can't set orientation metadata because I'm writing to a .ts file which doesn't support that. However I want the question to be about memory consistency, and not rotation - I can't correctly copy/view the image. Yet `AVAssetWriter`, `Metal` and `CoreImage` can. How are they doing that? – Rhythmic Fistman Jan 25 '20 at 00:14
  • Are you doing your processing somehow async? I'm asking because the docs of `processSampleBuffer` state that "The sample buffer passed to this method is available only until the method returns. You shouldn't keep a reference to the sample buffer after the method returns." So you'd have to do all your processing _before_ you return from the method. Otherwise, the buffers might get re-used and overwritten. – Frank Rupprecht Jan 25 '20 at 15:23
  • I initially had been doing async buffer processing and removed it, but it didn't help. But thank you for pointing out the documentation, I had missed that, I'll make sure that's clear in the code. I'm 90% convinced that the parts of the image that the CPU is missing are sitting in some GPU cache. – Rhythmic Fistman Jan 28 '20 at 04:10

0 Answers