This is a follow-up question to this answer. I am trying to replace a for-loop running on the CPU with a kernel function in Metal to parallelize the computation and speed it up.
My function is basically a convolution. Since I repeatedly receive new data for my input array values (the data stems from an AVCaptureSession), it seems that using newBufferWithBytesNoCopy:length:options:deallocator: is the sensible option for creating the MTLBuffer objects. Here is the relevant code:
id <MTLBuffer> dataBuffer = [device newBufferWithBytesNoCopy:dataVector length:sizeof(dataVector) options:MTLResourceStorageModeShared deallocator:nil];
id <MTLBuffer> filterBuffer = [device newBufferWithBytesNoCopy:filterVector length:sizeof(filterVector) options:MTLResourceStorageModeShared deallocator:nil];
id <MTLBuffer> outBuffer = [device newBufferWithBytesNoCopy:outVector length:sizeof(outVector) options:MTLResourceStorageModeShared deallocator:nil];
When running this I get the following error:
failed assertion `newBufferWithBytesNoCopy:pointer 0x16fd0bd48 is not 4096 byte aligned.'
Right now, I am not allocating any memory, but (for testing purposes) just creating an empty array of floats of a fixed size and filling it up with random numbers. So my main question is:
How do I allocate these arrays of floats correctly so that the following requirement is met?
This value must result in a page-aligned region of memory.
Also, some additional questions:
- Does it even make sense to create the MTLBuffer with the newBufferWithBytesNoCopy method, or is copying the data not really an issue in terms of performance? (My actual data will consist of approximately 43'000 float values per video frame.)
- Is MTLResourceStorageModeShared the correct choice for MTLResourceOptions?
The API reference says:
The storage allocation of the returned new MTLBuffer object is the same as the pointer input value. The existing memory allocation must be covered by a single VM region, typically allocated with vm_allocate or mmap. Memory allocated by malloc is specifically disallowed.
Does this apply only to the output buffer, or should the storage allocation for all objects used with MTLBuffer not be done with malloc?
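For reference, this is roughly what I understand the mmap route from the quoted documentation to look like in plain C. It is only a sketch under my own assumptions: alloc_page_aligned_floats and free_page_aligned_floats are hypothetical helper names, and I am assuming that rounding the length up to a multiple of the page size is what the "page-aligned region" requirement asks for.

```c
#define _DEFAULT_SOURCE /* exposes MAP_ANON on glibc in strict -std modes; harmless elsewhere */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/* Round a byte count up to the next multiple of the VM page size. */
static size_t round_up_to_page(size_t bytes) {
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    return (bytes + page - 1) / page * page;
}

/* Hypothetical helper: reserve zero-filled, page-aligned storage for `count`
 * floats with mmap. Returns NULL on failure. On success, *out_length receives
 * the padded size in bytes (a multiple of the page size). */
static float *alloc_page_aligned_floats(size_t count, size_t *out_length) {
    size_t length = round_up_to_page(count * sizeof(float));
    void *p = mmap(NULL, length, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANON, -1, 0);
    if (p == MAP_FAILED) {
        return NULL;
    }
    *out_length = length;
    return (float *)p;
}

/* Hypothetical helper: release storage obtained from the function above. */
static void free_page_aligned_floats(float *p, size_t length) {
    assert(p != NULL);
    munmap(p, length);
}
```

With something like this, the pointer and the padded length (rather than sizeof of a stack array) would be what gets passed to newBufferWithBytesNoCopy:length:options:deallocator:, and the deallocator block could call free_page_aligned_floats. Whether that is the intended usage is exactly what I am unsure about.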