When working with Metal shaders / compute kernels on iOS or macOS, an MTLComputePipelineState has a limit, maxTotalThreadsPerThreadgroup.

This limit can be queried after the pipeline state is created. It depends on GPU hardware characteristics, the OS version, and your Metal kernel code.
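For context, here is a minimal sketch of how the limit can be read once a pipeline state exists. The kernel `fill`, the `ThreadgroupLimits` struct, and the helper `queryThreadgroupLimits` are hypothetical names made up for illustration; only the Metal calls themselves are real API. It assumes a machine with a Metal-capable GPU.

```swift
import Metal

// Hypothetical trivial kernel, present only so there is something to compile.
let kernelSource = """
#include <metal_stdlib>
using namespace metal;

kernel void fill(device float *out [[buffer(0)]],
                 uint gid [[thread_position_in_grid]]) {
    out[gid] = float(gid);
}
"""

// Illustrative container for the two values read from the pipeline state.
struct ThreadgroupLimits {
    let maxTotalThreads: Int
    let executionWidth: Int
}

// Compiles `source`, builds a compute pipeline for `functionName`, and reads
// the per-pipeline limits. Returns nil if no Metal device or function is found.
func queryThreadgroupLimits(source: String, functionName: String) throws -> ThreadgroupLimits? {
    guard let device = MTLCreateSystemDefaultDevice() else { return nil }
    let library = try device.makeLibrary(source: source, options: nil)
    guard let function = library.makeFunction(name: functionName) else { return nil }
    let pipeline = try device.makeComputePipelineState(function: function)
    // The limit is only known here, after the kernel has been compiled
    // for this specific device.
    return ThreadgroupLimits(
        maxTotalThreads: pipeline.maxTotalThreadsPerThreadgroup,
        executionWidth: pipeline.threadExecutionWidth
    )
}

do {
    if let limits = try queryThreadgroupLimits(source: kernelSource, functionName: "fill") {
        print("maxTotalThreadsPerThreadgroup:", limits.maxTotalThreads)
        print("threadExecutionWidth:", limits.executionWidth)
    } else {
        print("No Metal device or kernel function available")
    }
} catch {
    print("Pipeline creation failed:", error)
}
```

Running the same helper against different versions of a kernel on the same device is one way to observe how a code change moves the reported value.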

  • What aspects of Metal kernel code impact MTLComputePipelineState's maxTotalThreadsPerThreadgroup?
  • What can be done to increase the value given a fixed hardware / OS combination?

For example, do any of these matter?

  • Register usage?
  • Length of code?
  • Forced inlining?

(The question isn't how to calculate the optimal sizes; it's how to modify the kernel code so that the pipeline allows the largest possible threadgroup.)

Link to Apple's docs for MTLComputePipelineState: https://developer.apple.com/documentation/metal/mtlcomputepipelinestate/1414927-maxtotalthreadsperthreadgroup

Link to Apple's docs for "Calculating Threadgroup and Grid Sizes": https://developer.apple.com/documentation/metal/calculating_threadgroup_and_grid_sizes?language=objc

TJez