On MTLComputePipelineState, what determines maxTotalThreadsPerThreadgroup?

Asked Apr 19 '19 at 16:19

Active Apr 26 '19 at 10:04

When working with Metal shaders / compute kernels on iOS or macOS...

MTLComputePipelineState has a limit of maxTotalThreadsPerThreadgroup.

This limit can be queried after the pipeline state is created. It depends on three things: GPU hardware characteristics, OS version, and your Metal kernel code.
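For reference, here is a minimal sketch of how I query the value once the pipeline state exists. The kernel `fill_ones` and the helper `maxThreadsPerThreadgroup(source:functionName:)` are hypothetical names made up for this example; only the Metal calls themselves come from the framework. The `canImport` guard is just there so the snippet also builds on platforms without Metal.

```swift
import Foundation
#if canImport(Metal)
import Metal
#endif

// Hypothetical trivial kernel, used only so there is something to compile.
let kernelSource = """
#include <metal_stdlib>
using namespace metal;

kernel void fill_ones(device float *out [[buffer(0)]],
                      uint gid [[thread_position_in_grid]]) {
    out[gid] = 1.0f;
}
"""

/// Compiles `source`, builds a compute pipeline for `functionName`, and
/// returns that pipeline's thread limit. Returns nil when Metal or a GPU
/// is unavailable, or when the kernel cannot be compiled or found.
func maxThreadsPerThreadgroup(source: String, functionName: String) -> Int? {
    #if canImport(Metal)
    guard let device = MTLCreateSystemDefaultDevice(),
          let library = try? device.makeLibrary(source: source, options: nil),
          let function = library.makeFunction(name: functionName),
          let pipeline = try? device.makeComputePipelineState(function: function)
    else { return nil }

    // Printed for comparison: the device-wide ceiling and the execution width.
    print("device maxThreadsPerThreadgroup:", device.maxThreadsPerThreadgroup)
    print("threadExecutionWidth:", pipeline.threadExecutionWidth)

    // The per-pipeline limit, available only after the pipeline is created.
    return pipeline.maxTotalThreadsPerThreadgroup
    #else
    return nil
    #endif
}

if let limit = maxThreadsPerThreadgroup(source: kernelSource, functionName: "fill_ones") {
    print("maxTotalThreadsPerThreadgroup:", limit)
} else {
    print("Metal is not available here")
}
```

Running this against different versions of a kernel is how I have been comparing the resulting limits.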

What aspects of Metal kernel code impact MTLComputePipelineState's maxTotalThreadsPerThreadgroup?
What can be done to increase the value given a fixed hardware / OS combination?

For example:

Register usage?
Length of code?
Forced inlining?

(The question isn't how to calculate the optimal sizes, it's about how to modify code to achieve the largest threadgroup.)

Link to Apple's docs for MTLComputePipelineState: https://developer.apple.com/documentation/metal/mtlcomputepipelinestate/1414927-maxtotalthreadsperthreadgroup

Link to Apple's docs for "Calculating Threadgroup and Grid Sizes": https://developer.apple.com/documentation/metal/calculating_threadgroup_and_grid_sizes?language=objc

edited Apr 19 '19 at 16:28

TJez
