5

Question in short

Why am I getting undefined behavior from simd_min and simd_max functions in Metal 2.1 with floats?

Update: Seems this only occurs on the Radeon Pro 560X GPU, but not on the Intel UHD Graphics 630.

Background

According to Metal Shading Language Guide Section 5.14, simd_min and simd_max functions are supported for common scalar or vector, integer or floating-point types.

For simd_max, the specification states:

T simd_max(T data)

Returns the maximum value in data across all active threads in the SIMD-group and broadcasts the result to all active threads in the SIMD-group.

Test Case

To test this, I am executing the following test kernel with an input buffer of 128 random floats in range 0..100:

kernel void simdMaxDebugKernel(
                          const device float *buffer [[ buffer(0) ]],
                          device float *output [[ buffer(1) ]],
                          uint id [[ thread_position_in_grid ]])
{
    output[id] = simd_max(buffer[id]);
}

By inspection, the 128-valued buffer is divided into two 64-value SIMD groups. Therefore, I would expect that the first and last 64 values in the output would be set to the max value of the first and last SIMD groups respectively.

Test Results

I'm getting some unexpected results:

inputs  [simd_float1]   128 values  
[0] Float   94.3006362
[1] Float   98.1107177
[2] Float   85.3725891
[3] Float   45.1457863
...
[63] Float  36.5486336
[64] Float  56.5494308
[65] Float  45.6249847
[66] Float  34.8077431

actual  [simd_float1]   128 values  
[0] Float   94.3006362
[1] Float   NaN
[2] Float   -3.80461845E+20
[3] Float   0.0000000000000000000000000000000000000212763294
...
[63] Float  0
[64] Float  56.5494308
[65] Float  -2467.3457
[66] Float  0.0000000000010178117
...

expectedMax simd_float1 99.4676971

Seems to me that the value at the first SIMD lane for each SIMD group is just copied, and the rest is undefined.

By contrast, the kernel behaves as expected, if a conversion to uint is used as follows:

output[id] = (float)simd_max((uint)buffer[id]);

actual  [simd_float1]   128 values  
[0] Float   99
[1] Float   99
[2] Float   99
...
[63] Float  99
[64] Float  96
[65] Float  96
...

Test Configuration

  • Mac OS 10.14.2 (18C54)
  • MacBook Pro (15-inch, 2018)
  • Radeon Pro 560X 4096 MB
  • XCode Version 10.1 (10B61)
TToi
  • 133
  • 5
  • 1
    Have you tried this on your Intel GPU to see if it behaves the same? – ldoogy Dec 12 '18 at 04:46
  • @Idoogy, good point. Tried now and looks like no problem with the Intel GPU with both the simd_min or simd_max, although the SIMD-Group size is 32 on the Intel. So there is a HW dependency on this feature. I'm wondering if it's somehow possible to link this to a GPU specification to see which GPUs are supported? – TToi Dec 12 '18 at 19:06
  • This sounds like a legit AMD driver / compiler bug. Could you please file a bug report and post the bug number here so I can follow up? – ldoogy Dec 13 '18 at 03:35
  • 1
    Filed a driver defect report on AMD. Didn't get a defect id though. – TToi Dec 14 '18 at 12:48

0 Answers0