When working with Metal, I find there's a bewildering number of types and it's not always clear to me which type I should be using in which context.

In Apple's Metal Shading Language Specification, there's a pretty clear table of which types are supported within a Metal shader file. However, there's plenty of sample code available that seems to use additional types that are part of SIMD. On the macOS (Objective-C) side of things, the Metal types are not available but the SIMD ones are, and I'm not sure which ones I'm supposed to use.

For example:

In the Metal Spec, there's float2 that is described as a "vector" data type representing two floating components.

On the app side, the following all seem to be used or represented in some capacity:

  • float2, which is typedef ::simd_float2 float2 in vector_types.h

    Noted: "In C or Objective-C, this type is available as simd_float2."


  • vector_float2, which is typedef simd_float2 vector_float2

    Noted: "This type is deprecated; you should use simd_float2 or simd::float2 instead"


  • simd_float2, which is typedef __attribute__((__ext_vector_type__(2))) float simd_float2

  • ::simd_float2 and simd::float2 ?

A similar situation exists for matrix types:

  • matrix_float4x4, simd_float4x4, ::simd_float4x4 and float4x4

Could someone please shed some light on why there are so many typedefs with seemingly overlapping functionality? If you were writing a new application today (2018) in Objective-C / Objective-C++, which type should you use to represent two floating values (x/y) and which type for matrix transforms that can be shared between app code and Metal?

kennyc

2 Answers

The types with vector_ and matrix_ prefixes have been deprecated in favor of those with the simd_ prefix, so the general guidance (using float4 as an example) would be:

  • In C code, use the simd_float4 type. (You have to include the prefix unless you provide your own typedef, since C doesn't have namespaces.)
  • Same for Objective-C.
  • In C++ code, use the simd::float4 type, which you can shorten to float4 by adding a using namespace simd; directive.
  • Same for Objective-C++.
  • In Metal code, use the float4 type, since float4 is a fundamental type in the Metal Shading Language [1].
  • In Swift code, use the float4 type, since the simd_ types are typealiased to shorter names.
  • Update: In Swift 5, float4 and related types have been deprecated in favor of SIMD4<Float> and related types.

These types are all fundamentally equivalent, and all have the same size and alignment characteristics so you can use them across languages. That is, in fact, one of the design goals of the simd framework.

I'll leave a discussion of packed types to another day, since you didn't ask.

[1] Metal is an unusual case since it defines float4 in the global namespace, then imports it into the metal namespace, which is also exported as the simd namespace. It additionally aliases float4 as vector_float4. So, you can use any of the above names for this vector type (except simd_float4). Prefer float4.
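
As a rough sketch of how the guidance above might look in a header shared between app code and shaders: the block below picks a per-language name for each vector type and then defines one struct on top of those names. It assumes the Metal compiler defines `__METAL_VERSION__`. The file name and the names `Vec2`, `Vec4`, and `Vertex` are hypothetical, and the last branch (a GCC/Clang vector extension) is an assumption added only so the sketch also compiles off Apple platforms.

```c
/* SharedTypes.h: hypothetical header included by both .metal and C/Objective-C sources. */
#ifndef SHARED_TYPES_H
#define SHARED_TYPES_H

#if defined(__METAL_VERSION__)
/* Metal Shading Language: float2 and float4 are built-in vector types. */
typedef float2 Vec2;
typedef float4 Vec4;
#elif defined(__APPLE__)
/* C / Objective-C on Apple platforms: use the simd_-prefixed names. */
#include <simd/simd.h>
typedef simd_float2 Vec2;
typedef simd_float4 Vec4;
#else
/* Illustration only (assumption): same-sized vectors via a compiler extension. */
typedef float Vec2 __attribute__((vector_size(8)));
typedef float Vec4 __attribute__((vector_size(16)));
#endif

/* One layout definition, visible to the shader and to the app code. */
typedef struct {
    Vec2 position;
    Vec4 color;
} Vertex;

#endif /* SHARED_TYPES_H */
```

Both sides then refer to `Vertex` only, and the layout should agree because the underlying vector types share size and alignment.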

warrenm
  • I did indeed leave out any mention of packed types so as not to overwhelm the original question. My understanding is that packed types are best used in vertex buffers to help reduce the overall memory requirements (e.g., when loading a large object with many vertices), but I haven't explored them much further. Not sure if it's better to open a new question or just append to this one. (Oh, and thanks for MBE! A great resource to the community.) – kennyc Aug 11 '18 at 11:19
  • Having to type `SIMD4<Float>` each time instead of `float4` feels like a setback... – Nicolas Miari Sep 26 '19 at 05:43
  • Hi @warrenm. This is quite helpful, and yet I remain perplexed. It is typical, when coding for Metal, to have a shared header that defines vertex struct layouts and that gets included by both your .metal file and your non-Metal sources. When your non-Metal sources are pure C, you must use `simd_float4`, but `simd_float4` is the one variant that specifically does *not* work in Metal, as you note above. What is one supposed to do? I ended up using `float4` in my shared header, and doing `typedef simd_float4 float4` in the C file before I include that header, but that feels like an ugly hack...? – bhaller Jun 03 '20 at 16:11
  • The really weird thing is that for several days I was using `simd_float4` in my shared header and it worked fine, no complaints from Metal at all. Then suddenly that code, which I hadn't touched, generated compile errors because Metal didn't like `simd_float4` any more. I have no idea why; I had made no change to my code that seems related at all, I was working on wiring up a nib! – bhaller Jun 03 '20 at 16:14
  • Sorry, note that above I meant to say "pure Objective-C" (as opposed to ObjC++), not "pure C". But the confusion remains the same. – bhaller Jun 03 '20 at 16:53
  • I'm probably missing something but Metal provides `typedef float4 simd_float4;` (and many others) once a macro is set by adding `-D __HAVE_SIMD_TYPES_PREFIX__` to the `Other Metal Compiler Flags` build setting. – spinacher Jul 25 '20 at 19:12
  • Is it possible to know whether the code is being compiled as Metal or as C, with some flag? If so, the header could use a different typedef for each case. – iaomw Aug 05 '20 at 20:31
  • Regarding packed types, it is confusing that `typealias simd_packed_float4 = SIMD4<Float>` is thus the same as the others, according to https://developer.apple.com/documentation/simd/simd_packed_float4 – johnbakers Sep 16 '20 at 13:14

which type should you use to represent two floating values (x/y)

If you can avoid it, don't use a single SIMD vector to represent a single geometry x,y vector if you're using CPU SIMD.

CPU SIMD works best when you have many of the same thing in each SIMD vector, because they're actually stored in 16-byte or 32-byte vector registers where "vertical" operations between two vectors are cheap (packed add or multiply), but "horizontal" operations can mostly only be done with a shuffle + a vertical operation.

For example a vector of 4 x values and another vector of 4 y values lets you do 4 dot-products or 4 cross-products in parallel with no shuffling, so the overall throughput is significantly more dot-products per clock cycle than if you had a vector of [x1, y1, x2, y2].
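
To make the layout difference concrete, here is a minimal sketch in plain C (no intrinsics). The function and type names are hypothetical. The first function takes separate x and y arrays, so each step of the loop is purely vertical and a compiler may vectorize it without shuffles. The second takes interleaved points, where x and y sit next to each other and combining them is a horizontal step.

```c
#include <stddef.h>

/* Interleaved ("array of structs") layout: x0, y0, x1, y1, ... */
typedef struct { float x, y; } Point2;

/* Struct-of-arrays: all x values together, all y values together.
   out[i] = a[i] . b[i]; every operation pairs element i with element i. */
void dot2_soa(const float *ax, const float *ay,
              const float *bx, const float *by,
              float *out, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        out[i] = ax[i] * bx[i] + ay[i] * by[i];
}

/* Array-of-structs: same result, but x and y of one point are neighbours
   in memory, so a vectorized version has to combine adjacent lanes. */
void dot2_aos(const Point2 *a, const Point2 *b, float *out, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        out[i] = a[i].x * b[i].x + a[i].y * b[i].y;
}
```

Both functions compute the same dot products; only the memory layout, and therefore how well the loop maps onto vector registers, differs.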

See https://stackoverflow.com/tags/sse/info, and especially these slides: SIMD at Insomniac Games (GDC 2015) for more about planning your data layout and program design for doing many similar operations in parallel instead of trying to accelerate single operations.


The one exception to this rule is if you're only adding / subtracting to translate coordinates, because that's still purely a vertical operation even with an array-of-structs, and is thus fine for CPU short-vector SIMD based on 16-byte vectors. (e.g. the 2nd element in one vector only interacts with the 2nd element in another vector, so no shuffling is needed.)
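
A small sketch of that exception, again in plain C with a hypothetical function name: translating interleaved x,y pairs only ever adds dx to an x slot and dy to a y slot, so no element needs to move to a different lane.

```c
#include <stddef.h>

/* Translate `count` points stored interleaved as x0, y0, x1, y1, ...
   Each element is combined only with the offset for its own slot,
   which is the same as adding a repeating [dx, dy, dx, dy] pattern. */
void translate_interleaved(float *xy, size_t count, float dx, float dy)
{
    for (size_t i = 0; i < count; ++i) {
        xy[2 * i]     += dx;  /* x slot */
        xy[2 * i + 1] += dy;  /* y slot */
    }
}
```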


GPU SIMD is different, and I think has no problem with interleaved data. I'm not a GPU expert.

(I don't use Objective-C or Metal, so I can't help you with the details of their type names, just what the underlying CPU hardware is good at. That's basically the same for x86 SSE/AVX, ARM NEON / AArch64 SIMD, or PowerPC AltiVec. Horizontal operations are slower.)

Peter Cordes
  • Thank you for the input and additional insight. It's good to learn that there are different considerations to take into account whether the data is being processed by the CPU or GPU. Apple routinely uses a float2 or float3 in their Metal sample code to represent a vertex within a 2-D or 3-D coordinate space. In my case, I only have a handful of vertices to represent a 2-D rectangle and the only calculations being done are to convert to Metal's normalized coordinate system for rendering. – kennyc Aug 11 '18 at 11:15
  • @kennyc: If you're only ever adding / subtracting, then array-of-structs is fine for CPU SIMD. The operations are still purely vertical (i.e. 2nd element in one vector only interacts with the 2nd element in another vector.) – Peter Cordes Aug 11 '18 at 11:59