The following are just some ideas that I came up with while thinking about it - there might be better solutions that I'm not aware of.
1. Tag-Dispatch
Using Tag-Dispatch you can define an order in which the functions should be considered by the compiler, e.g. in this case it's
AVX2 -> SSE3 -> Neon128 -> Neon64 -> None
The first implementation that's present in this chain will be used: godbolt example
/**********************************
** functions.h *******************
*********************************/
#include <iostream>

struct SIMD_None_t {};
struct SIMD_Neon64_t : SIMD_None_t {};
struct SIMD_Neon128_t : SIMD_Neon64_t {};
struct SIMD_SSE3_t : SIMD_Neon128_t {};
struct SIMD_AVX2_t : SIMD_SSE3_t {};
struct SIMD_Any_t : SIMD_AVX2_t {};
#include "functions_unoptimized.h"
#ifdef __ARM_NEON
#include "functions_neon64.h"
#endif
#ifdef __SSE3__
#include "functions_see3.h"
#endif
// etc...
#include "functions_stubs.h"
/**********************************
** functions_unoptimized.h *******
*********************************/
inline int add(int a, int b, SIMD_None_t) {
    std::cout << "NONE" << std::endl;
    return a + b;
}
/**********************************
** functions_neon64.h ************
*********************************/
inline int add(int a, int b, SIMD_Neon64_t) {
    std::cout << "NEON!" << std::endl;
    return a + b;
}
/**********************************
** functions_neon128.h ***********
*********************************/
inline int add(int a, int b, SIMD_Neon128_t) {
    std::cout << "NEON128!" << std::endl;
    return a + b;
}
/**********************************
** functions_stubs.h *************
*********************************/
inline int add(int a, int b) {
    return add(a, b, SIMD_Any_t{});
}
/**********************************
** main.cpp **********************
*********************************/
#include "functions.h"
int main() {
    add(1, 2);
}
This would output NEON128!, since that's the best match in this case.
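For illustration, the contents of functions_sse3.h (referenced above but not shown; the following is a hypothetical sketch that simply follows the same pattern) would add one more overload for the SIMD_SSE3_t tag, and on an SSE3-capable target that overload would become the best match:
/**********************************
** functions_sse3.h **************
*********************************/
// Hypothetical sketch: same pattern as the other headers, just with the SSE3 tag
inline int add(int a, int b, SIMD_SSE3_t) {
    std::cout << "SSE3!" << std::endl;
    return a + b;
}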
Upsides:
- No #ifdefs needed in the implementation header files
- Callers don't need to be modified
Downsides:
- You'll need to add an extra argument to each implementation
- A dispatch function is required to supply the extra argument (you could theoretically get rid of this function by adding , SIMD_Any_t{} to every call, as sketched below, but that's a lot of work)
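To illustrate that remark, the wrapper in functions_stubs.h only exists so callers don't have to spell out the tag themselves:
// Without the dispatch wrapper, every call site would have to supply the tag:
int r = add(1, 2, SIMD_Any_t{}); // same overload resolution, just repeated at every call site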
2. Put the functions into classes and use name lookup to pick the right function
e.g.:
struct None { inline static int add(int a, int b) { return a + b; } };
struct Neon64 : None { inline static int add(int a, int b) { return a + b; } };
struct Neon128 : Neon64 {};
struct SIMD : Neon128 {};
// Usage:
int r = SIMD::add(1, 2);
Because child classes can hide members of their base classes, this is not ambiguous (the most-derived class that implements the given method is always the one that gets called, so you can order your implementations).
For your example it could look like this: godbolt example
#include <iostream>
/**********************************
** functions.h *******************
*********************************/
#include "functions_unoptimized.h"
#ifdef __ARM_NEON
#include "functions_neon64.h"
#else
struct SIMD_Neon64 : SIMD_None {};
#endif
#ifdef __ARM_NEON_128
#include "functions_neon128.h"
#else
struct SIMD_Neon128 : SIMD_Neon64 {};
#endif
// etc...
struct SIMD : SIMD_Neon128 {};
/**********************************
** functions_unoptimized.h *******
*********************************/
struct SIMD_None {
    inline static int sub(int a, int b) {
        std::cout << "NONE" << std::endl;
        return a - b;
    }
};
/**********************************
** functions_neon64.h ************
*********************************/
struct SIMD_Neon64 : SIMD_None {
    inline static int sub(int a, int b) {
        std::cout << "Neon64" << std::endl;
        return a - b;
    }
};
/**********************************
** functions_neon128.h ***********
*********************************/
struct SIMD_Neon128 : SIMD_Neon64 {
    inline static int sub(int a, int b) {
        std::cout << "Neon128" << std::endl;
        return a - b;
    }
};
/**********************************
** main.cpp **********************
*********************************/
#include "functions.h"
int main() {
    SIMD::sub(2, 3);
}
This would output Neon128.
Upsides:
- No #ifdefs needed in the implementation header files
- No dispatch function required, the compiler will automatically pick the best one
- No extra function parameters required
Downsides:
- You need to change all calls to the functions & prefix them with SIMD:: (or keep thin forwarding wrappers, see the sketch below)
- You need to wrap all the functions inside structs & use inheritance, so it's a bit involved
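As a possible mitigation for the first downside (a sketch on top of the approach above, not something it requires), you could keep thin forwarding free functions so existing call sites compile unchanged:
// Hypothetical forwarding wrapper: old calls like sub(2, 3) keep working,
// while the implementation is still picked through SIMD's inheritance chain.
inline int sub(int a, int b) {
    return SIMD::sub(a, b);
}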
3. Using template specializations
If you have an enum of all possible SIMD implementations, e.g.:
enum class SIMD_Type {
    Min,     // Dummy Value -> No Implementation found
    None,
    Neon64,
    Neon128,
    SSE3,
    AVX2,
    Max      // Dummy Value -> Search downwards from here
};
You can use it to (recursively) walk through them until you find one that has been specialized, e.g.:
template<SIMD_Type type = SIMD_Type::Max>
inline int add(int a, int b) {
    constexpr SIMD_Type nextType = static_cast<SIMD_Type>(static_cast<int>(type) - 1);
    return add<nextType>(a, b);
}
template<>
inline int add<SIMD_Type::Neon64>(int a, int b) {
std::cout << "NEON!" << std::endl;
return a + b;
}
Here a call to add(1, 2) would first call add<SIMD_Type::Max>, which in turn would call add<SIMD_Type::AVX2>, add<SIMD_Type::SSE3>, add<SIMD_Type::Neon128>, and then the call to add<SIMD_Type::Neon64> would hit the specialization, so the recursion stops there.
If you want to make this a bit safer (to prevent long template instantiation chains) you can additionally add one specialization for each function that stops recursion if it fails to find any specialization, e.g.: godbolt example
template<>
inline int add<SIMD_Type::Min>(int a, int b) {
    static_assert(SIMD_Type::Min == SIMD_Type::Min, "No implementation found!");
    return {};
}
In your case it could look like this:
#include <iostream>
/**********************************
** functions.h *******************
*********************************/
enum class SIMD_Type {
    Min,     // Dummy Value -> No Implementation found
    None,
    Neon64,
    Neon128,
    SSE3,
    AVX2,
    Max      // Dummy Value -> Search downwards from here
};
#include "functions_stubs.h"
#include "functions_unoptimized.h"
#ifdef __ARM_NEON
#include "functions_neon64.h"
#endif
#ifdef __SSE3__
#include "functions_see3.h"
#endif
// etc...
/**********************************
** functions_stubs.h *************
*********************************/
template<SIMD_Type type = SIMD_Type::Max>
inline int add(int a, int b) {
    constexpr SIMD_Type nextType = static_cast<SIMD_Type>(static_cast<int>(type) - 1);
    return add<nextType>(a, b);
}
template<>
inline int add<SIMD_Type::Min>(int a, int b) {
    static_assert(SIMD_Type::Min == SIMD_Type::Min, "No implementation found!");
    return {};
}
/**********************************
** functions_unoptimized.h *******
*********************************/
template<>
inline int add<SIMD_Type::None>(int a, int b) {
std::cout << "NONE" << std::endl;
return a + b;
}
/**********************************
** functions_neon64.h ************
*********************************/
template<>
inline int add<SIMD_Type::Neon64>(int a, int b) {
std::cout << "NEON!" << std::endl;
return a + b;
}
/**********************************
** functions_neon128.h ***********
*********************************/
template<>
inline int add<SIMD_Type::Neon128>(int a, int b) {
std::cout << "NEON128!" << std::endl;
return a + b;
}
/**********************************
** main.cpp **********************
*********************************/
#include "functions.h"
int main() {
    add(1, 2);
}
This would again output NEON128!
Upsides:
- No #ifdefs needed in the implementation header files
- Callers don't need to be modified
Downsides:
- Needs an extra dispatch function that recursively calls itself (until it hits a specialization)
- The compiler might not optimize all recursive calls (although most compilers probably will). Most compilers also offer a way to force inlining for certain functions (__attribute__((always_inline)) / __forceinline) which you could add to the function base templates to make sure all recursive calls actually get inlined (see the sketch after this list).
- Optionally needs another function to stop recursive instantiation (not strictly required, compilers will stop recursive instantiation at some point)
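To sketch the forced-inlining remark from above (the macro name SIMD_ALWAYS_INLINE is made up here; the exact attribute spelling is compiler-specific), the attribute would go on the recursive base template:
// Hypothetical portability macro for forcing inlining
#if defined(_MSC_VER)
#define SIMD_ALWAYS_INLINE __forceinline
#else
#define SIMD_ALWAYS_INLINE inline __attribute__((always_inline))
#endif

template<SIMD_Type type = SIMD_Type::Max>
SIMD_ALWAYS_INLINE int add(int a, int b) {
    constexpr SIMD_Type nextType = static_cast<SIMD_Type>(static_cast<int>(type) - 1);
    return add<nextType>(a, b); // with forced inlining the recursive calls collapse at compile time
}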
4. One file per function
This is by far the easiest option - just put each function (or a collection of similar functions) into a single file and do the #ifdefs there.
That way you have all the functions & their specializations for SIMD in a single file, which should also make editing a lot easier.
e.g.:
/**********************************
** functions.h *******************
*********************************/
#include "functions_add.h"
#include "functions_sub.h"
// etc...
/**********************************
** functions_add.h ***************
*********************************/
#ifdef __SSE3__
// SSE3
inline int add(int a, int b) {
    return a + b;
}
#elif defined(__ARM_NEON)
// NEON
inline int add(int a, int b) {
    return a + b;
}
#else
// Fallback
inline int add(int a, int b) {
    return a + b;
}
#endif
/**********************************
** functions_sub.h ***************
*********************************/
#ifdef __SSE3__
// SSE3
inline int sub(int a, int b) {
    return a - b;
}
#elif defined(__ARM_NEON_128)
// NEON 128
inline int sub(int a, int b) {
    return a - b;
}
#else
// Fallback
inline int sub(int a, int b) {
    return a - b;
}
#endif
Upsides:
- The function & all of its specializations are in a single file, so figuring out which one gets called is a lot easier
- Easy to implement & maintain as long as you don't stuff too many functions into a single file
Downsides:
- Potentially lots of header files
- #ifdefs need to be repeated in each header