The SIMD instructions of the x86 instruction set only support 32-bit and 64-bit floating-point operations (with some limited support for 16-bit floats). Additionally, even though there are scalar instructions that multiply two 64-bit integers to a 128-bit result (e.g. mulx), there are no corresponding SIMD instructions. Many people have tried, with little success, to implement efficient 128-bit integer arithmetic with x86 SIMD (there are some partial exceptions for multiplication and perhaps addition), and there are no general x86 SIMD integer division instructions at all.
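To make the scalar case concrete, here is a minimal sketch of a full 64-bit by 64-bit to 128-bit multiply. It relies on the unsigned __int128 extension of GCC and Clang (an assumption; MSVC would need _umul128 instead), which compilers lower to a single mul or mulx instruction on x86-64. There is no SIMD counterpart to this operation.

```cpp
#include <cstdint>

// Sketch: full 64x64 -> 128-bit unsigned multiply using the
// GCC/Clang unsigned __int128 extension. On x86-64 this compiles
// to one mul/mulx instruction; no SIMD equivalent exists.
static void mul64x64(uint64_t a, uint64_t b, uint64_t* hi, uint64_t* lo) {
    unsigned __int128 p = (unsigned __int128)a * b;
    *hi = (uint64_t)(p >> 64);  // upper 64 bits of the product
    *lo = (uint64_t)p;          // lower 64 bits of the product
}
```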
However, for floating point, people have had more success with higher-precision SIMD operations using double-double. Double-double has 106 bits of precision, compared with 64 bits for 80-bit long double. But not every C++ compiler uses 80-bit long double: some just use double (e.g. MSVC), which has only 53 bits of precision, some use 128-bit quad precision, which has 113 bits, and Wikipedia even claims that with some compilers long double is implemented as double-double.
I described some details of double-double here. Note that double-double is not an IEEE floating-point type and it has some unusual properties. Also, the range of double-double is the same as double, so it only improves the precision.
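As a reminder of how double-double works, here is a minimal sketch of its two standard building blocks, under assumed names: Knuth's two_sum, which returns the rounded sum of two doubles plus the exact rounding error, and two_prod, which does the same for multiplication using a single FMA. A double-double value is just the unevaluated pair hi + lo produced by these transforms.

```cpp
#include <cmath>

// A double-double value: hi holds the leading double, lo the
// exact remainder, so the represented value is hi + lo.
struct dd { double hi, lo; };

// Knuth's two_sum: s = round(a + b), err = (a + b) - s exactly.
static dd two_sum(double a, double b) {
    double s   = a + b;
    double bb  = s - a;                       // the part of b absorbed into s
    double err = (a - (s - bb)) + (b - bb);   // exact rounding error
    return {s, err};
}

// two_prod via FMA: p = round(a * b), err = a*b - p exactly.
static dd two_prod(double a, double b) {
    double p   = a * b;
    double err = std::fma(a, b, -p);          // recovers the discarded low bits
    return {p, err};
}
```

Full double-double addition and multiplication are built by combining these transforms and renormalizing the result.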
How fast is double-double compared to long double? I have never tested this directly. But I found double-double to be about 10 times slower than plain double operations for a fairly balanced mix of multiplications and additions. And long double is certainly slower than double (except when it's implemented as double). But since you can use SIMD with double-double, and not with the built-in long double, the speed improves proportionally to the SIMD width: 2 double-double operations at a time with SSE2, 4 with AVX, and 8 with AVX-512.
Don't expect OpenMP's simd construct to vectorize double-double for you, though. You will need to implement this yourself or find a library.
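As a starting point for a do-it-yourself version, here is a sketch (assumed names, not from any particular library) of the two_sum error-free transform vectorized over two double-double lanes with SSE2 intrinsics, which are baseline on x86-64. The same pattern extends lane-for-lane to AVX (__m256d) and AVX-512 (__m512d).

```cpp
#include <emmintrin.h>  // SSE2 intrinsics, baseline on x86-64

// Two double-double lanes: element-wise value is hi[i] + lo[i].
struct dd2 { __m128d hi, lo; };

// Knuth's two_sum applied to both lanes at once: each lane gets
// the rounded sum in hi and the exact rounding error in lo.
static dd2 two_sum2(__m128d a, __m128d b) {
    __m128d s  = _mm_add_pd(a, b);
    __m128d bb = _mm_sub_pd(s, a);
    __m128d e1 = _mm_sub_pd(a, _mm_sub_pd(s, bb));
    __m128d e2 = _mm_sub_pd(b, bb);
    return {s, _mm_add_pd(e1, e2)};
}
```

Note that the SIMD version runs the same straight-line sequence of adds and subtracts as the scalar transform, so the per-lane cost is unchanged and the throughput gain is exactly the vector width.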