I was experimenting with the multiple-target feature of ahead-of-time (AOT) code generation, using the static library option. I wrote a generator and was able to generate a static library and header files for multiple target features, e.g. target=x86-64-windows-sse41,x86-64-windows-avx,x86-64-windows-avx2
However, after linking the library into my application, the application crashes. When I specify only target=x86-64-windows-sse41
the application runs fine. Yes, my system supports SSE4.1.
My understanding is that when compiling for multiple targets, Halide checks feature support at runtime and calls the appropriate specialization.
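To make that expectation concrete, here is a minimal standalone sketch of the dispatch I had in mind. It is not Halide's actual wrapper code: `Feature`, `Variant` and `pick_variant` are hypothetical names made up for illustration, and the only assumption is that the first variant whose required features are all present gets chosen.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical feature bits, for illustration only
// (these are NOT Halide's Target::Feature values).
enum Feature : std::uint32_t {
    SSE41 = 1u << 0,
    AVX   = 1u << 1,
    AVX2  = 1u << 2,
};

// One compiled variant of the pipeline and the CPU features it needs.
struct Variant {
    const char *name;
    std::uint32_t required;
};

// Returns the index of the first variant whose required features are all
// contained in `supported`, or -1 if no variant can run on this CPU.
int pick_variant(const std::vector<Variant> &variants, std::uint32_t supported) {
    for (std::size_t i = 0; i < variants.size(); ++i) {
        if ((variants[i].required & supported) == variants[i].required) {
            return static_cast<int>(i);
        }
    }
    return -1;
}
```

With a CPU that reports only SSE4.1 (as mine does), calling `pick_variant` on the list avx2, avx, sse41 would skip the first two entries and return the index of the sse41 variant. Under this model the order of the list should not cause a crash, which is why the behaviour I am seeing surprises me.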
I ran dumpbin /All mylib.lib /out:mylib.txt
and found symbols for sse41, avx and avx2. The output also lists halide_can_use_target_features as External.
It looks like I am missing a step. Any pointers on how to use this functionality?
Thanks
Update
Here is what my processor supports (extract from the Coreinfo utility):
Intel(R) Xeon(R) CPU E5-2697 v2 @ 2.70GHz
Intel64 Family 6 Model 37 Stepping 1, GenuineIntel
Microcode signature: 00000428
FPU * Implements i387 floating point instructions
MMX * Supports MMX instruction set
MMXEXT - Implements AMD MMX extensions
3DNOW - Supports 3DNow! instructions
3DNOWEXT - Supports 3DNow! extension instructions
SSE * Supports Streaming SIMD Extensions
SSE2 * Supports Streaming SIMD Extensions 2
SSE3 * Supports Streaming SIMD Extensions 3
SSSE3 * Supports Supplemental SIMD Extensions 3
SSE4a - Supports Streaming SIMDR Extensions 4a
SSE4.1 * Supports Streaming SIMD Extensions 4.1
SSE4.2 * Supports Streaming SIMD Extensions 4.2
AES * Supports AES extensions
AVX - Supports AVX intruction extensions
FMA - Supports FMA extensions using YMM state
MSR * Implements RDMSR/WRMSR instructions
MTRR * Supports Memory Type Range Registers
XSAVE - Supports XSAVE/XRSTOR instructions
OSXSAVE - Supports XSETBV/XGETBV instructions
RDRAND - Supports RDRAND instruction
RDSEED - Supports RDSEED instruction
These are the target orderings I have tried.
Does Not Work
- SSE41, AVX, AVX2
- SSE41, AVX2, AVX
- AVX2, SSE41, AVX
- AVX, SSE41, AVX2
Works
- AVX2, AVX, SSE41
- AVX, AVX2, SSE41