
I was experimenting with the multiple-target feature of ahead-of-time code generation (the static library option). I wrote a generator and was able to generate a static library and header files for multiple target features, e.g. target=x86-64-windows-sse41,x86-64-windows-avx,x86-64-windows-avx2. However, after linking the library into my application, the application crashes. When I specify only target=x86-64-windows-sse41, the application runs fine. Yes, my system supports SSE4.1.

My understanding is that when compiling for multiple targets, Halide checks feature support at runtime and calls the appropriate specialization.

I ran dumpbin /All mylib.lib /out:mylib.txt and found symbols for sse41, avx and avx2. The output also contains External | halide_can_use_target_features.

It looks like I am missing a step. Any pointers on how to use this functionality?

Thanks


Update

Here is what my processor supports (extract from the Coreinfo utility):

    Intel(R) Xeon(R) CPU E5-2697 v2 @ 2.70GHz
    Intel64 Family 6 Model 37 Stepping 1, GenuineIntel
    Microcode signature: 00000428
    FPU       *  Implements i387 floating point instructions
    MMX       *  Supports MMX instruction set
    MMXEXT    -  Implements AMD MMX extensions
    3DNOW     -  Supports 3DNow! instructions
    3DNOWEXT  -  Supports 3DNow! extension instructions
    SSE       *  Supports Streaming SIMD Extensions
    SSE2      *  Supports Streaming SIMD Extensions 2
    SSE3      *  Supports Streaming SIMD Extensions 3
    SSSE3     *  Supports Supplemental SIMD Extensions 3
    SSE4a     -  Supports Streaming SIMDR Extensions 4a
    SSE4.1    *  Supports Streaming SIMD Extensions 4.1
    SSE4.2    *  Supports Streaming SIMD Extensions 4.2
    AES       *  Supports AES extensions
    AVX       -  Supports AVX intruction extensions
    FMA       -  Supports FMA extensions using YMM state
    MSR       *  Implements RDMSR/WRMSR instructions
    MTRR      *  Supports Memory Type Range Registers
    XSAVE     -  Supports XSAVE/XRSTOR instructions
    OSXSAVE   -  Supports XSETBV/XGETBV instructions
    RDRAND    -  Supports RDRAND instruction
    RDSEED    -  Supports RDSEED instruction

These are the target orderings I have tried:

  1. Does Not Work

    • SSE41, AVX, AVX2
    • SSE41, AVX2, AVX
    • AVX2, SSE41, AVX
    • AVX, SSE41, AVX2
  2. Works

    • AVX2, AVX, SSE41
    • AVX, AVX2, SSE41
  • According to the comment on `compile_to_multitarget_static_library`, "`each resulting function will be considered (in order)`". So you should order them from highest to lowest, i.e. avx2, avx, sse41. – Khouri Giordano Feb 08 '17 at 19:59
  • Khouri is correct about the ordering, but that shouldn't cause a crash or correctness failure, only suboptimal performance. (For the example above, the avx and avx2 targets would never be selected, because all such machines also have sse41, and the sse41 target would be selected first.) – Steven Johnson Feb 08 '17 at 21:40
  • @KhouriGiordano your suggestion prevented the crash! – Ganesh Kumar M R Feb 09 '17 at 09:10
  • Hmm, this looks like it could be a bug in Halide -- all of these *should* work. You should file this as an issue at https://github.com/halide/Halide/issues and let someone familiar with the code investigate. – Steven Johnson Feb 09 '17 at 17:28
  • OK, after looking at the code, I know what's going on; it's arguably not a bug, but we can still do better. The final target is considered the "base" (safest) target; we use that to compile some common runtime code shared by all targets. Since you'd normally prefer most-to-least-specific, e.g. avx2-avx-sse41, avx-sse41, sse41, (plain-old-x86), this meant that the common stuff was using only plain-old-x86. That said, Halide could be more resilient and ensure that the runtime is emitted with only features that belong to all of the targets. – Steven Johnson Feb 09 '17 at 17:52
  • This should be fixed by https://github.com/halide/Halide/pull/1823, but you also should fix the order of the targets you request. Also keep in mind that you are generating code that won't run at all on pre-SSE41 machines (which are a small percentage of machines these days, but still, you should realize & expect failures there). – Steven Johnson Feb 09 '17 at 18:15
  • Thanks for the support and suggestions. – Ganesh Kumar M R Feb 10 '17 at 01:05
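
The base-target explanation in the comments can be checked against the orderings listed in the question. The following is only a rough model, not Halide code: the feature bits, the assumption that each target implies the lower feature levels, and the helper names `base_features`, `common_features` and `runs_on` are all made up for illustration. The host is modelled on the Coreinfo output, which reports SSE4.1 but not AVX.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical feature bits; these are NOT Halide's real encodings.
constexpr uint64_t kSSE41 = 1ull << 0;
constexpr uint64_t kAVX   = 1ull << 1;
constexpr uint64_t kAVX2  = 1ull << 2;

// Assumption for this model: each target implies the lower feature levels.
constexpr uint64_t kTargetSSE41 = kSSE41;
constexpr uint64_t kTargetAVX   = kSSE41 | kAVX;
constexpr uint64_t kTargetAVX2  = kSSE41 | kAVX | kAVX2;

// Host modelled on the Coreinfo extract: SSE4.1 reported, AVX not reported.
constexpr uint64_t kHost = kSSE41;

// Behaviour described in the comments: the shared runtime code is
// compiled for the LAST target in the list (the "base" target).
uint64_t base_features(const std::vector<uint64_t>& targets) {
    assert(!targets.empty());
    return targets.back();
}

// The more resilient option suggested in the comments: use only the
// features that every requested target has in common.
uint64_t common_features(const std::vector<uint64_t>& targets) {
    assert(!targets.empty());
    uint64_t common = ~uint64_t{0};
    for (uint64_t t : targets) common &= t;
    return common;
}

// True if the host provides every required feature bit.
bool runs_on(uint64_t host, uint64_t required) {
    return (host & required) == required;
}
```

Under these assumptions, `runs_on(kHost, base_features(...))` is false for every ordering that ends in AVX or AVX2 and true for the two that end in SSE41, which matches the "Does Not Work" / "Works" lists, while `common_features` is safe for this host regardless of order.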

2 Answers


The multi-target feature is intended to do what you are trying to do. There is a wrapper function which calls halide_can_use_target_features and only calls a routine compiled with those features if it returns true.
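
As a rough picture of what that wrapper does, here is a minimal self-contained sketch. It does not use Halide at all: the feature bits, the `Variant` struct and `select_variant` are hypothetical stand-ins, and `can_use` only plays the role that halide_can_use_target_features plays in the real wrapper. The variants are tried in the order they were requested and the first usable one wins.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical feature bits; these are NOT Halide's real encodings.
constexpr uint64_t kSSE41 = 1ull << 0;
constexpr uint64_t kAVX   = 1ull << 1;
constexpr uint64_t kAVX2  = 1ull << 2;

struct Variant {
    std::string name;    // label of one compiled variant of the function
    uint64_t required;   // feature bits that variant needs
};

// Stand-in for the runtime check: true if the host has every required bit.
bool can_use(uint64_t host, uint64_t required) {
    return (host & required) == required;
}

// Model of the dispatch: walk the variants in the order given and return
// the first one the host can run (empty string if none qualifies).
std::string select_variant(const std::vector<Variant>& variants, uint64_t host) {
    assert(!variants.empty());
    for (const Variant& v : variants) {
        if (can_use(host, v.required)) return v.name;
    }
    return "";
}

// The same three variants in least-to-most and most-to-least specific order.
const std::vector<Variant> kAscending = {
    {"sse41", kSSE41}, {"avx", kSSE41 | kAVX}, {"avx2", kSSE41 | kAVX | kAVX2}};
const std::vector<Variant> kDescending = {
    {"avx2", kSSE41 | kAVX | kAVX2}, {"avx", kSSE41 | kAVX}, {"sse41", kSSE41}};
```

On a host with all three features, `select_variant(kAscending, ...)` picks the sse41 variant and never reaches avx or avx2, whereas `select_variant(kDescending, ...)` picks avx2. That is the performance reason for listing targets from most to least specific.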

Is the crash on an AVX or AVX2 instruction? Does it work if only AVX or only AVX2 is added in addition to SSE 4.1?

You can override halide_can_use_target_features by calling halide_set_custom_can_use_target_features. This should allow you to track the calls and to isolate whether the bug is in the logic of that routine.
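
The tracking idea could look roughly like the pattern below. This is a Halide-free model so that it compiles on its own: `can_use_fn`, `set_custom_can_use`, `default_can_use` and the single-`uint64_t` parameter are assumptions made for illustration. The real handler type and the exact signature of halide_set_custom_can_use_target_features should be taken from the HalideRuntime.h that ships with your Halide version.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical handler type; check HalideRuntime.h for the real one.
using can_use_fn = int (*)(uint64_t features);

// Stand-in default handler: pretends the host only supports feature bit 0.
int default_can_use(uint64_t features) {
    return (features & ~uint64_t{1}) == 0 ? 1 : 0;
}

static can_use_fn g_handler = default_can_use;   // currently installed handler
static can_use_fn g_previous = nullptr;          // handler we replaced
static std::vector<uint64_t> g_queries;          // log of every query seen

// Model of a "set custom handler" call: installs fn, returns the old handler.
can_use_fn set_custom_can_use(can_use_fn fn) {
    assert(fn != nullptr);
    can_use_fn old = g_handler;
    g_handler = fn;
    return old;
}

// Logging wrapper: record the query, then defer to the replaced handler
// so the dispatch decision itself is unchanged.
int logging_can_use(uint64_t features) {
    g_queries.push_back(features);
    return g_previous ? g_previous(features) : 0;
}

// Install the logging wrapper, remembering the handler it replaces.
void install_logging() {
    g_previous = set_custom_can_use(logging_can_use);
}

// What the dispatch wrapper would call in this model.
int can_use(uint64_t features) {
    return g_handler(features);
}
```

After `install_logging()`, every feature query ends up in `g_queries` together with an unchanged answer, so you can see which feature sets were asked about just before the crash. Replacing the delegation with a hard-coded `return 0` would force the fallback path, which is another way to narrow things down.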

Zalman Stern

What you are doing should (in theory) work just fine and not crash (though, as pointed out above, the order you have specified will produce suboptimal performance).

The first interesting question is the nature of the crash: an illegal instruction? Something else? Capturing that info would be immensely helpful.

Steven Johnson