I'm using conan to build a library that uses arrow parquet. I built arrow myself because I couldn't find versions in conan center that included parquet:

In my conanfile.txt:

[options]
arrow:shared=True  # I tried both shared and static
arrow:parquet=True
arrow:with_snappy=True

And the install command:

conan install .. --build=arrow

It builds and runs properly on my machine, but the tests fail on the Jenkins server with

 SIGILL - Illegal instruction signal

Judging from two other posts, this looks like an architecture mismatch. And indeed, the two machines differ:

Jenkins server

AVX supported
AVX2 not supported

my computer

AVX supported
AVX2 supported
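
For anyone wanting to repeat this comparison on a Linux machine, here is a minimal sketch that reads the CPU feature flags from /proc/cpuinfo. The helper name `simd_support` is made up for illustration, and the flag names (`sse4_2`, `avx`, `avx2`, `avx512f`) are the ones the Linux kernel reports; other operating systems need a different approach.

```python
from pathlib import Path

# Flag names as they appear in the "flags" line of /proc/cpuinfo on Linux.
SIMD_FLAGS = ("sse4_2", "avx", "avx2", "avx512f")


def simd_support(cpuinfo_text):
    """Return {flag: bool} based on the first 'flags' line of cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            present = set(value.split())
            return {flag: flag in present for flag in SIMD_FLAGS}
    # No 'flags' line found (for example on a non-x86 machine).
    return {flag: False for flag in SIMD_FLAGS}


if __name__ == "__main__":
    path = Path("/proc/cpuinfo")
    if path.exists():
        for flag, ok in simd_support(path.read_text()).items():
            print(f"{flag}: {'supported' if ok else 'not supported'}")
```

Running it on both the build machine and the Jenkins agent shows which instruction sets the two have in common.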

Furthermore, the Arrow code contains AVX2-specific optimizations. For example, in byte_stream_split.h:

#if defined(ARROW_HAVE_AVX2)
template <typename T>
void ByteStreamSplitDecodeAvx2(const uint8_t* data, int64_t num_values, int64_t stride,
                               T* out)
// Code

Since I didn't explicitly enable AVX2 support, how do I tell Conan to build Arrow without AVX2, or with whatever the minimum common configuration might be?

Or is there something entirely different I should be looking at?

JACH

1 Answer

In Arrow, the level of SIMD instructions used is controlled by these CMake options:

  define_option_string(ARROW_SIMD_LEVEL
                       "Compile-time SIMD optimization level"
                       "SSE4_2" # default to SSE4.2
                       "NONE"
                       "SSE4_2"
                       "AVX2"
                       "AVX512")

  define_option_string(ARROW_RUNTIME_SIMD_LEVEL
                       "Max runtime SIMD optimization level"
                       "MAX" # default to max supported by compiler
                       "NONE"
                       "SSE4_2"
                       "AVX2"
                       "AVX512"
                       "MAX")

These options are used here to decide whether to pass the corresponding preprocessor definitions:

  if(CXX_SUPPORTS_AVX2 AND ARROW_RUNTIME_SIMD_LEVEL MATCHES "^(AVX2|AVX512|MAX)$")
    set(ARROW_HAVE_RUNTIME_AVX2 ON)
    add_definitions(-DARROW_HAVE_RUNTIME_AVX2 -DARROW_HAVE_RUNTIME_BMI2)
  endif()

You can set these CMake options, ARROW_SIMD_LEVEL and ARROW_RUNTIME_SIMD_LEVEL, on the command line when you run cmake (for example -DARROW_RUNTIME_SIMD_LEVEL=SSE4_2). If that doesn't work through Conan, it means the Arrow recipe doesn't yet expose them as options, so you might need to alter your build flow to be able to run cmake manually.
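
As a rough sketch of what passing those two options looks like, the snippet below assembles the `-D` arguments for a manual cmake run. The helper `arrow_simd_cmake_args` is hypothetical (it is not part of Arrow or Conan), and the allowed values are simply copied from the option definitions quoted above, so other Arrow versions may accept a different set.

```python
# Allowed values copied from the option definitions quoted in this answer;
# other Arrow versions may accept a different set.
SIMD_LEVELS = ("NONE", "SSE4_2", "AVX2", "AVX512")
RUNTIME_SIMD_LEVELS = SIMD_LEVELS + ("MAX",)


def arrow_simd_cmake_args(simd_level="SSE4_2", runtime_level="SSE4_2"):
    """Build the -D arguments for cmake (hypothetical helper for illustration)."""
    if simd_level not in SIMD_LEVELS:
        raise ValueError(f"unknown ARROW_SIMD_LEVEL: {simd_level!r}")
    if runtime_level not in RUNTIME_SIMD_LEVELS:
        raise ValueError(f"unknown ARROW_RUNTIME_SIMD_LEVEL: {runtime_level!r}")
    return [
        f"-DARROW_SIMD_LEVEL={simd_level}",
        f"-DARROW_RUNTIME_SIMD_LEVEL={runtime_level}",
    ]


if __name__ == "__main__":
    # The source path is a placeholder; substitute your own checkout.
    print("cmake", *arrow_simd_cmake_args(), "<path-to-arrow-cpp-source>")
```

Capping both levels at SSE4_2 is one choice for a machine that has AVX but not AVX2, since the listed values offer nothing between SSE4_2 and AVX2.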

Josh Weinstein
  • Thanks. It seemed like setting ARROW_RUNTIME_SIMD_LEVEL as an environment variable or conan line option should do the trick, but it doesn't. I'll need to investigate even more about how conan sends options to cmake. – JACH Jul 27 '21 at 20:15
  • Hi! If you have a look at the recipe for `arrow` package (https://github.com/conan-io/conan-center-index/blob/master/recipes/arrow/all/conanfile.py), you can see that we list the options as attributes (`options` and `default_options`) and then we add the value to the proper CMake variable using `self._cmake.definitions["..."] = `. I encourage you to contribute the changes via pull-request. – jgsogo Aug 05 '21 at 16:14
  • It seems that the defaults have changed since this post was written. In Arrow 9.0.0, `ARROW_SIMD_LEVEL` defaults to `DEFAULT` which does the right thing on ARM as well: [source](https://github.com/apache/arrow/blob/ea6875fd2a3ac66547a9a33c5506da94f3ff07f2/cpp/cmake_modules/DefineOptions.cmake#L122). `DEFAULT` should probably be used in place of `SSE4_2` or `NONE` for most use cases. – Thomas Oct 25 '22 at 08:57