I'm trying to build and link against Apache Arrow v9.0.0 from within my CMake project, using the following section of my CMakeLists.txt file.

ExternalProject_Add(arrow
        URL "https://www.apache.org/dist/arrow/arrow-9.0.0/apache-arrow-9.0.0.tar.gz"
        SOURCE_SUBDIR cpp)
message(STATUS "arrow source dir: ${arrow_SOURCE_DIR}")
include_directories(${arrow_SOURCE_DIR}/cpp/src)

Compilation fails because the Apache Arrow headers cannot be found:

fatal error: 'arrow/array.h' file not found
#include <arrow/array.h>
         ^~~~~~~~~~~~~~~
1 error generated.

This is consistent with the fact that the output of message(STATUS "arrow source dir: ${arrow_SOURCE_DIR}") is empty:

-- arrow source dir: 
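
For reference, here is a minimal sketch of how the source directory of an ExternalProject target can be queried. As far as I can tell, ExternalProject_Add does not define `arrow_SOURCE_DIR` (that naming convention belongs to FetchContent); the directory has to be requested with ExternalProject_Get_Property, which defines a variable named after the property. The snippet reuses the target name and URL from above and is only an illustration, not a fix for the build as a whole.

```cmake
include(ExternalProject)

ExternalProject_Add(arrow
        URL "https://www.apache.org/dist/arrow/arrow-9.0.0/apache-arrow-9.0.0.tar.gz"
        SOURCE_SUBDIR cpp)

# ExternalProject_Add does not set arrow_SOURCE_DIR. Fetching the property
# explicitly defines a variable called SOURCE_DIR in the current scope.
ExternalProject_Get_Property(arrow SOURCE_DIR)
message(STATUS "arrow source dir: ${SOURCE_DIR}")
```

Note that the sources are only downloaded and unpacked at build time, so this path exists but is still empty while CMake configures the outer project. Some Arrow headers also appear to be generated during Arrow's own build, so pointing include_directories at the source tree alone may not be enough.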

CMake also reports another error, seemingly related to the Apache Arrow build:

CMake Error at cmake_modules/ThirdpartyToolchain.cmake:267 (find_package):
  Could not find a configuration file for package "xsimd" that is compatible
  with requested version "8.1.0".

  The following configuration files were considered but not accepted:

    /opt/homebrew/lib/cmake/xsimd/xsimdConfig.cmake, version: 9.0.1

Call Stack (most recent call first):
  cmake_modules/ThirdpartyToolchain.cmake:2245 (resolve_dependency)
  CMakeLists.txt:575 (include)

Of course, the traditional approach of installing Apache Arrow externally (say, with brew install apache-arrow) and using find_package works well enough, but I'd like something more cross-platform. For an earlier question, one of the Arrow devs provided a link showing how to properly use include_directories with ExternalProject_Add, but I guess that example is now outdated.

What's the recommended way to build and then link against Apache Arrow inside a CMake project using ExternalProject_Add?

Edit: Minimal Example

CMakeLists.txt

cmake_minimum_required(VERSION 3.24)
project(arrow_cmake)

set(CMAKE_CXX_STANDARD 23)

include(ExternalProject)

ExternalProject_Add(Arrow
        URL "https://www.apache.org/dist/arrow/arrow-9.0.0/apache-arrow-9.0.0.tar.gz"
        SOURCE_SUBDIR cpp
        CMAKE_ARGS "-Dxsimd_SOURCE=BUNDLED"
        )
add_executable(${PROJECT_NAME} main.cpp)
target_link_libraries(${PROJECT_NAME} arrow_shared)

main.cpp

#include <iostream>

#include <arrow/array.h> // not found!

int main() {
    std::cout << "Hello, World!" << std::endl;
    return 0;
}

marital_weeping
  • Have you checked [that question](https://stackoverflow.com/questions/6351609/cmake-linking-to-library-downloaded-from-externalproject-add) about linking with a library created in `ExternalProject_Add`? While that question is about another library (`protobuf`), adapting [its solution](https://stackoverflow.com/a/29324527/3440745) to your case only requires changing the include directory and library paths to the ones specific to Apache Arrow. – Tsyvarev Oct 03 '22 at 18:39

2 Answers

Building Arrow from source in CMake took quite some doing. The approach below is heavily influenced by this link.

cmake/arrow.cmake

# Build the Arrow C++ libraries.
function(build_arrow)
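    # No options or keyword arguments are accepted yet; with all three lists
    # empty, cmake_parse_arguments() reports anything passed in as unparsed.
    set(options)
    set(one_value_args)
    set(multi_value_args)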

    cmake_parse_arguments(ARG
            "${options}"
            "${one_value_args}"
            "${multi_value_args}"
            ${ARGN})
    if (ARG_UNPARSED_ARGUMENTS)
        message(SEND_ERROR "Error: unrecognized arguments: ${ARG_UNPARSED_ARGUMENTS}")
    endif ()

    # If Arrow needs to be built, the default location will be within the build tree.
    set(ARROW_PREFIX "${CMAKE_CURRENT_BINARY_DIR}/arrow_ep-prefix")

    set(ARROW_SHARED_LIBRARY_DIR "${ARROW_PREFIX}/lib")

    set(ARROW_SHARED_LIB_FILENAME
            "${CMAKE_SHARED_LIBRARY_PREFIX}arrow${CMAKE_SHARED_LIBRARY_SUFFIX}")
    set(ARROW_SHARED_LIB "${ARROW_SHARED_LIBRARY_DIR}/${ARROW_SHARED_LIB_FILENAME}")
    set(PARQUET_SHARED_LIB_FILENAME
            "${CMAKE_SHARED_LIBRARY_PREFIX}parquet${CMAKE_SHARED_LIBRARY_SUFFIX}")
    set(PARQUET_SHARED_LIB "${ARROW_SHARED_LIBRARY_DIR}/${PARQUET_SHARED_LIB_FILENAME}")

    set(ARROW_BINARY_DIR "${CMAKE_CURRENT_BINARY_DIR}/arrow_ep-build")
    set(ARROW_CMAKE_ARGS "-DCMAKE_INSTALL_PREFIX=${ARROW_PREFIX}"
            "-DCMAKE_INSTALL_LIBDIR=lib" "-Dxsimd_SOURCE=BUNDLED"
            "-DARROW_BUILD_STATIC=OFF" "-DARROW_PARQUET=ON"
            "-DARROW_WITH_UTF8PROC=OFF" "-DARROW_WITH_RE2=OFF"
            "-DARROW_FILESYSTEM=ON" "-DARROW_CSV=ON" "-DARROW_PYTHON=ON")
    set(ARROW_INCLUDE_DIR "${ARROW_PREFIX}/include")

    set(ARROW_BUILD_BYPRODUCTS "${ARROW_SHARED_LIB}" "${PARQUET_SHARED_LIB}")

    include(ExternalProject)

    ExternalProject_Add(arrow_ep
            URL https://github.com/apache/arrow/archive/refs/tags/apache-arrow-9.0.0.tar.gz
            SOURCE_SUBDIR cpp
            BINARY_DIR "${ARROW_BINARY_DIR}"
            CMAKE_ARGS "${ARROW_CMAKE_ARGS}"
            BUILD_BYPRODUCTS "${ARROW_BUILD_BYPRODUCTS}")

    set(ARROW_LIBRARY_TARGET arrow_shared)
    set(PARQUET_LIBRARY_TARGET parquet_shared)

    file(MAKE_DIRECTORY "${ARROW_INCLUDE_DIR}")
    add_library(${ARROW_LIBRARY_TARGET} SHARED IMPORTED)
    add_library(${PARQUET_LIBRARY_TARGET} SHARED IMPORTED)
    set_target_properties(${ARROW_LIBRARY_TARGET}
            PROPERTIES INTERFACE_INCLUDE_DIRECTORIES ${ARROW_INCLUDE_DIR}
            IMPORTED_LOCATION ${ARROW_SHARED_LIB})
    set_target_properties(${PARQUET_LIBRARY_TARGET}
            PROPERTIES INTERFACE_INCLUDE_DIRECTORIES ${ARROW_INCLUDE_DIR}
            IMPORTED_LOCATION ${PARQUET_SHARED_LIB})

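    # Make both imported targets wait for the external build to finish.
    add_dependencies(${ARROW_LIBRARY_TARGET} arrow_ep)
    add_dependencies(${PARQUET_LIBRARY_TARGET} arrow_ep)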
endfunction()

Use it in your CMakeLists.txt file as follows:

...
set(CMAKE_MODULE_PATH ${CMAKE_CURRENT_SOURCE_DIR}/cmake)
include(arrow)
build_arrow()
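
To tie this back to the minimal example in the question, a complete top-level CMakeLists.txt might look roughly like the sketch below. It assumes the function above is saved as cmake/arrow.cmake next to the top-level CMakeLists.txt, and it reuses the project name and main.cpp from the question. Treat it as an illustration rather than a tested recipe.

```cmake
cmake_minimum_required(VERSION 3.24)
project(arrow_cmake)

set(CMAKE_CXX_STANDARD 23)

# Make cmake/arrow.cmake visible to include().
list(APPEND CMAKE_MODULE_PATH "${CMAKE_CURRENT_SOURCE_DIR}/cmake")
include(arrow)

# Defines the arrow_ep external project and the imported targets
# arrow_shared and parquet_shared.
build_arrow()

add_executable(${PROJECT_NAME} main.cpp)

# Linking against the imported target also propagates its
# INTERFACE_INCLUDE_DIRECTORIES, so <arrow/array.h> should resolve once
# arrow_ep has been built and installed into the prefix.
target_link_libraries(${PROJECT_NAME} PRIVATE arrow_shared)
```

The point of the imported target is that it carries both the library location and the include directory, so no separate include_directories call is needed.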
marital_weeping

Could you try forcing the Arrow build system to download and use the bundled xsimd? I've been able to reproduce the error and to build locally with the following:

ExternalProject_Add(Arrow
    URL "https://www.apache.org/dist/arrow/arrow-9.0.0/apache-arrow-9.0.0.tar.gz"
    SOURCE_SUBDIR cpp
    CMAKE_ARGS "-Dxsimd_SOURCE=BUNDLED"
)

I don't think we have documentation for this at the moment; the documentation assumes that find_package is used: https://arrow.apache.org/docs/dev/cpp/build_system.html. Maybe we could open a ticket to improve the documentation so that it covers using Arrow with CMake's ExternalProject or FetchContent.
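
For comparison, the find_package route that the documentation assumes looks roughly like this when Arrow is already installed on the system. The target name `arrow_shared` matches what is used elsewhere in this thread for 9.0.0; newer Arrow releases may expose namespaced target names instead, so check the linked page for the version in use. `my_example` and `my_example.cc` are placeholder names.

```cmake
cmake_minimum_required(VERSION 3.16)
project(my_example)

# Locates an installed Arrow through its CMake package configuration files.
find_package(Arrow REQUIRED)

add_executable(my_example my_example.cc)

# The target exported by the Arrow package supplies both the library and
# its include directories.
target_link_libraries(my_example PRIVATE arrow_shared)
```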

raulcumplido
  • Thanks @raulcumplido. Your suggestion did get rid of the incompatible `xsimd` error, but the header issue persists: e.g., `<arrow/array.h>` is still not found. – marital_weeping Oct 03 '22 at 14:58
  • Are you linking your target with arrow_shared? `add_executable(my_example my_example.cc)` and `target_link_libraries(my_example PRIVATE arrow_shared)` – raulcumplido Oct 03 '22 at 15:03
  • I tried `target_link_libraries(${PROJECT_NAME} arrow_shared)` but still no headers found. – marital_weeping Oct 03 '22 at 15:06
  • I am not entirely sure at this point. I've changed the find_package to `ExternalProject_Add` here https://github.com/apache/arrow-cookbook/blob/main/cpp/code/CMakeLists.txt#L27 and can't reproduce the issue. Could you provide a minimal example that reproduces the issue? – raulcumplido Oct 03 '22 at 15:37
  • Done. Please see my question. – marital_weeping Oct 03 '22 at 15:43