We are trying out dask_yarn version 0.3.0 (with dask 0.18.2)
because of conflicts between the boost-cpp we are running and pyarrow
version 0.10.0.
We are trying to read a CSV file from HDFS. However, running dd.read_csv('hdfs:///path/to/file.csv')
fails because it tries to use hdfs3:
ImportError: Can not find the shared library: libhdfs3.so
From the documentation, it seems that there is an option to use pyarrow instead.
What is the correct syntax/configuration to do so?