Here is the code I am using to connect to HDFS and create a Dask dataframe.

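from dask.distributed import Client
import dask.dataframe as dd
Client(scheduler_host+":"+scheduler_port)  # connect to the running Dask scheduler
df=dd.read_csv("hdfs://hdfs_host/<path to csv on hdfs>")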

Error:

AttributeError: /usr/lib/libhdfs3.so: undefined symbol: hdfsConcat

Hadoop version: 2.5.1

Santosh Kumar
  • You do not have the latest version of libhdfs3 – mdurant Sep 22 '17 at 13:38
  • @mdurant: I have the latest version of libhdfs3 available for Ubuntu Trusty. – Santosh Kumar Sep 23 '17 at 06:33
  • Output of "sudo apt-get install libhdfs3 libhdfs3-dev": Reading package lists... Done. Building dependency tree. Reading state information... Done. libhdfs3 is already the newest version. libhdfs3-dev is already the newest version. 0 upgraded, 0 newly installed, 0 to remove and 109 not upgraded. – Santosh Kumar Sep 23 '17 at 06:34
  • I cannot say how recent libhdfs3 is in your APT repo (you could check the version); I would normally recommend installation with conda. – mdurant Sep 23 '17 at 13:35
  • I'll look into making version-agnostic code, but I am trying to iron out other build issues in libhdfs3 at the moment. – mdurant Sep 24 '17 at 20:42
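
One way to confirm the diagnosis in the comments is to check directly whether the installed shared library exports the symbol named in the error. The following is only a minimal sketch: has_symbol is a hypothetical helper (not part of hdfs3 or Dask), and the library path is the one from the error message above, which may differ on other systems.

```python
import ctypes


def has_symbol(lib_path, symbol):
    """Return True/False if the library exports `symbol`, or None if it cannot be loaded.

    Hypothetical helper for illustration; not part of hdfs3 or Dask.
    """
    try:
        lib = ctypes.CDLL(lib_path)
    except OSError:
        # The library is missing or could not be loaded at all.
        return None
    # ctypes raises AttributeError for a symbol the library does not export,
    # so hasattr() is enough to test for it.
    return hasattr(lib, symbol)


if __name__ == "__main__":
    # Path taken from the error message; adjust if the library lives elsewhere.
    print(has_symbol("/usr/lib/libhdfs3.so", "hdfsConcat"))
```

If this prints False, the installed libhdfs3 build does not provide hdfsConcat, which would be consistent with the suggestion that the APT package is older than what the Python wrapper expects.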
