
Apologies if this is poorly framed; I don't know enough to ask it properly. I get a Segmentation fault: 11 error whenever I list an HDFS directory that contains more than one file, using PyArrow with the libhdfs3 driver in Python 3:

Python 3.7.1 | packaged by conda-forge | (default, Nov 13 2018, 10:30:07) 
[Clang 4.0.1 (tags/RELEASE_401/final)] :: Anaconda, Inc. on darwin

Here is the code I'm running:

import pyarrow as pa
fs = pa.hdfs.connect('localhost', 8020, driver='libhdfs3')

This connects to HDFS fine, so I then run:

>>> fs.ls("/user/dan/", detail=False)
['/user/dan/testing'] # this directory has 2 files in it

>>> fs.ls("/user/dan/testing", detail=False)
Segmentation fault: 11

Interestingly, if I delete one of the files ...

>>> fs.ls("/user/dan/testing", detail=False)
['/user/dan/testing/C5116966@05.json']

... it works and does not segfault.

Since I don't know which part of my environment is causing this (Python? PyArrow? libhdfs3?), I'm not sure what to search for when troubleshooting.
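One way to narrow this down might be Python's standard-library `faulthandler` module, which prints the Python-level traceback when the process receives SIGSEGV. This is only a sketch: `list_dir` is a hypothetical helper, and `fs` is assumed to be the object returned by `pa.hdfs.connect(...)` above. It will not fix the crash; it only shows which Python call was active when the native code faulted.

```python
import faulthandler
import sys

# Ask the interpreter to dump the Python traceback of every thread on a
# fatal signal such as SIGSEGV. sys.__stderr__ is the original stderr
# stream, which has a real file descriptor to write to.
faulthandler.enable(file=sys.__stderr__, all_threads=True)


def list_dir(fs, path):
    """Hypothetical helper: list `path` on any object exposing ls(path, detail=...)."""
    return fs.ls(path, detail=False)


# Assumed usage with the connection from the question:
#   import pyarrow as pa
#   fs = pa.hdfs.connect('localhost', 8020, driver='libhdfs3')
#   list_dir(fs, "/user/dan/testing")
```

The same handler can be switched on without editing any code by starting the interpreter with `python -X faulthandler`.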

Any thoughts or recommendations are greatly appreciated!

Dan
  • It would help to know how you installed those packages and which versions you have. Are they from conda-forge, Anaconda's default channel, or pip wheels? – cel Apr 22 '19 at 09:43
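The version details asked for in the comment could be gathered with a short script along these lines. This is a minimal sketch: `collect_versions` is a hypothetical helper name, and it reports only what Python itself can see (interpreter, platform, and the installed `pyarrow` version, if any).

```python
import platform
import sys


def collect_versions():
    """Return a dict describing the interpreter, platform, and pyarrow version."""
    info = {
        "python": sys.version.split()[0],
        "platform": platform.platform(),
    }
    try:
        import pyarrow  # imported lazily so the script also runs without it
        info["pyarrow"] = pyarrow.__version__
    except ImportError:
        info["pyarrow"] = "not installed"
    return info


if __name__ == "__main__":
    for name, value in collect_versions().items():
        print(f"{name}: {value}")
```

The script cannot tell which channel a package came from. In a conda environment, `conda list pyarrow` and `conda list libhdfs3` show the version, build, and channel of each package.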
