I apologize if this is a noob question, but I couldn't find any relevant reference: what is the difference between these two?

If I'd like to read Parquet files from HDFS using pyarrow, which one should I use?

Jay

1 Answer

The HdfsClient API has been deprecated; use pyarrow.hdfs.connect to connect instead: http://arrow.apache.org/docs/python/filesystems.html#hadoop-file-system-hdfs
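
For the "read Parquet files from HDFS" part of the question, a minimal sketch might look like the following. It assumes a pyarrow version that still ships `pyarrow.hdfs.connect` (later releases deprecated it in favor of `pyarrow.fs.HadoopFileSystem`) and a working libhdfs/Hadoop client setup. The helper name `read_parquet_from_hdfs`, the host `namenode`, the port `8020`, and the file path are hypothetical placeholders, not values from the question.

```python
def read_parquet_from_hdfs(path, host="default", port=0, columns=None):
    """Read one Parquet file from HDFS and return it as a pyarrow.Table.

    Hypothetical helper for illustration only. With host="default" and
    port=0, the connection settings are taken from the Hadoop configuration.
    """
    # Imported inside the function so the helper can be defined even on a
    # machine where pyarrow is not installed.
    import pyarrow as pa
    import pyarrow.parquet as pq

    # Connect through the API named in the answer above.
    fs = pa.hdfs.connect(host, port)

    # Open the remote file in binary mode and let pyarrow.parquet decode it.
    with fs.open(path, "rb") as f:
        return pq.read_table(f, columns=columns)


# Example call (placeholder host, port, and path; needs a reachable cluster):
# table = read_parquet_from_hdfs("/data/events.parquet", host="namenode", port=8020)
# df = table.to_pandas()
```

Passing `columns` restricts the read to the named columns, which can reduce the amount of data pulled over the network.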

Wes McKinney
pyarrow.hdfs.connect also seems like a messy solution ([same issue as here](https://issues.apache.org/jira/browse/ARROW-2113)). Since you are everywhere on this subject, you look like the only one able to understand what's going on. – zar3bski May 24 '18 at 11:47