hdfs.connect() vs HdfsClient in PyArrow

Question

I apologize if this is a noob question, but I couldn't find any relevant reference: what is the difference between hdfs.connect() and HdfsClient?

If I'd like to read Parquet files from HDFS using PyArrow, which one should I use?

score 13 · Accepted Answer · answered Nov 20 '17 at 21:42

The HdfsClient API has been deprecated; use pyarrow.hdfs.connect to connect instead: http://arrow.apache.org/docs/python/filesystems.html#hadoop-file-system-hdfs
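To make the Parquet part of the question concrete, here is a minimal sketch of reading one file through pyarrow.hdfs.connect. It assumes a PyArrow version that still ships the pyarrow.hdfs module (later releases moved HDFS access to pyarrow.fs.HadoopFileSystem, so this may not run on a current install) and a working libhdfs/Hadoop client setup. The function name read_parquet_from_hdfs and its connect parameter are hypothetical helpers introduced for illustration, not part of PyArrow.

```python
def read_parquet_from_hdfs(path, host="default", port=0, user=None, connect=None):
    """Read a single Parquet file from HDFS into a pyarrow.Table.

    `connect` is a hypothetical hook (not a PyArrow feature) that lets a
    caller swap in another factory, e.g. for testing without a cluster.
    """
    # Imported lazily so the module can be loaded without PyArrow installed.
    import pyarrow.parquet as pq

    if connect is None:
        import pyarrow as pa
        # Legacy API named in the answer; host="default" and port=0 defer
        # to the Hadoop configuration found in the environment.
        connect = pa.hdfs.connect

    fs = connect(host, port, user=user)

    # Open the remote file for binary reading and let Parquet parse it.
    with fs.open(path, "rb") as f:
        return pq.read_table(f)
```

A call might look like read_parquet_from_hdfs("/data/example.parquet", host="namenode", port=8020), where the path, host name, and port are placeholders; call .to_pandas() on the returned table if a DataFrame is wanted.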

Wes McKinney

pyarrow.hdfs.connect also seems like a messy solution ([same issue as here](https://issues.apache.org/jira/browse/ARROW-2113)). Since you are everywhere on this subject, you look like the only one able to understand what's going on – zar3bski May 24 '18 at 11:47
