I need to read Parquet files stored on HDFS from my R program. The Hadoop cluster is protected by Kerberos. I came across a couple of packages, but neither fully meets my needs:
- rhadoop: It looks like an old project that is no longer being developed. Its rhdfs package supports neither Parquet files nor Kerberos.
- arrow: It seems to be able to read Parquet files, but as far as I can tell it has no connectivity to HDFS.
Is there any other library that lets me read Parquet files from HDFS in R?
I'm aware of sparklyr, but I believe it requires Spark to be installed on the machine that runs the Spark driver. Is that correct? My R client is a different machine.