I am writing Python code that uses pyarrow to connect to a Hadoop server with fs.HadoopFileSystem(host=host_value, port=port_value), but every time I get the following error:

    self.parquet_writer = HDFSWriter(host_value='hdfs://10.110.8.239',port_value=9000)
    File "/app/aerial_server.py", line 54, in __init__
        self.hdfs_client = fs.HadoopFileSystem(host=host_value, port=port_value)
    File "pyarrow/_hdfs.pyx", line 89, in pyarrow._hdfs.HadoopFileSystem.__init__
    File "pyarrow/error.pxi", line 143, in pyarrow.lib.pyarrow_internal_check_status
    File "pyarrow/error.pxi", line 114, in pyarrow.lib.check_status
    OSError: HDFS connection failed

Environment variables:

    PYTHON_VERSION=3.7.13
    HADOOP_OPTS=-Djava.library.path=/app/hadoop-3.3.2/lib/nativ
    JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
    HADOOP_INSTALL=/app/hadoop-3.3.2
    ARROW_LIBHDFS_DIR=/app/hadoop-3.3.2/lib/native
    HADOOP_MAPRED_HOME=/app/hadoop-3.3.2
    HADOOP_COMMON_HOME=/app/hadoop-3.3.2
    HADOOP_HOME=/app/hadoop-3.3.2
    HADOOP_HDFS_HOME=/app/hadoop-3.3.2
    PYTHON_PIP_VERSION=22.0.4
    CLASSPATH=/app/hadoop-3.3.2/bin/hdfs classpath --glob
    HADOOP_COMMON_LIB_NATIVE_DIR=/app/hadoop-3.3.2/lib/native
    PATH=/usr/local/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/app/hadoop-3.3.2/sbin:/app/hadoop-3.3.2/bin
    _=/usr/bin/env
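
One thing the listing above suggests checking: CLASSPATH appears to contain the literal text of the command (`/app/hadoop-3.3.2/bin/hdfs classpath --glob`) rather than its output, and the pyarrow documentation asks for CLASSPATH to hold the expanded list of Hadoop jars. The sketch below is one possible way to test that theory, not a confirmed fix. The helper names `hadoop_classpath`, `looks_like_unexpanded_command` and `connect` are hypothetical, and it assumes that setting CLASSPATH from Python takes effect as long as no JVM has been started in the process yet.

```python
import os
import subprocess


def hadoop_classpath(hadoop_home, runner=subprocess.check_output):
    """Return the expanded classpath printed by `hdfs classpath --glob`."""
    hdfs_bin = os.path.join(hadoop_home, "bin", "hdfs")
    output = runner([hdfs_bin, "classpath", "--glob"])
    if isinstance(output, bytes):
        output = output.decode()
    return output.strip()


def looks_like_unexpanded_command(classpath):
    """Heuristic: a real classpath does not contain the command's own arguments."""
    return "classpath --glob" in classpath


def connect(host, port, hadoop_home):
    """Repair CLASSPATH if it is missing or unexpanded, then open the connection."""
    current = os.environ.get("CLASSPATH", "")
    if not current or looks_like_unexpanded_command(current):
        os.environ["CLASSPATH"] = hadoop_classpath(hadoop_home)
    # Imported here so the helpers above can be used without pyarrow installed.
    from pyarrow import fs
    return fs.HadoopFileSystem(host=host, port=port)
```

With the values from the question, the call would be `connect('hdfs://10.110.8.239', 9000, '/app/hadoop-3.3.2')`. If the connection still fails after CLASSPATH is expanded, the cause is likely elsewhere (for example the host or port value, or ARROW_LIBHDFS_DIR).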
OneCricketeer
Aman Jain
  • I suggest you refrain from using IP addresses with HDFS. It prefers server names and often acts weird when you provide an IP address instead. – Matt Andruff Apr 21 '22 at 12:21
  • @MattAndruff It'll only act weird in an HA environment. I don't think pyarrow can use nameservices, and DNS names vs. IP addresses doesn't matter from a client perspective. – OneCricketeer Apr 21 '22 at 13:39
  • @OneCricketeer - It's a best practice to use DNS (IPs can change). Also, if you plan on using Kerberos, it's required. For this issue it may not be the solution, but it would be something to rule out as a possible cause. – Matt Andruff Apr 21 '22 at 13:52

0 Answers