
I'm using ibis-framework. I have the following code:

import ibis
import pandas as pd

hdfs_client = ibis.hdfs_connect(...)
impala_client = ibis.impala.connect(..., hdfs_client=hdfs_client)
db = impala_client.database('abc')
data = pd.DataFrame(...)
# Upload the DataFrame through HDFS and create a Parquet-backed table.
db.create_table('tb_name', obj=data, format='parquet', force=True)

This fails because the namenoderpcaddress constructed by requests (which ibis calls) uses port 8020, whereas the correct port for me is 8022 (the Cloudera-recommended port, possibly for HA purposes).

ConnectionError: HTTPConnectionPool(host='ip-0-0-0-0.ec2.internal', port=50075): Max retries exceeded with url: /webhdfs/v1/tmp/ibis/pandas_7ae170c27ee6426e97e0f84aa9a2a778/0.csv?op=CREATE&user.name=user&namenoderpcaddress=ip-0-0-0-0.ec2.internal:8020&overwrite=false&user.name=user (Caused by NewConnectionError('<requests.packages.urllib3.connection.HTTPConnection object at 0x7feb44be49d0>: Failed to establish a new connection: [Errno -2] Name or service not known',))

(I edited the IP addresses in the message above.)
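To see which ports are actually involved, the failing URL can be taken apart with the standard library. This is only a diagnostic sketch: the URL is rebuilt from the error message above, and the "http://" scheme is an assumption inferred from the HTTPConnectionPool wording.

```python
from urllib.parse import parse_qs, urlsplit

# URL rebuilt from the error message; the scheme is an assumption.
failing_url = (
    "http://ip-0-0-0-0.ec2.internal:50075"
    "/webhdfs/v1/tmp/ibis/pandas_7ae170c27ee6426e97e0f84aa9a2a778/0.csv"
    "?op=CREATE&user.name=user"
    "&namenoderpcaddress=ip-0-0-0-0.ec2.internal:8020"
    "&overwrite=false&user.name=user"
)

parts = urlsplit(failing_url)
query = parse_qs(parts.query)

# Host and port that the HTTP request itself was sent to.
http_host, http_port = parts.hostname, parts.port

# Host and port carried in the namenoderpcaddress query parameter.
rpc_host, _, rpc_port = query["namenoderpcaddress"][0].rpartition(":")

print("HTTP target:", http_host, http_port)
print("namenoderpcaddress:", rpc_host, rpc_port)
```

The request itself goes to port 50075, while 8020 appears only inside the namenoderpcaddress query parameter.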

Any idea how I can tell ibis or requests to use this port number?

Thanks.

thaavik
zpz

1 Answer


ibis.hdfs_connect takes a port argument, which you can use to pass 8022.
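A minimal sketch of what that might look like, assuming an older ibis release in which ibis.hdfs_connect and ibis.impala.connect accept host and port keyword arguments. The helper names and the host name used in the usage line are hypothetical placeholders, not values from the question, and whether 8022 is the right value for this argument depends on the cluster setup.

```python
def hdfs_connect_kwargs(host, port=8022):
    """Build keyword arguments for ibis.hdfs_connect with an explicit port.

    Hypothetical helper: it only assembles the arguments, so it can be
    checked without ibis or a cluster being available.
    """
    if not 0 < port < 65536:
        raise ValueError(f"invalid port: {port}")
    return {"host": host, "port": port}


def connect(host, port=8022):
    """Create the HDFS and Impala clients, passing the port explicitly."""
    # Imported lazily so the helper above works without ibis installed.
    import ibis

    hdfs_client = ibis.hdfs_connect(**hdfs_connect_kwargs(host, port))
    # Assumption: the Impala daemon is reachable on the same host with
    # default settings; adjust to match the original connect(...) call.
    return ibis.impala.connect(host=host, hdfs_client=hdfs_client)


# Usage (placeholder host name):
# impala_client = connect("namenode.example.internal", port=8022)
```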

thaavik