What I'm trying to do is very basic: connect to an Impala db using Python:

from impala.dbapi import connect

conn = connect(host='impala', port=21050, auth_mechanism='PLAIN')

I'm using the Impyla package to do so, and I got this error:

Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/thriftpy/transport/socket.py", line 96, in open
    self.sock.connect(addr)
socket.gaierror: [Errno -3] Temporary failure in name resolution

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/alaaeddine/PycharmProjects/test/data_test.py", line 3, in <module>
    conn = connect(host='impala', port=21050, auth_mechanism='PLAIN')
  File "/usr/local/lib/python3.6/dist-packages/impala/dbapi.py", line 147, in connect
    auth_mechanism=auth_mechanism)
  File "/usr/local/lib/python3.6/dist-packages/impala/hiveserver2.py", line 758, in connect
    transport.open()
  File "/usr/local/lib/python3.6/dist-packages/thrift_sasl/__init__.py", line 61, in open
    self._trans.open()
  File "/usr/local/lib/python3.6/dist-packages/thriftpy/transport/socket.py", line 104, in open
    message="Could not connect to %s" % str(addr))
thriftpy.transport.TTransportException: TTransportException(type=1, message="Could not connect to ('impala', 21050)")

I also tried the Ibis package, but it failed with the same thriftpy-related error.

On Windows, using DBeaver, I could connect to the database with the official Cloudera JDBC connector. My questions are:

  • Should I pass my JDBC connector as a parameter in my connect code? I searched but could not find anything pointing in this direction.
  • Should I try something other than the Ibis and Impyla packages? I have run into a lot of version-related and dependency issues when using them. If yes, what would you recommend as alternatives?

Thanks!

ds_enth

3 Answers

Solved: I used the pyhive package instead of Ibis/Impyla. Here's an example:

# import the hive module from pyhive
from pyhive import hive

# establish the connection to the db (replace the placeholder values with your own)
conn = hive.Connection(host='host_IP_addr', port='conn_port', auth='auth_type', database='my_db')

# prepare the cursor for the queries
cursor = conn.cursor()

# execute a query
cursor.execute("SHOW TABLES")

# iterate over the results and display them
for table in cursor.fetchall():
    print(table)
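
If column names are needed along with the rows, a small helper can pair them up. This is only a sketch: `rows_as_dicts` is a hypothetical helper name, and it assumes the cursor follows the Python DB-API 2.0 convention of exposing column metadata through `cursor.description` after `execute()`.

```python
def rows_as_dicts(cursor):
    """Pair each fetched row with the column names from cursor.description."""
    # DB-API 2.0: description is a sequence of tuples whose first item is the column name
    columns = [col[0] for col in cursor.description]
    # fetchall() returns the remaining rows of the last executed query
    return [dict(zip(columns, row)) for row in cursor.fetchall()]
```

It would be called right after `cursor.execute(...)`, for example `for row in rows_as_dicts(cursor): print(row)`.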
ds_enth

Your impala host name is probably not resolving. Can you run nslookup impala in a command prompt? If you're using Docker, the service name in docker-compose needs to be "impala", or you need the "extra_hosts" option. Alternatively, you can always add it to /etc/hosts (C:\Windows\System32\drivers\etc\hosts on Windows) as 127.0.0.1 impala

Also try 'NOSASL' instead of 'PLAIN'; that sometimes works better with security turned off.
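
The name-resolution check can also be done from Python before calling `connect`. A minimal sketch, using only the standard library; `can_resolve` is a hypothetical helper name, and it only tests whether the name resolves, not whether the port is reachable.

```python
import socket


def can_resolve(host, port):
    """Return True if `host` resolves to at least one address, else False."""
    try:
        # getaddrinfo performs the lookup that a socket connect would need
        socket.getaddrinfo(host, port)
        return True
    except socket.gaierror:
        # Same exception class as the first error in the question's traceback
        return False
```

With the values from the question, `can_resolve('impala', 21050)` returning False would confirm that the problem is DNS, /etc/hosts or the Docker service name rather than Impyla itself.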

Dexter

This is a simple method: connecting to Impala through impala-shell from Python.

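    # subprocess replaces the Python 2-only commands module
    import subprocess

    query1 = "select * from table_name limit 10"
    impalad = 'hostname'
    port = '21000'
    database = 'database_name'
    # -i: impalad host:port, -k: Kerberos, -B/--delimited: plain delimited output,
    # -d: database to use, -q: query to run
    cmd = ['impala-shell', '-i', impalad + ':' + port, '-k', '-B', '--delimited',
           '-d', database, '-q', query1]
    # capture stdout and stderr together, as commands.getstatusoutput did
    result = subprocess.run(cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT,
                            universal_newlines=True)
    if result.returncode == 0:
        print(result.stdout)
    else:
        print("Error encountered while executing the query.")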
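
The captured text can then be split into rows. The following is a sketch under assumptions: `parse_delimited` is a hypothetical helper, the output is assumed to contain only result rows (no status or error lines), and the delimiter is assumed to be a tab, which is the impala-shell default in delimited mode unless `--output_delimiter` is set.

```python
import csv
import io


def parse_delimited(output, delimiter="\t"):
    """Split delimited impala-shell output into a list of rows (lists of strings)."""
    # QUOTE_NONE: treat quote characters as ordinary data
    reader = csv.reader(io.StringIO(output), delimiter=delimiter,
                        quoting=csv.QUOTE_NONE)
    # Skip empty lines such as the one produced by a trailing newline
    return [row for row in reader if row]


sample = "1\talice\n2\tbob\n"  # made-up sample, not real query output
rows = parse_delimited(sample)
```

All values come back as strings, so numeric columns would need an explicit conversion.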
Trenton McKinney
Uday