6

I'm on a Windows 8 machine, where I use Python (Anaconda distribution) to connect to Impala in our Hadoop cluster using the Impyla package. Our Hadoop cluster is secured via Kerberos. I followed the API reference to configure the connection.

    from impala.dbapi import connect
    conn = connect( host='localhost', port=21050, auth_mechanism='GSSAPI',
               kerberos_service_name='impala')

We are using Kerberos (GSSAPI) with SASL, hence:

    auth_mechanism='GSSAPI'

I managed to install the python-sasl library on Windows 8, but I still get this error:

    Could not start SASL: Error in sasl_client_start (-4) SASL(-4): no mechanism available: No worthy mechs found (code THRIFTTRANSPORT): TTransportException('Could not start SASL: Error in sasl_client_start (-4) SASL(-4): no mechanism available: No worthy mechs found',)

I wonder if I am still missing some dependencies.

richban
  • If that `kerberos_service_name` actually means a Kerberos service principal, it should be something like "impala/_HOST@INSERT.YOUR.REALM.HERE", with the same realm that is referenced in your **krb5.conf** file and "_HOST" acting as a wildcard for the actual host that you are connecting to. – Samson Scharfrichter Jan 28 '16 at 17:43
  • And I strongly doubt that Impala is running on your PC, hence "localhost" is a joke. – Samson Scharfrichter Jan 28 '16 at 17:44
  • If you run into a similar error from puresasl, you should [install the kerberos Python package](https://github.com/thobbs/pure-sasl/issues/20). – Brecht Machiels Jun 07 '17 at 15:57
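
To make the comments above concrete: a Kerberos service principal has the form `service/host@REALM`. As far as I understand, impyla hands `kerberos_service_name` and `host` to SASL separately, so the host passed to `connect()` ends up in the principal the client requests a ticket for. A minimal sketch of that composition follows; the helper name `service_principal`, the host name, and the realm are made up for illustration.

```python
def service_principal(service, host, realm):
    """Compose a Kerberos service principal of the form service/host@REALM."""
    return "{}/{}@{}".format(service, host, realm)


# Hypothetical values: substitute your Impala daemon's FQDN and your realm.
print(service_principal("impala", "impalad01.example.com", "EXAMPLE.COM"))

# With host='localhost' the principal would be impala/localhost@...,
# which is unlikely to be registered in the KDC.
print(service_principal("impala", "localhost", "EXAMPLE.COM"))
```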

7 Answers

3

Install the `kerberos` Python package; that should fix your issue.
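
Before installing anything, it may help to see which of the relevant modules the interpreter can already find. The sketch below assumes Python 3 and assumes these import names: `kerberos` (the package suggested above), `sasl` and `thrift_sasl` (the SASL bindings impyla can use), and `puresasl` (the pure-Python alternative). The helper `missing_modules` is hypothetical.

```python
import importlib.util


def missing_modules(names):
    """Return the names from `names` that cannot be found by this interpreter."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Assumed import names; adjust the list to whatever your setup relies on.
candidates = ["kerberos", "sasl", "thrift_sasl", "puresasl"]
print("Not importable:", missing_modules(candidates))
```

An empty list only shows that the modules can be located, not that the underlying SASL GSSAPI mechanism is available.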

Lutz Prechelt
3

I ran into the same issue, but I fixed it by installing the right versions of the required libraries.

Install the following Python libraries using pip:

    six==1.12.0
    bit_array==0.1.0
    thrift==0.9.3
    thrift_sasl==0.2.1
    sasl==0.2.1
    impyla==0.13.8

The code below works with Python 2.7 and 3.4.

    import os
    from impala.dbapi import connect

    # Obtain a Kerberos ticket first (kinit prompts for the password).
    os.system("kinit")

    conn = connect(host='hostname.io', port=21050, use_ssl=True,
                   database='default', user='urusername',
                   kerberos_service_name='impala', auth_mechanism='GSSAPI')
    cur = conn.cursor()
    cur.execute('SHOW DATABASES;')
    result = cur.fetchall()
    for data in result:
        print(data)
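
To see how an environment differs from the pinned versions listed in this answer, something like the following sketch could be used. It assumes Python 3.8 or newer (for `importlib.metadata`); the helpers `parse_pins` and `check_pins` are hypothetical names, and the pin list is copied from above as-is.

```python
from importlib.metadata import PackageNotFoundError, version


def parse_pins(text):
    """Turn lines such as 'thrift==0.9.3' into a {name: version} dict."""
    pins = {}
    for line in text.splitlines():
        line = line.strip()
        if line and "==" in line:
            name, _, wanted = line.partition("==")
            pins[name.strip()] = wanted.strip()
    return pins


def check_pins(pins):
    """Return {name: (wanted, installed)} for every pin that does not match."""
    mismatches = {}
    for name, wanted in pins.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None  # the package is not installed at all
        if installed != wanted:
            mismatches[name] = (wanted, installed)
    return mismatches


PINS = """
six==1.12.0
bit_array==0.1.0
thrift==0.9.3
thrift_sasl==0.2.1
sasl==0.2.1
impyla==0.13.8
"""

for name, (wanted, installed) in check_pins(parse_pins(PINS)).items():
    print("{}: wanted {}, found {}".format(name, wanted, installed))
```

This only reports differences; whether these exact versions work on newer Python releases is a separate question that the answer does not settle.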
  • Thanks, after a lot of troubleshooting this finally resolved my issue. – Amit Sep 20 '19 at 20:26
  • @SumitKumar How did you find the right versions of the libraries? We have the same issue, just with Python 3.9, but can't seem to find the correct versions to make it work. Thanks. – Pelle Ravn Nov 08 '21 at 07:52
2

For me, installing this package fixed it: libsasl2-modules-gssapi-mit

Nicolas Gervais
ephes
2

Try this to list the tables on a Kerberized cluster (CDH-5.14.2-1 in my case).

Make sure you have a valid ticket before running this code.
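
One way to check for a valid ticket from Python is to call `klist -s`, which with MIT Kerberos prints nothing and reports validity through its exit status. This is a sketch under that assumption; other `klist` implementations (for example the one built into Windows) may not accept `-s`, and `has_valid_ticket` is a hypothetical helper name.

```python
import subprocess


def has_valid_ticket():
    """Return True if `klist -s` exits with status 0 (valid credentials)."""
    try:
        # -s: silent mode; exit status 0 only if a valid credential cache exists.
        result = subprocess.run(
            ["klist", "-s"],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    except OSError:
        return False  # klist could not be executed (e.g. not on PATH)
    return result.returncode == 0


if not has_valid_ticket():
    print("No valid Kerberos ticket found; run kinit first.")
```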

I used Python 2.7 with the following packages:

    thrift==0.9.3
    thriftpy==0.3.8
    thrift_sasl==0.3.0
    impyla==0.14.2.2

Working code:

    from impala.dbapi import connect
    from impala.util import as_pandas

    # 21050 is the Impala daemon's HiveServer2 port, which impyla uses
    # (21000 is the Beeswax port used by impala-shell).
    conn = connect(host='yourHost', port=21050, auth_mechanism='GSSAPI')

    cursor = conn.cursor()
    cursor.execute("SHOW TABLES")
    # After .execute(), Impala keeps the result set on the server until it is
    # fetched. .fetchall() pulls the entire result set over the network, so
    # only use it when you know the result is small.
    tables = cursor.fetchall()

    print("Displaying list of tables")
    # The result is a list of tuples.
    for t in tables:
        # Each row of the SHOW TABLES result contains a single table name.
        print(t[0])
        # exit()  # uncomment to stop after the first table

    print("eol >>>")
s_mj
1

For me, the following connection parameters worked. I did not have to install any additional Python packages.

    from impala.dbapi import connect

    conn = connect(host="your_host", port=21050, auth_mechanism='GSSAPI',
                   timeout=100000, use_ssl=False, ca_cert=None,
                   ldap_user=None, ldap_password=None,
                   kerberos_service_name='impala')
lalith kkvn
1

To connect to Impala from Python, you can follow these steps:

  1. Install the Cloudera ODBC Driver for Impala.
  2. Create a DSN using the 64-bit ODBC driver and enter your server details.
  3. Use the code snippet below to connect:

      import pandas as pd
      import pyodbc

      with pyodbc.connect("DSN=impala_con", autocommit=True) as conn:
          # Put your SQL query in the empty string.
          df = pd.read_sql("", conn)

Ajay Kharade
0

Python cannot connect to HiveServer2:

Make sure you install `cyrus-sasl-devel` and `cyrus-sasl-gssapi`.

kirk wu