I am trying to connect to Hive using Python 2 (a miniconda2 installation). Below is the code I am trying:

connection = hive.connect(host='psvlxihpnn1', port= '10000', authMechanism='KERBEROS', user='***',password='****', configuration={'krb_host': 'psvlxihpnn1', 'krb_service': 'ITEDM'} )

The Kerberos host is installed on the same host machine and has the service name 'ITEDM'. Strangely, I am getting the error below:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/export/home/itedm/miniconda2/lib/python2.7/site-packages/pyhs2-0.6.0-py2.7.egg/pyhs2/__init__.py", line 7, in connect
  File "/export/home/itedm/miniconda2/lib/python2.7/site-packages/pyhs2-0.6.0-py2.7.egg/pyhs2/connections.py", line 46, in __init__
  File "/export/home/itedm/miniconda2/lib/python2.7/site-packages/pyhs2-0.6.0-py2.7.egg/pyhs2/cloudera/thrift_sasl.py", line 66, in open
thrift.transport.TTransport.TTransportException: Could not start SASL: Error in sasl_client_start (-1) SASL(-1): generic failure: GSSAPI Error: Unspecified GSS failure.  Minor code may provide more information (Server krbtgt/INFORMATICA.COM@ITEDM not found in Kerberos database)

I am not passing 'krbtgt' as the user, so I am not sure why this error occurs. Thanks for any help.

  • The syntax is `service/host@REALM` and "krbtgt" means "not actually an *authorisation* request to a *service* but an *authentication* request to get a KeRBeros Ticket-Granting-Ticket (*krb tgt*)" – Samson Scharfrichter Jun 23 '16 at 08:51
  • And in the case of a TGT, the "service principal" should read as `krbtgt/REALM@REALM` (just create a ticket in the default cache with `kinit someuser@SOME.REALM` and check the result with `klist`) so you clearly have a problem. – Samson Scharfrichter Jun 23 '16 at 08:57
  • On the other hand, a TGT for your user is required, but the Hive JDBC URL should explicitly mention the Kerberos principal of the **Hive service** i.e. typically `hive/hs2.host.fqdn@REALM` so you should read carefully the documentation about this ugly Python library (if any). – Samson Scharfrichter Jun 23 '16 at 09:01
  • I am going through this link for the pyhs2 library: https://github.com/BradRuderman/pyhs2/blob/master/pyhs2/connections.py. It seems I have a problem with the Kerberos principal, which I am checking with the admin. Is there any better Python library that can connect to a Kerberos-enabled Hive? – Vivek Singh Jun 24 '16 at 10:06
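
To make the point in the comments about principal syntax concrete, here is a minimal sketch that splits a principal of the form `service/host@REALM` into its parts and checks whether it looks like a TGT for the user's own realm (`krbtgt/REALM@REALM`). The helper names `parse_principal` and `is_local_tgt` are hypothetical and are not part of pyhs2 or of any Kerberos library. A cross-realm TGT can legitimately name two different realms, so a `False` result is only a hint that the realm configuration deserves a second look.

```python
def parse_principal(principal):
    """Split 'primary/instance@REALM' into (primary, instance, realm).

    Hypothetical helper for illustration; missing parts come back as ''.
    """
    if "@" in principal:
        name, realm = principal.rsplit("@", 1)
    else:
        name, realm = principal, ""
    primary, _, instance = name.partition("/")
    return primary, instance, realm


def is_local_tgt(principal):
    """True if the principal has the form krbtgt/REALM@REALM."""
    primary, instance, realm = parse_principal(principal)
    return primary == "krbtgt" and realm != "" and instance == realm


if __name__ == "__main__":
    # Principal taken from the error message in the question.
    from_error = "krbtgt/INFORMATICA.COM@ITEDM"
    print(parse_principal(from_error))
    print(is_local_tgt(from_error))
```

For the principal in the traceback, the instance is `INFORMATICA.COM` while the realm is `ITEDM`, which suggests that 'ITEDM' is being treated as a realm name somewhere in the client setup.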

1 Answer

This connection call will work as long as the user running the script has a valid Kerberos ticket:

import pyhs2

with pyhs2.connect(host='beeline_host',
                   port=10000,
                   authMechanism="KERBEROS") as conn:
    with conn.cursor() as cur:
        print cur.getDatabases()

With Kerberos authentication, the username, password and other configuration parameters do not need to be passed in the call; authentication goes through the KDC using the existing ticket.
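
Since the call relies on an existing ticket, it can help to check for one before connecting. Below is a minimal sketch, assuming the MIT Kerberos `klist` tool is on the PATH; its `-s` flag runs silently and exits with status 0 only when the credentials cache is readable and not expired. The function name `has_valid_ticket` and its `klist_cmd` parameter are hypothetical and are not part of pyhs2.

```python
import subprocess


def has_valid_ticket(klist_cmd=("klist", "-s")):
    """Return True if the given command exits with status 0.

    Assumption: with MIT Kerberos, `klist -s` exits 0 only when the
    credentials cache can be read and holds unexpired tickets.
    """
    try:
        return subprocess.call(list(klist_cmd)) == 0
    except OSError:
        # The command is not installed or not on the PATH.
        return False


if __name__ == "__main__":
    if has_valid_ticket():
        print("Valid Kerberos ticket found; the connect call can proceed.")
    else:
        print("No valid ticket found; run kinit first.")
```

If the check fails, obtaining a ticket with `kinit` and confirming it with `klist`, as suggested in the comments on the question, is the usual next step.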

pele88