I want to connect to the Hive service on an MIT Kerberos-authenticated Cloudera Hadoop server. My Python script is hosted on a Windows server with no Kerberos client installed, and runs in a conda environment with Python 3.9.7 and PyHive 0.6.5. Because the Windows server has no Kerberos client, I copied the krb5.conf and keytab files from the Cloudera server to the Windows server, renamed krb5.conf to krb5.ini, and added their paths to environment variables:
from pyhive import hive
import os

# Point the Kerberos libraries at the copied config and keytab
# (raw strings so backslashes in the Windows paths are not treated as escapes)
os.environ['KRB5_CONFIG'] = r'PATH\TO\krb5.ini'
os.environ['KRB5_CLIENT_KTNAME'] = r'PATH\TO\hive.service.keytab'

conn = hive.Connection(host="some-ip-address", port=4202, auth='KERBEROS', kerberos_service_name='hive')
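To rule out a simple path problem, a quick check along these lines can confirm that both variables are set and the copied files actually exist (paths are placeholders, as above):

import os

# Verify the copied Kerberos files are where the variables claim
for var in ('KRB5_CONFIG', 'KRB5_CLIENT_KTNAME'):
    path = os.environ.get(var)
    print(var, '->', path, '| exists:', bool(path) and os.path.exists(path))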
The connection attempt failed. Here is the error message:
(myenv) C:\Users\myname\Desktop>python hivetest.py
Traceback (most recent call last):
  File "C:\Users\myname\Desktop\hivetest.py", line 34, in <module>
    hiveconn=hive.Connection(host="some-ip-address",port=4202, auth='KERBEROS', kerberos_service_name='hive')
  File "C:\Users\myname\AppData\Local\conda\conda\envs\myenv\lib\site-packages\pyhive\hive.py", line 243, in __init__
    self._transport.open()
  File "C:\Users\myname\AppData\Local\conda\conda\envs\myenv\lib\site-packages\thrift_sasl\__init__.py", line 84, in open
    raise TTransportException(type=TTransportException.NOT_OPEN,
thrift.transport.TTransport.TTransportException: Could not start SASL: b'Error in sasl_client_start (-4) SASL(-4): no mechanism available: Unable to find a callback: 2'
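The error comes from the SASL layer, not from Hive itself. As far as I can tell from PyHive's source, with auth='KERBEROS' it builds a client from the sasl package and asks it to start the GSSAPI mechanism, so a standalone check like this (host and service values are placeholders) should show whether GSSAPI can be negotiated at all on this machine:

import sasl

# Build (approximately) the same SASL client that pyhive/thrift_sasl builds;
# if the GSSAPI plugin is missing, start() fails just like the traceback above.
client = sasl.Client()
client.setAttr('host', 'some-ip-address')   # placeholder
client.setAttr('service', 'hive')
client.init()
ok, chosen_mech, initial_response = client.start('GSSAPI')
print(ok, chosen_mech, client.getError())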
When the Hadoop server was not Kerberos-authenticated, I was able to connect to the Hive service with this line:
conn = hive.Connection(host="ip-address", port=4202, username="some-user", auth="NONE")
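For context, a trivial smoke test over that connection looks like this (standard DB-API calls, which PyHive supports):

cursor = conn.cursor()
cursor.execute('SHOW DATABASES')  # any cheap statement works as a smoke test
print(cursor.fetchall())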
I also removed the lines that set the environment variables, just to check whether that would produce a different error, but the error was the same as above.
PyHive's Connection class takes many parameters at initialization, one of which is configuration. I tried configurations like the two below, but neither worked; both failed with the same error message. (Judging from the traceback, the failure happens inside self._transport.open(), i.e. during the SASL handshake, presumably before any session configuration would even be sent to the server.)
config1 = {
    'hive.metastore.client.principal': 'MYNAME@HADOOP.COM',
    'hive.metastore.sasl.enabled': 'true',
    'hive.metastore.client.keytab': 'PATH\\TO\\keytab',
}
hiveconn = hive.Connection(host="some-ip", port=4202, auth='KERBEROS', kerberos_service_name='hive', configuration=config1)
config2 = {
    'hive.server2.authentication.kerberos.principal': 'MYNAME@HADOOP.COM',
    'hive.server2.authentication.kerberos.keytab': 'PATH\\TO\\keytab',
}
hiveconn = hive.Connection(host="some-ip", port=4202, auth='KERBEROS', kerberos_service_name='hive', configuration=config2)
Am I doing something wrong with the prerequisites for this connection, irrespective of the Python library? Is it mandatory to install a Kerberos client on a machine before it can connect to a Kerberos-authenticated Hadoop server?