
I want to connect to the Hive service on an MIT Kerberos-authenticated Cloudera Hadoop server. I am using a Python script hosted on a Windows server with no Kerberos client installed. I am using a conda environment with Python 3.9.7 and PyHive 0.6.5. Because the Windows server does not have a Kerberos client, I copied the krb5.conf and keytab files from the Cloudera server to my Windows server, renamed krb5.conf to krb5.ini, and added their paths to environment variables:

from pyhive import hive
import os
# raw strings so backslashes in the Windows paths are not treated as escapes
os.environ['KRB5_CONFIG'] = r'PATH\TO\krb5.ini'
os.environ['KRB5_CLIENT_KTNAME'] = r'PATH\TO\hive.service.keytab'

conn = hive.Connection(host="some-ip-address", port=4202, auth='KERBEROS', kerberos_service_name='hive')

It failed to connect. Below is the error message:

(myenv) C:\Users\myname\Desktop>python hivetest.py
Traceback (most recent call last):
  File "C:\Users\myname\Desktop\hivetest.py", line 34, in <module>
    hiveconn=hive.Connection(host="some-ip-address",port=4202, auth='KERBEROS', kerberos_service_name='hive')
  File "C:\Users\myname\AppData\Local\conda\conda\envs\myenv\lib\site-packages\pyhive\hive.py", line 243, in __init__
    self._transport.open()
  File "C:\Users\myname\AppData\Local\conda\conda\envs\myenv\lib\site-packages\thrift_sasl\__init__.py", line 84, in open
    raise TTransportException(type=TTransportException.NOT_OPEN,
thrift.transport.TTransport.TTransportException: Could not start SASL: b'Error in sasl_client_start (-4) SASL(-4): no mechanism available: Unable to find a callback: 2'

When the Hadoop server was not Kerberos-authenticated, I was able to connect to the Hive service with this line:

conn = hive.Connection(host="ip-address", port=4202, username="some-user", auth="NONE")

I removed the lines that set the environment variables just to check whether that would produce a different error message, but the error was the same as shared above.

PyHive's Connection class takes many initialization parameters, one of which is a `configuration` parameter. I tried configurations like the ones below, but none of them worked; all failed with the same error message.

config1={
    'hive.metastore.client.principal':'MYNAME@HADOOP.COM',
    'hive.metastore.sasl.enabled': 'true',
    'hive.metastore.client.keytab': 'PATH\\TO\\keytab',
}
hiveconn=hive.Connection(host="some-ip",port=4202, auth='KERBEROS', kerberos_service_name='hive', configuration=config1)

config2={
    'hive.server2.authentication.kerberos.principal':'MYNAME@HADOOP.COM',
    'hive.server2.authentication.kerberos.keytab': 'PATH\\TO\\keytab',
}
hiveconn=hive.Connection(host="some-ip",port=4202, auth='KERBEROS', kerberos_service_name='hive', configuration=config2)

Am I doing something wrong with the prerequisites for making this connection, irrespective of the Python library? Is it mandatory to install a Kerberos client on a server before it can connect to another, Kerberos-authenticated Hadoop server?

1 Answer


I have not used the PyHive library, but I do have some experience with Kerberos. You mentioned that the Windows machine hosting the Python script doesn't have a Kerberos client; my understanding is that you need one. In general, most libraries that use Kerberos cannot acquire a ticket from the KDC themselves; they can only use session tickets that have already been acquired. Even when such a library does acquire a ticket, it goes through the platform's Kerberos client APIs to do so.

In this case, you have to install a Kerberos client on the Windows machine. Modify your krb5.ini file to point at the remote KDC. Make sure you can acquire a Kerberos ticket using the Kerberos client itself, not the Python script. Once you can acquire a ticket with the Kerberos client, you can proceed with the Python script. It should work.
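As a minimal sketch of that workflow (assuming MIT Kerberos for Windows is installed so `kinit` is on `PATH`; the principal and keytab path shown in the usage example are placeholders), the backend can run `kinit` against the keytab before opening the PyHive connection:

```python
import subprocess

def kinit_command(principal, keytab_path):
    # "-kt" tells kinit to authenticate with a keytab instead of
    # prompting for a password, which suits an unattended backend.
    return ["kinit", "-kt", keytab_path, principal]

def acquire_ticket(principal, keytab_path):
    # Raises CalledProcessError if the KDC is unreachable or the
    # keytab/principal pair is rejected.
    subprocess.run(kinit_command(principal, keytab_path), check=True)

# Usage (placeholder values):
#   acquire_ticket("MYNAME@HADOOP.COM", r"C:\keytabs\hive.service.keytab")
```

One caveat when verifying the ticket on Windows: MIT Kerberos for Windows ships its own `klist.exe`, which can be shadowed by the built-in Windows `klist`, so check which one resolves first on `PATH`.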

  • A few questions: 1. Is a ticket acquired from a script not valid, e.g. `import os; os.system("kinit")` in Python? I am working on a design where the end user just clicks a button and everything else is handled by the backend, so generating a ticket from a terminal would require manual intervention. 2. Can't I just copy the krb5.conf file from the Linux server to my Windows server and rename it to krb5.ini? – shad0w May 03 '23 at 04:26
  • 1. As I understand it, `os.system("kinit")` creates a subshell and executes the kinit command inside it. It does not matter how kinit is executed, whether from a new terminal session or from a Python script via os.system(), subprocess, or the commands module. If kinit succeeds, it stores the session ticket in a cache file; once you have a valid ticket in that cache file, your application only reads the cache file. It has nothing to do with how you obtained the ticket (my assumption). By default the ticket is stored under `C:\Users\\krb5cc_` on Windows. ...... – Uddhav Savani May 03 '23 at 06:52
  • ... If you have changed the default ccache file location, you have to export an environment variable named `KRB5CCNAME` [see this](https://web.mit.edu/kerberos/krb5-1.12/doc/admin/env_variables.html). In a Python script you can do so with the os.putenv() method. I have tried the same thing on Linux, where I obtain the ticket with a cron job, store it in one ccache file, and export the variable in the Python script. It works on Linux, and it should work on Windows as well. 2. Although I have not tried it, it should work, since the methodology for defining the configuration file is the same on Windows as on Linux. – Uddhav Savani May 03 '23 at 06:52
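To make the cache location explicit from Python, a small sketch (the cache path is a placeholder): setting the variable via `os.environ` is safer than `os.putenv()`, because `os.putenv()` does not update `os.environ` for the current process.

```python
import os

def point_at_ccache(ccache_path):
    # Export KRB5CCNAME so that kinit (which writes the ticket) and the
    # GSSAPI layer underneath PyHive (which reads it) agree on the same
    # credentials cache. Set this before the first Kerberos call.
    os.environ["KRB5CCNAME"] = ccache_path
    return os.environ["KRB5CCNAME"]

# Usage (placeholder path):
#   point_at_ccache(r"C:\krb5\mycache")
```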