I wrote a function that uses pyhive to read from Hive. It works fine when I run it locally. However, when I run it in an AWS Lambda function I get the error: "Could not start SASL: b'Error in sasl_client_start (-4) SASL(-4): no mechanism available: No worthy mechs found'"

I tried following the guidelines in this issue: https://github.com/cloudera/impyla/issues/201

However, I wasn't able to run the last command, yum install cyrus-sasl-lib cyrus-sasl-gssapi cyrus-sasl-md5, because the system I build on is Ubuntu, which doesn't provide yum. I tried installing these packages with apt-get instead: sasl2-bin libsasl2-2 libsasl2-dev libsasl2-modules libsasl2-modules-gssapi-mit

as described in: python cannot connect hiveserver2. But still no luck. Any ideas?

Thanks, Nir.

Nir99

1 Answer

You can follow this GitHub issue. I am able to connect to HiveServer2 with LDAP authentication using the pyhive library in AWS Lambda with Python 2.7. Here is what I did to make it work:

  1. Launch an EC2 instance or a container using the AMI that Lambda uses.
  2. Run the following commands to install the required dependencies:

    yum upgrade
    yum install gcc
    yum install gcc-c++
    # Include the cyrus-sasl plugin package for the authentication mechanism you use to connect to Hive
    sudo yum install cyrus-sasl cyrus-sasl-devel cyrus-sasl-ldap
    pip install six==1.12.0
    
  3. Bundle /usr/lib64/sasl2/ into the Lambda deployment package and set os.environ['SASL_PATH'] = os.path.join(os.getcwd(), 'path/to/sasl2'). Verify that the .so files are present in the directory that os.environ['SASL_PATH'] points to.

  4. My Lambda code looks like:

    from pyhive import hive
    import logging
    import os

    # Point Cyrus SASL at the plugin directory bundled with the function.
    os.environ['SASL_PATH'] = os.path.join(os.getcwd(), 'lib/sasl2')

    log = logging.getLogger()
    log.setLevel(logging.INFO)
    log.info('Path: %s', os.environ['SASL_PATH'])

    def lambda_handler(event, context):
        # Host, user name and password below are placeholders.
        cursor = hive.connect(host='hiveServer2Ip', port=10000, username='userName',
                              auth='LDAP', password='password').cursor()
        SHOW_TABLE_QUERY = "show tables"
        cursor.execute(SHOW_TABLE_QUERY)
        tables = cursor.fetchall()
        log.info('tables: %s', tables)
        log.info('done')
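
To make the check from step 3 explicit, something like the following could run at the top of the handler module, before hive.connect is called. It is only a sketch: find_sasl_plugins and check_sasl_path are hypothetical helper names, not part of pyhive or sasl, and it assumes the plugins are shared objects matching *.so* in the directory that SASL_PATH points to.

```python
import glob
import os


def find_sasl_plugins(sasl_dir):
    """Return the sorted file names of shared objects (*.so*) in sasl_dir."""
    pattern = os.path.join(sasl_dir, "*.so*")
    return sorted(os.path.basename(p) for p in glob.glob(pattern))


def check_sasl_path():
    """Fail early if SASL_PATH is unset or holds no plugin libraries."""
    sasl_dir = os.environ.get("SASL_PATH")
    if not sasl_dir or not os.path.isdir(sasl_dir):
        raise RuntimeError("SASL_PATH is unset or not a directory: %r" % (sasl_dir,))
    plugins = find_sasl_plugins(sasl_dir)
    if not plugins:
        raise RuntimeError("No .so files found in %s" % sasl_dir)
    return plugins
```

Logging the returned list (for example log.info('SASL plugins: %s', check_sasl_path())) shows which plugin files actually made it into the deployment package, which is usually the first thing to rule out when the "No worthy mechs found" error appears.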
    
David Buck