Questions tagged [impyla]

Impyla is a Python client for HiveServer2 implementations (e.g., Impala, Hive) for distributed query engines.

Features:

  • HiveServer2 compliant; works with Impala and Hive, including nested data

  • Fully DB API 2.0 (PEP 249)-compliant Python client (similar to sqlite or MySQL clients) supporting Python 2.6+ and Python 3.3+.

  • Works with Kerberos, LDAP, SSL

  • SQLAlchemy connector

  • Converter to pandas DataFrame, allowing easy integration into the Python data stack (including scikit-learn and matplotlib); but see the Ibis project for a richer experience
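Because the client follows DB API 2.0, the usual connect / cursor / execute / fetch pattern applies. The sketch below is only an illustration of that pattern: it uses the standard-library sqlite3 module as a stand-in so that it runs without a cluster, and the helper name fetch_rows and the table t are hypothetical. With impyla, the connection would presumably come from impala.dbapi.connect instead (host and port depend on your deployment), while the cursor calls stay the same.

```python
import sqlite3


def fetch_rows(conn, query):
    """Run a query through the PEP 249 cursor interface.

    Returns (column_names, rows). Works with any DB API 2.0 connection;
    fetch_rows is a hypothetical helper, not part of impyla.
    """
    cur = conn.cursor()
    try:
        cur.execute(query)
        # cursor.description holds one 7-item sequence per column; item 0 is the name
        columns = [d[0] for d in cur.description]
        rows = cur.fetchall()
    finally:
        cur.close()
    return columns, rows


# Stand-in connection (assumption: sqlite3 replaces a real Impala/Hive server here).
# With impyla this would be something like:
#   from impala.dbapi import connect
#   conn = connect(host="my.host.example", port=21050)  # hypothetical host
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO t VALUES (?, ?)", [(1, "a"), (2, "b")])

columns, rows = fetch_rows(conn, "SELECT id, name FROM t ORDER BY id")
print(columns)
print(rows)
conn.close()
```

With a real impyla cursor, the fetched result can also be handed to the as_pandas converter mentioned in the feature list rather than collected with fetchall.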

52 questions
1
vote
1 answer

Not calculating sum for all columns in pandas dataframe

I'm pulling data from Impala using impyla and converting it to a DataFrame using as_pandas. I'm using pandas 0.18.0 and Python 2.7.9. I'm trying to calculate the sum of all columns in a DataFrame and trying to select the columns which are greater…
Manoj Kumar
0
votes
0 answers

Use python to try to connect to impala and report an error

I am trying to use Python to connect to Impala and run a query. My code is as follows: from impala.dbapi import connect import pandas as pd conn = connect(host="xxx.xxx.xxx.xxx", port=10000, auth_mechanism="PLAIN", user="admin", …
0
votes
0 answers

Problems connecting to impala using sqlalchemy

I have been trying to connect to Impala using SQLAlchemy and seem to be having lots of problems. This is my code: engine = sqlalchemy.create_engine("impala://",creator= connect(host = "....", port=21050, database="default",…
Harsh
0
votes
1 answer

impala.error.HiveServer2Error: Failed after retrying 3 times

I use impyla and ibis to connect to a Hive server, but I got this error. I tried the following code: from impala.dbapi import connect impcur = connect(host="kudu3", port=10000, database="yingda_test", password=None, user='admin',…
jxfruit
0
votes
1 answer

impyla queries - able to generate cursor but no results and no error message

I'm all alone on my team with basically no technical support, and I'm the first person to do this, so I have nobody to turn to. I'm able to use the connect statement. I think I have this right since I have no errors here. If I change anything in my connect…
runningbirds
0
votes
1 answer

impyla : how to setup mem_limit?

I'm using impyla==0.16.2 on Python 3.8.3. I tried to execute set mem_limit=1G, but the query I run afterwards still fails with the mem_limit error. I expected that to be resolved, because the same steps work as expected in DBeaver. Not sure why…
Soni007
0
votes
0 answers

compute stats in impala using impyla

I'm trying to compute statistics in Impala (Hive) using the Python impyla module. Command used: compute stats db.tablename; But I'm getting the error below: cannot recognize input near 'compute' 'stats' How can this be fixed?
Raja
0
votes
1 answer

Impyla connection. Cannot start SASL. No mechanism available

I am trying to connect to Impala using impyla, but each time I get this error: Could not start SASL: b'Error in sasl_client_start (-4) SASL(-4): no mechanism available: Unable to find a callback: 2' I have…
psowa001
0
votes
1 answer

Ibis create impala table with pandas dataframe and get [Error 61] Connection refused

After running a SQL statement with impyla, I convert the results into a pandas DataFrame. Now I want to automatically create a temporary table on Impala using Apache Ibis and load a DataFrame into it. The following code is divided into 3…
Eric.XY
0
votes
1 answer

Issues Connecting to Impala Kerberos Hadoop - Windows/Python 3.6

I have searched widely, but nothing is working for me. The code goes something like this: from impala.dbapi import connect conn = connect(host = 'myhost', port = 21050, auth_mechanism = 'GSSAPI', kerberos_service_name = 'impala') cursor =…
formicaman
0
votes
1 answer

Is there a way to invalidate metadata and rebuild index from python code in CDSW?

I am using impyla and Python in CDSW to query data in HDFS and use it. The problem is that sometimes, to get all of the data, I have to go in and manually click the "Invalidate all metadata and rebuild index" button in HUE. Is there a way to do…
sectechguy
0
votes
3 answers

How to connect to Apache Hadoop with Impyla and Kerberos

First of all, I have also read this question (since it seems to be similar). My problem is that I am trying to connect to our Apache Hadoop system, which is now secured by Kerberos. I use the impyla module to achieve this. Before Kerberos was installed on…
Aquen
0
votes
1 answer

Impala queries are not executed in async manner

Basically, I have a small aiohttp app which receives a list of Impala queries and then sends them to Impala. However, some of the queries may take a long time to complete, so I decided to run them in an async/parallel way. I got one solution with threads working, but…
JDRussia
0
votes
2 answers

Impyla return code 1 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask when querying HiveServer2

I am using impyla to query some results from Hive; however, I ran into this problem: From Impyla: impala.error.OperationalError: Error while processing statement: FAILED: Execution Error, return code 1 from…
Truong Nguyen
0
votes
0 answers

Can not connect Kerberos Enabled HiveServer2 (No credentials cache file found)

I am trying to connect from the host machine to Kerberos-enabled Impala on a CDH VBox. The impala-shell -k command works perfectly, but I cannot connect via impyla: Traceback (most recent call last): File "yarasa.py", line 2, in conn =…
ufukomer