Questions tagged [pyhive]
107 questions
1 vote, 0 answers
Pyhive+Pandas: TTransportException: TSocket read 0 bytes when running query for larger number of rows (>1000)
My code in its simplest form is shown below. I'm trying to connect to Hive from a Jupyter notebook. The code works fine when I query a smaller number of rows, say 'select * from table limit 200', but throws this error when I do something like 'select…

user3437212
1 vote, 0 answers
SQLAlchemy PyHive limit query result size in MB
I am using PyHive with SQLAlchemy DB-API (asynchronous).
My Hive table has millions of records. If I execute:
SELECT * FROM table
it loads millions of records into memory. Is there a way to limit the size of the query result to a certain size, let's…

Mithun Mistry
1 vote, 1 answer
How to make Dataproc detect Python-Hive connection as a Yarn Job?
I launch a Dataproc cluster and serve Hive on it. From any remote machine, I use PyHive or PyODBC to connect to Hive and do things. It's not just one query; it can be a long session with intermittent queries. (The query itself has issues; will ask…

zpz
1 vote, 0 answers
PyHive connection from Flask SQLAlchemy
I cannot connect to a Hive database using Flask and SQLAlchemy. My configuration in the relevant files is below:
.flaskenv file
DATABASE_URL="hive://:10000/"
config.py file
class ProductionConfig(Config):
…

bambuste
1 vote, 0 answers
configure pyhive from beeline
I'm on a machine where beeline exists and works with the command
beeline -u "jdbc:hive2://bdm05:10000/test;principal=hive/_HOST@BIGDATA-INT" -n myuser
I need to use Python and PyHive in order to handle tables with pandas.
from pyhive import…

DPColombotto
1 vote, 0 answers
How to load a CSV file into Hive table?
I'm trying to load a CSV from one remote server to a Hive client on a different server using Python.
I'm opening the CSV file on the remote server:
with open("/path/to/csv/file/" + self.file_to_load, "rb") as file:
    csv_file = file.read()
Now I'm…

Arik
1 vote, 0 answers
How do I write a pandas DataFrame to a Hive database which uses Kerberos authentication?
I can't find good source code for writing a pandas DataFrame that's sitting on my local machine to a Hive database on a Hadoop cluster.
I can query a table and convert it to a pandas DataFrame using pyodbc and an ODBC driver, but I can't write a…

jst
1 vote, 0 answers
How to solve "Error while compiling statement: FAILED: ParseException" while creating hive connection through pyhive in python?
OperationalError: TExecuteStatementResp(status=TStatus(statusCode=3,
infoMessages=["*org.apache.hive.service.cli.HiveSQLException:Error
while compiling statement: FAILED: ParseException line 1:86 extraneous
input ';' expecting EOF near…

PRAVIN MASKE
1 vote, 1 answer
How to connect to Impala using impyla or to Hive using PyHive?
I am trying to connect to Impala using impyla with this code:
from impala.dbapi import connect
conn = connect(host='host_name.com', port=21050, user='usr', password='pass', use_ssl=True, auth_mechanism='LDAP')
cursor =…

psowa001
1 vote, 0 answers
Unable to write pandas DataFrame to Hive table
I am testing read and write operations between a Hive table and pandas.
I am able to read from Hive into a pandas DataFrame successfully using the code below.
from impala.dbapi import connect
import pandas as pd
conn =…

Pyd
1 vote, 0 answers
Can't Connect to Hive using PyHive
I'm having a problem connecting to Hive using PyHive.
I'm using virtualenv on a Windows machine (Win 10). I installed all of PyHive's requirements (sasl, thrift, thrift-sasl and MS Visual C++ 9.0), but I still get the same error...
Wrong number of…

Gabriel Sanches
1 vote, 1 answer
How to run Presto queries in Python using PyHive?
I am trying to run a Presto query in Python using the PyHive library, but I get a max retries error. I am running it locally in a Jupyter notebook (on a laptop). I think it's not able to connect to the Presto node. I am using an Azure HDInsight cluster and installed…

Bhanuday Birla
1 vote, 1 answer
thrift.transport.TTransport.TTransportException: failed to resolve sockaddr for host Pyhive and Python
from pyhive import hive
conn = hive.Connection(host="host", username="hive", auth="NOSASL", port=10000)
cur = conn.cursor()
I wrote this code. I received this error:
thrift.transport.TTransport.TTransportException: failed to resolve sockaddr for…

Oğuz Kırçiçek
1 vote, 0 answers
pyhive connection issue after running a query that takes too long
I am using the pandas.read_sql function with a Hive connection to extract a really large dataset. I have a script like this:
df = pd.read_sql(query_big, hive_connection)
df2 = pd.read_sql(query_simple, hive_connection)
The big query takes a long time, and…

fnosdy
1 vote, 1 answer
Stop logging entire pyhive query to log file
I have a code pipeline where I'm using PyHive to insert data into a DB.
from pyhive import hive
def save_postprocess_data(postprocess_data):
    conn = hive.Connection(host="hostname", port=10000, username="username")
    curr = conn.cursor()
…

Lavanyadav009