Questions tagged [pyhive]

107 questions
1 vote, 0 answers

Pyhive+Pandas: TTransportException: TSocket read 0 bytes when running query for larger number of rows (>1000)

My code, in its simplest form, looks like the one below. I'm trying to connect to Hive from a Jupyter notebook. The code works fine when I query a smaller number of rows, say 'select * from table limit 200', but throws this error when I do something like 'select…
user3437212
1 vote, 0 answers

SQLAlchemy PyHive limit query result size in MB

I am using PyHive with SQLAlchemy DB-API (asynchronous). My Hive table has millions of records; if I execute SELECT * FROM table, it loads millions of records into memory. Is there a way to limit the size of the query result to a certain size, let's…
Mithun Mistry
1 vote, 1 answer

How to make Dataproc detect Python-Hive connection as a Yarn Job?

I launch a Dataproc cluster and serve Hive on it. Remotely from any machine I use Pyhive or PyODBC to connect to Hive and do things. It's not just one query. It can be a long session with intermittent queries. (The query itself has issues; will ask…
zpz
1 vote, 0 answers

PyHive connection from Flask SQLAlchemy

I cannot connect to a Hive database using Flask and SQLAlchemy. My configuration in the relevant files is below: .flaskenv file DATABASE_URL="hive://:10000/" config.py file class ProductionConfig(Config): …
bambuste
1 vote, 0 answers

configure pyhive from beeline

I'm on a machine where beeline exists and works with the command beeline -u "jdbc:hive2://bdm05:10000/test;principal=hive/_HOST@BIGDATA-INT" -n myuser I need to use Python and pyhive to handle tables with pandas. from pyhive import…
DPColombotto
1 vote, 0 answers

How to load a CSV file into Hive table?

I'm trying to load a CSV from one remote server to a Hive client on a different server using Python. I'm opening the CSV file on the remote server: with open("/path/to/csv/file/" +self.file_to_load, "rb") as file: csv_file = file.read() Now I'm…
Arik
1 vote, 0 answers

How do I write a pandas dataframe to a HIVE database which uses Kerberos authentication

I can't find good source code to try writing a pandas dataframe that's sitting on my local machine, to a HIVE database for a hadoop cluster. I can query a table and convert it to a pandas dataframe using pyodbc and an odbc driver but I can't write a…
jst
1 vote, 0 answers

How to solve "Error while compiling statement: FAILED: ParseException" while creating hive connection through pyhive in python?

OperationalError: TExecuteStatementResp(status=TStatus(statusCode=3, infoMessages=["*org.apache.hive.service.cli.HiveSQLException:Error while compiling statement: FAILED: ParseException line 1:86 extraneous input ';' expecting EOF near…
1 vote, 1 answer

How to connect to impala using impyla or to hive using pyhive?

I am trying to connect to impala using impyla with this code: from impala.dbapi import connect conn = connect(host='host_name.com', port=21050, user='usr', password='pass', use_ssl=True, auth_mechanism='LDAP') cursor =…
psowa001
1 vote, 0 answers

Unable to write pandas dataframe to hive table

I am testing read and write operations between hive table and pandas. I am able to read from hive to pandas data frame successfully using the below code. from impala.dbapi import connect import pandas as pd conn =…
Pyd
1 vote, 0 answers

Can't Connect to Hive using PyHive

I'm having a problem connecting to Hive using Pyhive. I'm using Virtualenv on a Windows machine (Win 10). I installed all of Pyhive's requirements (sasl, thrift, thrift-sasl and MS Visual C++ 9.0), but I got the same error... Wrong number of…
1 vote, 1 answer

How to run presto queries in python using pyhive?

I am trying to run a Presto query in Python using the pyhive library, but I get a max retries error. I am running it in a Jupyter notebook locally (laptop). I think it's not able to connect to the Presto node. I am using an Azure HDInsight cluster and installed…
1 vote, 1 answer

thrift.transport.TTransport.TTransportException: failed to resolve sockaddr for host Pyhive and Python

from pyhive import hive conn = hive.Connection(host="host", username="hive",auth="NOSASL",port=10000) cur = conn.cursor() I wrote this code. I received this error: thrift.transport.TTransport.TTransportException: failed to resolve sockaddr for…
1 vote, 0 answers

pyhive connection issue after running a query that takes too long

I am using the pandas.read_sql function with a Hive connection to extract a really large dataset. I have a script like this: df = pd.read_sql(query_big, hive_connection) df2 = pd.read_sql(query_simple, hive_connection) The big query takes a long time, and…
fnosdy
1 vote, 1 answer

Stop logging entire pyhive query to log file

I have a code pipeline where I'm using Pyhive to insert data into a DB. from pyhive import hive def save_postprocess_data(postprocess_data): conn = hive.Connection(host="hostname", port=10000, username="username") curr = conn.cursor() …
Lavanyadav009