
I'm using the following code to connect to a Databricks cluster, but I get thrift.transport.TTransport.TTransportException: TSocket read 0 bytes. I think there is an error in the my_url I pass to create_engine.

from sqlalchemy.engine import create_engine

engine = create_engine(my_url)

res = engine.execute(
    'SELECT * FROM alex_db.sams_us_clubs_sams_member_index LIMIT 5')

for row in res:
    print(row)

Because this is my company's cluster, I cannot show the details of my_url. Does anyone have experience using pyhive and SQLAlchemy to connect to a Databricks cluster?


2 Answers


How do you want to access Hive? Are you running it on a Spark/Databricks server?

If you want to run it programmatically, see the Databricks documentation on setting up JDBC/ODBC access (https://docs.databricks.com/user-guide/bi/jdbc-odbc-bi.html).
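For reference, the JDBC connection string described there typically has the shape below (a sketch based on the Simba Spark driver format; the hostname, HTTP path, and token are placeholders you take from your cluster's JDBC/ODBC settings):

jdbc:spark://<server-hostname>:443/default;transportMode=http;ssl=1;httpPath=<http-path>;AuthMech=3;UID=token;PWD=<personal-access-token>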

If you are running the application on a Spark server, you can enable Hive querying like this:

# Create a SparkSession with Hive support. No need to create a
# SparkContext -- you automatically get one as part of the SparkSession.
# warehouseLocation should point at your Hive warehouse directory.
from pyspark.sql import SparkSession

spark = (SparkSession
         .builder
         .appName("SparkSessionZipsExample")
         .config("spark.sql.warehouse.dir", warehouseLocation)
         .enableHiveSupport()
         .getOrCreate())

source: https://databricks.com/blog/2016/08/15/how-to-use-sparksession-in-apache-spark-2-0.html
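Once Hive support is enabled, the table from the question can be queried directly through the session (a minimal sketch reusing the table name from the question):

# Run the Hive query through the SparkSession and print a few rows.
df = spark.sql(
    "SELECT * FROM alex_db.sams_us_clubs_sams_member_index LIMIT 5")
df.show()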


You can create a SQLAlchemy engine for pyhive/Databricks with the snippet below. You will need to add all required dependencies to your project.

token: an access token, which can be generated in the Databricks console
region: the server location, i.e. the ${location_of_server} part of https://${location_of_server}.azuredatabricks.net/
database: name of the database you want to connect to
http_path: sql/protocolv1/o/${organization_id}/${cluster_id}

from sqlalchemy.engine import create_engine

dbfs_engine = create_engine(
    "databricks+pyhive://token:" + token + "@" + region
    + ".azuredatabricks.net:443/" + database,
    connect_args={"http_path": http_path},
)
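Running the query from the question against this engine then looks roughly like this (a sketch; engine.execute is the pre-2.0 SQLAlchemy API the question already uses):

# Execute the query and print each returned row.
res = dbfs_engine.execute(
    "SELECT * FROM alex_db.sams_us_clubs_sams_member_index LIMIT 5")
for row in res:
    print(row)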