
I'm trying to connect to a Presto DB installed on a remote server from my local Mac using PySpark; my code is below. I have downloaded the Presto driver and placed it under /user/name//Hadoop/spark-2.3.1-bin-hadoop2.7/jars (I suspect this is where I'm making a mistake, but I'm not sure).

from pyspark.sql import SparkSession, HiveContext
from pyhive import presto, hive


def main():
    # Build (or reuse) a Spark session with Hive support enabled
    spark = SparkSession.builder\
        .appName("tests")\
        .enableHiveSupport()\
        .getOrCreate()

    # Read the subquery result from Presto through Spark's JDBC data source
    df_presto = spark.read.format("jdbc") \
        .option("driver", "io.prestosql.jdbc.PrestoDriver")\
        .option("url", "jdbc:presto://host.com:443/hive") \
        .option("user", "user_name")\
        .option("password", "password") \
        .option("dbtable", "(select column from table_name limit 10) tmp") \
        .load()


if __name__ == "__main__":
    main()

Presto driver: presto-jdbc-340.jar

When I execute the code, I get the following error:

 Traceback (most recent call last):
  File "/Users/user_name/Hadoop/spark-2.3.1-bin-hadoop2.7/python/lib/pyspark.zip/pyspark/sql/utils.py", line 63, in deco
  File "/Users/user_name/Hadoop/spark-2.3.1-bin-hadoop2.7/python/lib/py4j-0.10.7-src.zip/py4j/protocol.py", line 328, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o38.load.
: org.apache.spark.sql.AnalysisException: java.lang.RuntimeException: java.lang.IllegalArgumentException: java.net.UnknownHostException: ip-10-120-99-149.ec2.internal;
    at org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:106)
    at org.apache.spark.sql.hive.HiveExternalCatalog.databaseExists(HiveExternalCatalog.scala:194)

Any idea how I can fix this?
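
The traceback fails inside `HiveExternalCatalog`, and the host that cannot be resolved is `ip-10-120-99-149.ec2.internal` rather than `host.com`. That suggests the Hive catalog setup, not the Presto JDBC read itself, may be what is failing. One thing that might be worth trying is a session built without `.enableHiveSupport()`. The following is only a sketch under that assumption: the helper names `presto_jdbc_options`, `build_session`, and `read_presto` are hypothetical, and the option values are copied from the code above.

```python
def presto_jdbc_options(url, user, password, dbtable,
                        driver="io.prestosql.jdbc.PrestoDriver"):
    """Collect the JDBC reader options from the question into one dict."""
    return {
        "driver": driver,
        "url": url,
        "user": user,
        "password": password,
        "dbtable": dbtable,
    }


def build_session(app_name="tests"):
    """Create a SparkSession without Hive support (assumption: not needed here)."""
    from pyspark.sql import SparkSession  # lazy import; requires pyspark
    return SparkSession.builder.appName(app_name).getOrCreate()


def read_presto(spark, options):
    """Load a DataFrame through Spark's generic JDBC data source."""
    return spark.read.format("jdbc").options(**options).load()


opts = presto_jdbc_options(
    url="jdbc:presto://host.com:443/hive",
    user="user_name",
    password="password",
    dbtable="(select column from table_name limit 10) tmp",
)
# With pyspark installed and the Presto driver jar on the classpath:
#   df_presto = read_presto(build_session(), opts)
```

If the same `UnknownHostException` still appears without Hive support, the EC2-internal host name is probably coming from somewhere else in the local Spark or Hadoop configuration.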

user7343922
  • Did you try passing the jar with `spark-submit`? Try `spark-submit --jars path_to_presto_jar pyspark.py` and see if it resolves your issue. – User12345 Jun 30 '21 at 04:50
  • I tried that as well, but I get the same error. – user7343922 Jun 30 '21 at 11:50
  • Did you confirm connection to jdbc:presto://host.com:443/hive without Spark? – ebyhr Jul 02 '21 at 02:08
  • This is the same connection I'm using in DataGrip to connect to Presto. Should I use a different approach or a different connection from PySpark? Can you tell me how I can test it from the command line? – user7343922 Jul 02 '21 at 11:05
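
One way to check the endpoint without Spark or DataGrip is a plain DNS and TCP test. This is a minimal sketch using only the Python standard library. `can_resolve`, `can_connect`, and `report` are hypothetical helper names, and the check covers only name resolution and port reachability, not Presto authentication or the JDBC driver.

```python
import socket


def can_resolve(hostname):
    """Return True if the local resolver can turn hostname into an address."""
    try:
        socket.getaddrinfo(hostname, None)
        return True
    except OSError:  # socket.gaierror is a subclass of OSError
        return False


def can_connect(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, unreachable, or unresolved
        return False


def report(host, port):
    """Summarize both checks for one endpoint."""
    return {
        "host": host,
        "port": port,
        "resolves": can_resolve(host),
        "connects": can_connect(host, port),
    }
```

From a Python prompt on the same machine, `report("host.com", 443)` covers the JDBC endpoint, and `can_resolve("ip-10-120-99-149.ec2.internal")` shows whether the host name from the traceback is resolvable locally.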
