
I am using pyspark==2.4.3 and I just want to run an HQL file:

use myDatabaseName;
show tables;

and here is what I tried:

from os.path import abspath

from pyspark.sql import SparkSession

# warehouse_location points to the default location for managed databases and tables
warehouse_location = abspath('spark-warehouse')

spark = SparkSession \
    .builder \
    .appName("Python Spark SQL Hive integration example") \
    .config("spark.sql.warehouse.dir", warehouse_location) \
    .enableHiveSupport() \
    .getOrCreate()

with open('full/path/to/my/hqlfile') as t:
    q=t.read()

print(repr(q))
# 'use myDatabaseName;show tables;\n'
spark.sql(q)

but I get:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/some/path/python2.7/site-packages/pyspark/sql/session.py", line 767, in sql
    return DataFrame(self._jsparkSession.sql(sqlQuery), self._wrapped)
  File "/some/path/python2.7/site-packages/py4j/java_gateway.py", line 1257, in __call__
    answer, self.gateway_client, self.target_id, self.name)
  File "/some/path/python2.7/site-packages/pyspark/sql/utils.py", line 73, in deco
    raise ParseException(s.split(': ', 1)[1], stackTrace)
pyspark.sql.utils.ParseException: u"\nmismatched input ';' expecting <EOF>(line 1, pos 11)\n\n== SQL ==\nuse myDatabaseName;show tables;\n-----------^^^\n"

What am I doing wrong?

AbtPst

1 Answer


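As the error suggests, the ';' is not valid syntax in spark.sql: in Spark 2.4 the parser expects exactly one statement followed by the end of the input (hence "expecting <EOF>"), so a statement terminator is rejected.

Second, you cannot run two statements in a single spark.sql call.

I would split q into a list of query strings without ';' and then run them one at a time in a for loop: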

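# Split on ';', strip whitespace, and drop empty fragments such as the trailing '\n'
# (unlike [:-1], this also keeps a final statement that has no trailing ';')
query_lt = [s.strip() for s in q.split(";") if s.strip()]
for qs in query_lt:
    spark.sql(qs)  # spark.sql runs one statement per call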
E.ZY.
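A plain split(";") will also cut a statement that contains ';' inside a string literal (for example FIELDS TERMINATED BY ';' in a CREATE TABLE) or inside a -- comment. Below is a minimal sketch of a slightly more careful splitter, assuming the file only uses single, double, or backtick quoting with backslash escapes and -- line comments (block /* */ comments are not handled). split_sql_statements and run_hql_file are hypothetical helper names for illustration, not part of PySpark.

```python
def split_sql_statements(text):
    """Split a SQL/HQL script on ';' characters that are not inside
    quotes or '--' line comments.

    Returns the non-empty statements, stripped of surrounding whitespace
    and without the ';' separators. '--' comments are dropped.
    """
    statements = []
    current = []
    quote = None        # quote character we are currently inside, or None
    in_comment = False  # True while inside a '--' line comment
    i = 0
    while i < len(text):
        ch = text[i]
        if in_comment:
            # Skip comment text; the newline ends the comment and is kept
            if ch == "\n":
                in_comment = False
                current.append(ch)
            i += 1
            continue
        if quote:
            current.append(ch)
            if ch == "\\" and i + 1 < len(text):
                # Keep a backslash-escaped character (e.g. \' or \;) verbatim
                current.append(text[i + 1])
                i += 2
                continue
            if ch == quote:
                quote = None  # closing quote
        elif ch in ("'", '"', "`"):
            quote = ch  # opening quote
            current.append(ch)
        elif text.startswith("--", i):
            in_comment = True
            i += 2
            continue
        elif ch == ";":
            stmt = "".join(current).strip()
            if stmt:
                statements.append(stmt)
            current = []
        else:
            current.append(ch)
        i += 1
    # A final statement without a trailing ';' is still returned
    stmt = "".join(current).strip()
    if stmt:
        statements.append(stmt)
    return statements


def run_hql_file(spark, path):
    """Run every statement in the file at `path` through spark.sql,
    one call per statement, and return the last DataFrame (or None)."""
    with open(path) as f:
        statements = split_sql_statements(f.read())
    result = None
    for stmt in statements:
        result = spark.sql(stmt)
    return result
```

With the session from the question, run_hql_file(spark, 'full/path/to/my/hqlfile').show() would run the use statement and then display the output of show tables, since only the last DataFrame is returned.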