
I want to run a simple Spark script that contains a Spark SQL query (basically HiveQL). The corresponding tables are saved in the spark-warehouse folder.

from pyspark.sql import SparkSession
from pyspark.sql import Row

spark = SparkSession.builder \
    .config("spark.sql.warehouse.dir", "file:///C:/tmp") \
    .appName("TestApp") \
    .enableHiveSupport() \
    .getOrCreate()

sqlstring = "SELECT lflow1.LeaseType as LeaseType, lflow1.Status as Status, lflow1.Property as property, lflow1.City as City, lesflow2.DealType as DealType, lesflow2.Area as Area, lflow1.Did as DID, lesflow2.MID as MID FROM lflow1, lesflow2 WHERE lflow1.Did = lesflow2.MID"

def queryBuilder(sqlval):
    df = spark.sql(sqlval)
    df.show()
    return df  # without this return, result below would be None and result.collect() would fail

result = queryBuilder(sqlstring)
print(result.collect())
print("Type of result:", type(result))

After performing the spark-submit operation I am facing the below error:

py4j.protocol.Py4JJavaError: An error occurred while calling o27.sql. : org.apache.spark.sql.AnalysisException: Table or view not found: lflow1; line 1 pos 211

I could not figure out why this is happening. I have seen some posts on Stack Overflow suggesting that I have to enable Hive support, which I have already done in my script by calling enableHiveSupport(). But I am still getting this error. I am running PySpark 2.2 on Windows 10. Kindly help me figure it out.

I have created and saved lflow1 and lesflow2 as permanent tables from DataFrames in the pyspark shell. Here is my code:

df = spark.read.json("C:/Users/codemen/Desktop/test for sparkreport engine/LeaseFlow1.json")
df2 = spark.read.json("C:/Users/codemen/Desktop/test for sparkreport engine/LeaseFlow2.json")

df.write.saveAsTable("lflow1")
df2.write.saveAsTable("lesflow2")

In the pyspark shell I have performed this query:

spark.sql("SELECT lflow1.LeaseType as LeaseType, lflow1.Status as Status, lflow1.Property as property, lflow1.City as City, lesflow2.DealType as DealType, lesflow2.Area as Area, lflow1.Did as DID, lesflow2.MID as MID from lflow1, lesflow2  WHERE lflow1.Did = lesflow2.MID").show()

and the pyspark console shows the joined result without any error (screenshot of the shell output omitted).

Kalyan
  • Could you show the part of code which creates table "lflow1"? – Yehor Krivokon Aug 15 '17 at 08:44
  • Sure, I have created and saved lflow1 and lesflow2 as permanent tables in the pyspark shell. Kindly look at my updated question. – Kalyan Aug 15 '17 at 08:51
  • OK. Can you see these tables by executing a query manually? (For example, you can go to Hive and type "show tables;", or do the same with Spark SQL.) – Yehor Krivokon Aug 15 '17 at 08:53
  • i have added the screenshots of pyspark manual queries – Kalyan Aug 15 '17 at 09:10
  • Why can't you use the same that you use in the shell: val result = spark.sql("SELECT lflow1.LeaseType as LeaseType, lflow1.Status as Status, lflow1.Property as property, lflow1.City as City, lesflow2.DealType as DealType, lesflow2.Area as Area, lflow1.Did as DID, lesflow2.MID as MID from lflow1, lesflow2 WHERE lflow1.Did = lesflow2.MID")? – Yehor Krivokon Aug 15 '17 at 10:28

0 Answers