I want to run a simple Spark script containing a Spark SQL query (basically HiveQL). The corresponding tables are saved in the spark-warehouse folder.
from pyspark.sql import SparkSession

spark = SparkSession.builder \
    .config("spark.sql.warehouse.dir", "file:///C:/tmp") \
    .appName("TestApp") \
    .enableHiveSupport() \
    .getOrCreate()

sqlstring = "SELECT lflow1.LeaseType as LeaseType, lflow1.Status as Status, lflow1.Property as property, lflow1.City as City, lesflow2.DealType as DealType, lesflow2.Area as Area, lflow1.Did as DID, lesflow2.MID as MID from lflow1, lesflow2 WHERE lflow1.Did = lesflow2.MID"

def queryBuilder(sqlval):
    df = spark.sql(sqlval)
    df.show()
    return df  # return the DataFrame so the caller can use it

result = queryBuilder(sqlstring)
print(result.collect())
print("Type of result:", type(result))
After running the script with spark-submit I get the error below:
py4j.protocol.Py4JJavaError: An error occurred while calling o27.sql. : org.apache.spark.sql.AnalysisException: Table or view not found: lflow1; line 1 pos 211
I could not figure out why this happens. I have seen some Stack Overflow posts suggesting that Hive support must be enabled, but I have already done that in my script with enableHiveSupport(), and I still get the error. I am running PySpark 2.2 on Windows 10. Kindly help me figure it out.
I created and saved lflow1 and lesflow2 as permanent tables from DataFrames in the PySpark shell. Here is my code:
df = spark.read.json("C:/Users/codemen/Desktop/test for sparkreport engine/LeaseFlow1.json")
df2 = spark.read.json("C:/Users/codemen/Desktop/test for sparkreport engine/LeaseFlow2.json")
df.write.saveAsTable("lflow1")
df2.write.saveAsTable("lesflow2")
In the PySpark shell I ran this query:
spark.sql("SELECT lflow1.LeaseType as LeaseType, lflow1.Status as Status, lflow1.Property as property, lflow1.City as City, lesflow2.DealType as DealType, lesflow2.Area as Area, lflow1.Did as DID, lesflow2.MID as MID from lflow1, lesflow2 WHERE lflow1.Did = lesflow2.MID").show()
and the PySpark console shows this: