I am trying to run a PySpark program using spark-submit:
from pyspark import SparkConf, SparkContext
from pyspark.sql import SQLContext
from pyspark.sql.types import *
from pyspark.sql import SparkSession
SNOWFLAKE_DATA_SOURCE = 'net.snowflake.spark.snowflake'
def get_records(spark: SparkSession):
    sfOptions = {
        'sfURL': 'url',
        'sfAccount': 'acntname',
        'sfUser': 'username',
        'sfPassword': 'pwd',
        'sfRole': 'role',
        'sfDatabase': 'dbname',
        'sfSchema': 'schema',
        'sfWarehouse': 'warehousename'
    }
    rec_df = spark.read.format(SNOWFLAKE_DATA_SOURCE).options(**sfOptions).options('query', 'select AREA from SCHEMA.TABLENAME limit 1').load()
    rec_df.show()
if __name__ == "__main__":
    spark = SparkSession.Builder.master('yarn').appName('Check_Con').getOrCreate()
    sc = SparkContext("yarn", "Simple App")
    spark = SQLContext(sc)
    spark_conf = SparkConf().setMaster('local').setAppName('CHECK')
    get_records(spark)
Spark-submit:
spark-submit --master yarn --deploy-mode cluster \
  --keytab /home/devuser/devuser.keytab --principal devuser@PRINCIPAL.COM \
  --num-executors 2 --executor-memory 1G --executor-cores 2 --driver-memory 1G \
  --jars /home/spark-snowflake_2.11-2.8.0-spark_2.4.jar,/home/snowflake-jdbc-3.12.9.jar \
  --files /home/devuser/conn_props/core-site_dummy.xml,/home/devuser/conn_props/hdfs-site_dummy.xml \
  check_snow.py
When I submit the code, it ends with ERROR ApplicationMaster: User application exited with status 1.
There is no further error description; the job just ends abruptly.
20/07/17 19:12:42 ERROR ApplicationMaster: User application exited with status 1
20/07/17 19:12:42 INFO ApplicationMaster: Final app status: FAILED, exitCode: 13, (reason: User application exited with status 1)
20/07/17 19:12:42 ERROR ApplicationMaster: Uncaught exception:
org.apache.spark.SparkException: Exception thrown in awaitResult:
at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:205)
at org.apache.spark.deploy.yarn.ApplicationMaster.runDriver(ApplicationMaster.scala:498)
at org.apache.spark.deploy.yarn.ApplicationMaster.org$apache$spark$deploy$yarn$ApplicationMaster$$runImpl(ApplicationMaster.scala:345)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$run$2.apply$mcV$sp(ApplicationMaster.scala:260)
I also tried passing core-site.xml and hdfs-site.xml in the spark-submit command (via --files), but none of these changes helped.
If I run the same code interactively from the terminal in spark-shell, it runs fine. I have been working with Spark for the last year and have faced many exceptions, but never this one, and I don't understand this behaviour.
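For reference, in the shell I rely on the SparkSession the shell itself creates, so the call is roughly the following (assuming the pyspark shell, since the code is Python, with the same connector jars passed via --jars):

pyspark --jars /home/spark-snowflake_2.11-2.8.0-spark_2.4.jar,/home/snowflake-jdbc-3.12.9.jar

and then, inside the shell:

# 'spark' here is the session the shell already provides
get_records(spark)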
Edit 1: Some of the suggestions were to remove the unneeded parts of the code and to stop setting the master to yarn inside the code, since it is already set in spark-submit.
if __name__ == "__main__":
    spark = SparkSession.Builder.appName('Check_Con').getOrCreate()
    get_records(spark)
Even after that, I still see the same exception.
Could anyone let me know if I have made any mistake, and how I can fix this problem?