I'm trying to deploy Spark (PySpark) on Kubernetes using spark-submit, but I'm getting the following error:
Exception in thread "main" org.apache.spark.SparkException: Please specify spark.kubernetes.file.upload.path property.
    at org.apache.spark.deploy.k8s.KubernetesUtils$.uploadFileUri(KubernetesUtils.scala:330)
    at org.apache.spark.deploy.k8s.KubernetesUtils$.renameMainAppResource(KubernetesUtils.scala:300)
    at
Since I'm packaging my dependencies through a virtual environment, I don't need a remote location for Spark to retrieve them from, so I'm not setting the spark.kubernetes.file.upload.path parameter.
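For context, the archive is built roughly like this (a sketch using venv-pack; the requirements file name is just illustrative):

python -m venv pyspark_venv
source pyspark_venv/bin/activate
pip install -r requirements.txt venv-pack
venv-pack -o pyspark_venv.tar.gz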
I tried including that parameter anyway with an empty value, but it doesn't work either.
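If I understand the error correctly, Spark wants the property to point at some shared storage, e.g. something like the following (a sketch with a made-up s3a bucket; I don't actually have one configured):

--conf spark.kubernetes.file.upload.path=s3a://some-bucket/spark-uploads

But my point is that I shouldn't need this, since my dependencies already travel with the archive.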
My spark-submit command (which I trigger from a Python script) is as follows:
cmd = f"""{SPARK_HOME}/bin/spark-submit \
    --master {SPARK_MASTER} \
    --deploy-mode cluster \
    --name spark-policy-engine \
    --executor-memory {EXECUTOR_MEMORY} \
    --conf spark.executor.instances={N_EXECUTORS} \
    --conf spark.kubernetes.container.image={SPARK_IMAGE} \
    --conf spark.kubernetes.file.upload.path='' \
    --packages org.mongodb.spark:mongo-spark-connector_2.12:3.0.1,org.apache.hadoop:hadoop-aws:3.3.1,com.amazonaws:aws-java-sdk-bundle:1.11.901,org.apache.hadoop:hadoop-common:3.3.1 \
    --archives pyspark_venv.tar.gz#environment {spark_files} \
    --format_id {format_id}
"""
As shown, I'm passing the parameter via a --conf flag (as described in https://spark.apache.org/docs/3.0.0-preview/running-on-kubernetes.html), but whether it's present or not, it just doesn't work.