
We are running a Spark cluster on Kubernetes. When we submitted the job below, the driver pod and executor pods all came up and were running, but the application did not work as expected. We suspect the root cause is that it could not find the source path given by the "--py-files" parameter. We also noticed that the driver pod shows the warning MountVolume.SetUp failed for volume "spark-conf-volume". Could you please advise?

bin/spark-submit \
--master k8s://https://k8s-master-ip:6443  \
--deploy-mode cluster \
--name algo-vm \
--py-files hdfs://{our_ip}:9000/testdata/src.zip \
--conf spark.executor.instances=2 \
--conf spark.driver.port=10000 \
--conf spark.port.maxRetries=1 \
--conf spark.blockManager.port=20000 \
--conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \
--conf spark.kubernetes.container.image.pullPolicy=Always \
--conf spark.kubernetes.pyspark.pythonVersion=3 \
--conf spark.kubernetes.container.image={our_ip}/sutpc/k8s-spark-242-entry/spark-py:1.0 \
--jars hdfs://hdfs-master-ip:9000/jar/spark-sql-kafka-0-10_2.11-2.4.5.jar,hdfs://{our_ip}:9000/jar/kafka-clients-0.11.0.2.jar \
hdfs://{our_ip}:9000/testdata/spark_main.py
Chenmingdong
  • I'm seeing the same warning and suspect this to be the reason for pyFiles not being fetched from GCS --> https://stackoverflow.com/questions/62448894/dependency-issue-with-pyspark-running-on-kubernetes-using-spark-on-k8s-operator – denise Jun 18 '20 at 12:41
  • Thanks @denise, we solved this issue. Just like you said, it was the reason the pyFiles could not be found. – Chenmingdong Jun 21 '20 at 07:59
  • that's great, how did you solve the issue? – denise Jun 22 '20 at 13:17
  • @denise We solved this issue by setting PYTHONPATH in the Docker image (a sketch follows below), please refer to https://issues.apache.org/jira/browse/SPARK-30496 – Chenmingdong Jun 29 '20 at 07:14
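
A minimal sketch of that workaround, assuming the stock spark-py image built from the Spark 2.4.x distribution and that the missing piece is simply that PYTHONPATH does not cover the directory where cluster-mode spark-submit places the downloaded --py-files. The paths and py4j version below are assumptions for illustration, not taken from our actual image:

# Hypothetical fragment added to the Dockerfile of the spark-py image:
# make sure the Python interpreter can see both the bundled PySpark libs and
# the working directory into which remote --py-files are downloaded.
ENV PYTHONPATH="${SPARK_HOME}/python/lib/pyspark.zip:${SPARK_HOME}/python/lib/py4j-0.10.7-src.zip:/opt/spark/work-dir:${PYTHONPATH}"

# Submit-time alternative (no image rebuild), using the standard
# spark.kubernetes.driverEnv.* / spark.executorEnv.* properties:
#   --conf spark.kubernetes.driverEnv.PYTHONPATH="/opt/spark/work-dir" \
#   --conf spark.executorEnv.PYTHONPATH="/opt/spark/work-dir" \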
