
I have a Spark application and want to deploy it on a Kubernetes cluster. Following the documentation below, I have managed to create an empty Kubernetes cluster, build a Docker image using the Dockerfile provided under kubernetes/dockerfiles/spark/Dockerfile, and deploy the application to the cluster using spark-submit in a dev environment. https://spark.apache.org/docs/latest/running-on-kubernetes.html
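For context, the dev deployment used a spark-submit invocation roughly like the following (the master URL, image name and jar path are placeholders, not our actual values):

```
# Cluster-mode spark-submit against Kubernetes; everything in <...> is a placeholder.
spark-submit \
  --master k8s://https://<k8s-apiserver-host>:<port> \
  --deploy-mode cluster \
  --name spark-pi \
  --class org.apache.spark.examples.SparkPi \
  --conf spark.executor.instances=2 \
  --conf spark.kubernetes.container.image=<registry>/spark:<tag> \
  local:///path/to/examples.jar
```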

However, in a 'proper' environment we have a managed Kubernetes cluster (bespoke, unlike EKS etc.) and will have to provide pod configuration files in order to deploy.

I believe you can supply a pod template file as an argument to the spark-submit command, as sketched below. https://spark.apache.org/docs/latest/running-on-kubernetes.html#pod-template
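In the dev setup that would mean adding the pod template config keys from the linked docs to the spark-submit command:

```
spark-submit \
  --conf spark.kubernetes.driver.podTemplateFile=driver-template.yaml \
  --conf spark.kubernetes.executor.podTemplateFile=executor-template.yaml \
  <other options as above>
```

with, for example, an executor template along these lines (a minimal sketch; the label and nodeSelector are hypothetical, not our real settings):

```
# executor-template.yaml -- minimal sketch
apiVersion: v1
kind: Pod
metadata:
  labels:
    app: my-spark-app                     # hypothetical label
spec:
  nodeSelector:
    disktype: ssd                         # hypothetical scheduling constraint
  containers:
    - name: spark-kubernetes-executor     # default container name Spark merges its settings into
```

As I understand it, Spark fills in the rest (image, ports, env) from the spark-submit configuration, so the template only needs the fields you want to override.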

How can I do this without spark-submit? And are there any example YAML files?

PS: we have limited access to this cluster, e.g. we can install Helm charts but not an operator or controller.

adesai
  • This blog post might help/serve as a basis: https://testdriven.io/blog/deploying-spark-on-kubernetes/ You could also compare the settings with those applied by the spark-submit tool. – Sören Henning Jul 11 '23 at 13:22

1 Answer


You could try the Kubernetes Spark operator and its SparkApplication CRD (https://github.com/GoogleCloudPlatform/spark-on-k8s-operator) and provide a pod configuration through it.
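A minimal manifest, modelled on the operator's spark-pi example, would look roughly like this (image, jar path, namespace and service account are placeholders):

```
# Sketch of a SparkApplication resource for the spark-on-k8s-operator.
apiVersion: sparkoperator.k8s.io/v1beta2
kind: SparkApplication
metadata:
  name: spark-pi
  namespace: default
spec:
  type: Scala
  mode: cluster
  image: <registry>/spark:<tag>
  mainClass: org.apache.spark.examples.SparkPi
  mainApplicationFile: local:///path/to/examples.jar
  sparkVersion: "3.3.0"
  driver:
    cores: 1
    memory: 512m
    serviceAccount: spark
  executor:
    instances: 2
    cores: 1
    memory: 512m
```

The driver and executor sections accept pod-level settings such as labels, node selectors and volumes, which is where a pod configuration could go.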

Doroshenko
  • Thanks, but we don't have permission to install an operator on this cluster. I will update the question. – adesai Oct 27 '22 at 16:58