As per the official Spark documentation, we can't run a PySpark application in cluster mode on a standalone cluster: "Currently, standalone mode does not support cluster mode for Python applications."
So how can we submit a PySpark job to a Spark standalone cluster using Airflow? I have added the Spark binaries to the Airflow workers so I can use SparkSubmitOperator, but if I can only use client mode, then the Airflow workers also have to act as Spark clients (i.e. I will need to set up spark-defaults, spark-env, logging configs, etc.).
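For illustration, here is a minimal sketch of the kind of task I mean, assuming a recent Airflow 2.x with the apache-spark provider installed; the connection id, master URL, and application path are placeholders:

```python
from datetime import datetime

from airflow import DAG
from airflow.providers.apache.spark.operators.spark_submit import SparkSubmitOperator

# Assumes a Spark connection named "spark_standalone" pointing at
# spark://<master-host>:7077 with deploy-mode "client" in its extras,
# since client mode is the only option standalone accepts for Python apps.
with DAG(
    dag_id="pyspark_standalone_submit",
    start_date=datetime(2023, 1, 1),
    schedule_interval=None,
    catchup=False,
) as dag:
    submit_job = SparkSubmitOperator(
        task_id="submit_pyspark_job",
        conn_id="spark_standalone",     # placeholder connection id
        application="/path/to/app.py",  # placeholder path on the Airflow worker
        name="example_pyspark_app",
        verbose=True,
    )
```

Because this runs in client mode, the driver lives on the Airflow worker itself, which is exactly why the worker ends up needing the full Spark client configuration.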
I want Airflow to just submit the application and not be burdened with Spark configs. Is there another way to accomplish this?
Any thoughts on this would be appreciated. Thanks in advance!