I am currently launching a PySpark job from Airflow using DataProcPySparkOperator, with the script stored in Cloud Storage:
from airflow.contrib.operators import dataproc_operator

run_pyspark_job = dataproc_operator.DataProcPySparkOperator(
    task_id='run-dataproc-pyspark',
    main='gs://my-repo/my-script.py',  # script is fetched from Cloud Storage
    project_id=PROJECT_ID,
    cluster_name=CLUSTER_NAME,
    region='europe-west4'
)
Is there any way to pass a script from Cloud Source Repositories instead? For a given repository one can get the absolute link to the script, but that URL does not seem to be accepted by the operator:
https://source.cloud.google.com/my-organisation/my-repo/+/master:my-script.py
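For reference, this is roughly the substitution I tried: simply swapping the gs:// path for the repository URL. Nothing I have found in the documentation suggests this is supported, and the task fails.

# Attempted: pointing `main` at the Cloud Source Repositories URL directly.
# This does not work; the operator appears to expect a Cloud Storage (gs://)
# URI or a local file path for the main script.
run_pyspark_job = dataproc_operator.DataProcPySparkOperator(
    task_id='run-dataproc-pyspark',
    main='https://source.cloud.google.com/my-organisation/my-repo/+/master:my-script.py',
    project_id=PROJECT_ID,
    cluster_name=CLUSTER_NAME,
    region='europe-west4'
)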
Is there any way to achieve this?