Below is my Dataproc job submit command. I pass the project artifacts as a zip file via the "--files" flag:
gcloud dataproc jobs submit pyspark --cluster=test_cluster --region us-central1 gs://test-gcp-project-123/main.py --files=gs://test-gcp-project-123/app_code_v2.zip
Following are the contents of "app_code_v2.zip".
I'm able to add "app_code_v2.zip" to the path using the code snippet below and access the Python modules, but how do I access the "yml" files present in the zip package? Those yml files contain the configs. Should I explicitly unzip the archive and copy its contents to the working directory on the master node, or is there a better way to handle this?
import os
import sys

# Make the modules inside the zip importable by the driver
if os.path.exists('app_code_v2.zip'):
    sys.path.insert(0, 'app_code_v2.zip')