I have a spark-submit command that calls my Python script. The code needs to run for more than 36 hours, but because of the QDS timeout limit of 36 hours, my command gets killed.

Can someone help me change this parameter value to 50 hours? This is how I'm calling my script in QDS:

/usr/lib/spark/bin/spark-submit s3:///abc.py

Trupti

1 Answer

The timeout cannot be configured beyond the 36-hour limit, but the limit can be removed entirely for Spark commands. To run the Spark application from Analyze/Notebooks without the limit, do the following before cluster start:

Edit the cluster configuration and add the following setting under Hadoop Configuration Overrides:

yarn.resourcemanager.app.timeout.minutes=-1

Then add the following setting under Spark Configuration Overrides:

spark.qubole.idle.timeout=-1 
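
These overrides only take effect on cluster start. One quick sanity check after a restart is to grep the materialized YARN config on a cluster node; this is a sketch, and the config file path is an assumption that may differ on your image:

# confirm the override landed in yarn-site.xml (expects a <value>-1</value> entry)
grep -A1 'yarn.resourcemanager.app.timeout.minutes' /etc/hadoop/conf/yarn-site.xml

Once an application starts, the Spark-side value should similarly show up under the Environment tab of the Spark UI.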

Please let me know if this helps. Also, if you are not running a streaming application and the data your Spark app processes or accesses is not huge, you may be able to bring the runtime under 36 hours through performance tuning instead, which would make removing the limit unnecessary.
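For reference, a minimal sketch of that tuning route; the flag values below are illustrative placeholders rather than recommendations, and the script path simply mirrors the one in the question:

# illustrative resource sizing; tune the numbers to your cluster and data volume
/usr/lib/spark/bin/spark-submit \
  --num-executors 20 \
  --executor-cores 4 \
  --executor-memory 8g \
  --conf spark.sql.shuffle.partitions=400 \
  s3:///abc.py

Right-sizing executors and shuffle partitions are common first levers; the correct values depend entirely on the cluster and workload.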

Anushan
  • Hi, I tried both settings and now my job is killed after 32 hours. I tried to run my job 2 times, and both times it got killed after 32 hours with this error in Airflow (I'm invoking my command using Airflow): Diagnostics: Kill job job_1592584595666_0001 received from qds.airflow.service@ni.com (auth:SIMPLE) at 10.192.19.5 Job received Kill while in RUNNING state. – Trupti Jun 21 '20 at 04:03
  • @Trupti would you mind creating a ticket with Qubole support with the details? We will take a further look in order to debug this. Also, one more question: did you apply the settings suggested above on the Airflow cluster, or on the cluster on which the applications are being scheduled? – Anushan Jun 21 '20 at 17:32
  • Also, one more thing: we would need to enable a feature flag from our backend for your account/cluster. Hence the request to create a support ticket with Qubole. – Anushan Jun 26 '20 at 15:27
  • I've opened a Qubole support ticket. To answer your question "Did you apply the settings suggested above on the Airflow cluster or the cluster on which the applications are being scheduled?": I applied them on the cluster where I'm running my code. – Trupti Jun 30 '20 at 09:50
  • Got it, thanks @Trupti. That makes sense, because it requires a feature flag to be enabled from the backend in addition to the settings above that you can apply at your end. I hope this helps. – Anushan Jul 02 '20 at 15:22