I followed this tutorial: https://aws.amazon.com/fr/blogs/machine-learning/build-amazon-sagemaker-notebooks-backed-by-spark-in-amazon-emr/ in order to run PySpark code on EMR via Apache Livy. The only change I made was adapting the EMR configuration script so that it runs as a SageMaker lifecycle configuration script.
When I test the connection with `curl <EMR Master Private IP>:8998/sessions`, the result looks completely fine: `{"from":0,"total":0,"sessions":[]}`.
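For context, the notebook talks to Livy through sparkmagic, which issues roughly the equivalent of the request sketched below (a minimal sketch using only the standard library; the base URL is a placeholder, and `"kind": "spark"` matches the kind reported in the dead session's response):

```python
import json
import urllib.request

# Placeholder -- replace <EMR Master Private IP> with the master's actual private IP.
LIVY_URL = "http://<EMR Master Private IP>:8998"

def build_session_payload():
    # "spark" is the session kind shown in the dead session's response.
    return {"kind": "spark"}

def create_session(base_url=LIVY_URL):
    """POST /sessions to ask Livy to start a new interactive session."""
    req = urllib.request.Request(
        base_url + "/sessions",
        data=json.dumps(build_session_payload()).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

Polling `GET /sessions/<id>` afterwards is what returns the session dict whose state I describe next.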
But when I try to run an application, the session state goes from `starting` straight to `dead` with the following message:
    {'id': 0, 'appId': None, 'owner': None, 'proxyUser': None, 'state': 'dead', 'kind': 'spark',
     'appInfo': {'driverLogUrl': None, 'sparkUiUrl': None},
     'log': ['19/02/27 09:23:24 INFO Client: Requesting a new application from cluster with 2 NodeManagers',
             '19/02/27 09:23:25 INFO Client: Verifying our application has not requested more than the maximum memory capability of the cluster (1024 MB per container)',
             '19/02/27 09:23:25 INFO Client: Will allocate AM container, with 896 MB memory including 384 MB overhead',
             '19/02/27 09:23:25 INFO Client: Setting up container launch context for our AM',
             '19/02/27 09:23:25 INFO Client: Setting up the launch environment for our AM container',
             '19/02/27 09:23:25 INFO Client: Preparing resources for our AM container',
             '19/02/27 09:23:26 WARN Client: Neither spark.yarn.jars nor spark.yarn.archive is set, falling back to uploading libraries under SPARK_HOME.',
             '\nYARN Diagnostics: ',
             'java.lang.Exception: No YARN application is found with tag livy-session-0-v9wkutit in 120 seconds. Please check your cluster status, it is may be very busy.',
             'org.apache.livy.utils.SparkYarnApp.org$apache$livy$utils$SparkYarnApp$$getAppIdFromTag(SparkYarnApp.scala:182) org.apache.livy.utils.SparkYarnApp$$anonfun$1$$anonfun$4.apply(SparkYarnApp.scala:239) org.apache.livy.utils.SparkYarnApp$$anonfun$1$$anonfun$4.apply(SparkYarnApp.scala:236) scala.Option.getOrElse(Option.scala:121) org.apache.livy.utils.SparkYarnApp$$anonfun$1.apply$mcV$sp(SparkYarnApp.scala:236) org.apache.livy.Utils$$anon$1.run(Utils.scala:94)']}
I have tried to investigate but really have no clue what is going on here. Does anyone have an idea of how to debug this?