
When I start `spark-shell --master yarn --deploy-mode client`, I get the following error:

Yarn application has already ended! It might have been killed or unable to launch application master.

Here is the complete log from Yarn:

19/08/28 00:54:55 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032

Container: container_1566921956926_0010_01_000001 on rhel7-cloudera-dev_33917
===============================================================================
LogType:stderr
Log Upload Time:28-Aug-2019 00:46:31
LogLength:523
Log Contents:
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/yarn/local/usercache/rhel/filecache/26/__spark_libs__5634501618166443611.zip/slf4j-log4j12-1.7.16.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/etc/hadoop-2.6.0/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]

LogType:stdout
Log Upload Time:28-Aug-2019 00:46:31
LogLength:5597
Log Contents:
2019-08-28 00:46:19 INFO SignalUtils:54 - Registered signal handler for TERM
2019-08-28 00:46:19 INFO SignalUtils:54 - Registered signal handler for HUP
2019-08-28 00:46:19 INFO SignalUtils:54 - Registered signal handler for INT
2019-08-28 00:46:19 INFO SecurityManager:54 - Changing view acls to: yarn,rhel
2019-08-28 00:46:19 INFO SecurityManager:54 - Changing modify acls to: yarn,rhel
2019-08-28 00:46:19 INFO SecurityManager:54 - Changing view acls groups to:
2019-08-28 00:46:19 INFO SecurityManager:54 - Changing modify acls groups to:
2019-08-28 00:46:19 INFO SecurityManager:54 - SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(yarn, rhel); groups with view permissions: Set(); users with modify permissions: Set(yarn, rhel); groups with modify permissions: Set()
2019-08-28 00:46:20 INFO ApplicationMaster:54 - Preparing Local resources
2019-08-28 00:46:21 INFO ApplicationMaster:54 - ApplicationAttemptId: appattempt_1566921956926_0010_000001
2019-08-28 00:46:21 INFO ApplicationMaster:54 - Waiting for Spark driver to be reachable.
2019-08-28 00:46:21 INFO ApplicationMaster:54 - Driver now available: rhel7-cloudera-dev:34872
2019-08-28 00:46:21 INFO TransportClientFactory:267 - Successfully created connection to rhel7-cloudera-dev/192.168.56.112:34872 after 107 ms (0 ms spent in bootstraps)
2019-08-28 00:46:22 INFO ApplicationMaster:54 -
===============================================================================
YARN executor launch context:
env:
CLASSPATH -> {{PWD}}{{PWD}}/spark_conf{{PWD}}/spark_libs/$HADOOP_CONF_DIR$HADOOP_COMMON_HOME/share/hadoop/common/$HADOOP_COMMON_HOME/share/hadoop/common/lib/$HADOOP_HDFS_HOME/share/hadoop/hdfs/$HADOOP_HDFS_HOME/share/hadoop/hdfs/lib/$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/lib/$HADOOP_YARN_HOME/share/hadoop/yarn/$HADOOP_YARN_HOME/share/hadoop/yarn/lib/* $HADOOP_COMMON_HOME/$HADOOP_COMMON_HOME/lib/$HADOOP_HDFS_HOME/$HADOOP_HDFS_HOME/lib/$HADOOP_MAPRED_HOME/$HADOOP_MAPRED_HOME/lib/$HADOOP_YARN_HOME/$HADOOP_YARN_HOME/lib/$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/lib//etc/hadoop-2.6.0/etc/hadoop:/etc/hadoop-2.6.0/share/hadoop/common/lib/:/etc/hadoop-2.6.0/share/hadoop/common/:/etc/hadoop-2.6.0/share/hadoop/hdfs:/etc/hadoop-2.6.0/share/hadoop/hdfs/lib/:/etc/hadoop-2.6.0/share/hadoop/hdfs/:/etc/hadoop-2.6.0/share/hadoop/yarn/lib/:/etc/hadoop-2.6.0/share/hadoop/yarn/:/etc/hadoop-2.6.0/share/hadoop/mapreduce/lib/:/etc/hadoop-2.6.0/share/hadoop/mapreduce/:/etc/hadoop-2.6.0/contrib/capacity-scheduler/.jar{{PWD}}/spark_conf/hadoop_conf
SPARK_DIST_CLASSPATH -> /etc/hadoop-2.6.0/etc/hadoop:/etc/hadoop-2.6.0/share/hadoop/common/lib/:/etc/hadoop-2.6.0/share/hadoop/common/:/etc/hadoop-2.6.0/share/hadoop/hdfs:/etc/hadoop-2.6.0/share/hadoop/hdfs/lib/:/etc/hadoop-2.6.0/share/hadoop/hdfs/:/etc/hadoop-2.6.0/share/hadoop/yarn/lib/:/etc/hadoop-2.6.0/share/hadoop/yarn/:/etc/hadoop-2.6.0/share/hadoop/mapreduce/lib/:/etc/hadoop-2.6.0/share/hadoop/mapreduce/:/etc/hadoop-2.6.0/contrib/capacity-scheduler/.jar
SPARK_YARN_STAGING_DIR -> *********(redacted)
SPARK_USER -> *********(redacted)
SPARK_CONF_DIR -> /etc/spark/conf
SPARK_HOME -> /etc/spark

command:
{{JAVA_HOME}}/bin/java \
-server \
-Xmx1024m \
-Djava.io.tmpdir={{PWD}}/tmp \
'-Dspark.driver.port=34872' \
-Dspark.yarn.app.container.log.dir= \
-XX:OnOutOfMemoryError='kill %p' \
org.apache.spark.executor.CoarseGrainedExecutorBackend \
--driver-url \
spark://CoarseGrainedScheduler@rhel7-cloudera-dev:34872 \
--executor-id \
\
--hostname \
\
--cores \
1 \
--app-id \
application_1566921956926_0010 \
--user-class-path \
file:$PWD/app.jar \
1>/stdout \
2>/stderr

resources:
spark_libs -> resource { scheme: "hdfs" host: "rhel7-cloudera-dev" port: 9000 file: "/user/rhel/.sparkStaging/application_1566921956926_0010/spark_libs__5634501618166443611.zip" } size: 232107209 timestamp: 1566933362350 type: ARCHIVE visibility: PRIVATE
__spark_conf -> resource { scheme: "hdfs" host: "rhel7-cloudera-dev" port: 9000 file: "/user/rhel/.sparkStaging/application_1566921956926_0010/spark_conf.zip" } size: 208377 timestamp: 1566933365411 type: ARCHIVE visibility: PRIVATE

===============================================================================
2019-08-28 00:46:22 INFO RMProxy:98 - Connecting to ResourceManager at /0.0.0.0:8030
2019-08-28 00:46:22 INFO YarnRMClient:54 - Registering the ApplicationMaster
2019-08-28 00:46:22 INFO YarnAllocator:54 - Will request 2 executor container(s), each with 1 core(s) and 1408 MB memory (including 384 MB of overhead)
2019-08-28 00:46:22 INFO YarnAllocator:54 - Submitted 2 unlocalized container requests.
2019-08-28 00:46:22 INFO ApplicationMaster:54 - Started progress reporter thread with (heartbeat : 3000, initial allocation : 200) intervals
2019-08-28 00:46:22 ERROR ApplicationMaster:43 - RECEIVED SIGNAL TERM
2019-08-28 00:46:23 INFO ApplicationMaster:54 - Final app status: UNDEFINED, exitCode: 16, (reason: Shutdown hook called before final status was reported.)
2019-08-28 00:46:23 INFO ShutdownHookManager:54 - Shutdown hook called

Container: container_1566921956926_0010_02_000001 on rhel7-cloudera-dev_33917
===============================================================================
LogType:stderr
Log Upload Time:28-Aug-2019 00:46:31
LogLength:3576
Log Contents:
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/yarn/local/usercache/rhel/filecache/26/__spark_libs__5634501618166443611.zip/slf4j-log4j12-1.7.16.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/etc/hadoop-2.6.0/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
Exception in thread "main" java.io.IOException: Failed on local exception: java.io.IOException; Host Details : local host is: "rhel7-cloudera-dev/192.168.56.112"; destination host is: "rhel7-cloudera-dev":9000;
    at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:772)
    at org.apache.hadoop.ipc.Client.call(Client.java:1474)
    at org.apache.hadoop.ipc.Client.call(Client.java:1401)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232)
    at com.sun.proxy.$Proxy9.getFileInfo(Unknown Source)
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:752)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
    at com.sun.proxy.$Proxy10.getFileInfo(Unknown Source)
    at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:1977)
    at org.apache.hadoop.hdfs.DistributedFileSystem$18.doCall(DistributedFileSystem.java:1118)
    at org.apache.hadoop.hdfs.DistributedFileSystem$18.doCall(DistributedFileSystem.java:1114)
    at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
    at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1114)
    at org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$7$$anonfun$apply$3.apply(ApplicationMaster.scala:235)
    at org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$7$$anonfun$apply$3.apply(ApplicationMaster.scala:232)
    at scala.Option.foreach(Option.scala:257)
    at org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$7.apply(ApplicationMaster.scala:232)
    at org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$7.apply(ApplicationMaster.scala:197)
    at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$5.run(ApplicationMaster.scala:800)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1692)
    at org.apache.spark.deploy.yarn.ApplicationMaster.doAsUser(ApplicationMaster.scala:799)
    at org.apache.spark.deploy.yarn.ApplicationMaster.<init>(ApplicationMaster.scala:197)
    at org.apache.spark.deploy.yarn.ApplicationMaster$.main(ApplicationMaster.scala:823)
    at org.apache.spark.deploy.yarn.ExecutorLauncher$.main(ApplicationMaster.scala:854)
    at org.apache.spark.deploy.yarn.ExecutorLauncher.main(ApplicationMaster.scala)
Caused by: java.io.IOException
    at org.apache.hadoop.ipc.Client$Connection.waitForWork(Client.java:935)
    at org.apache.hadoop.ipc.Client$Connection.run(Client.java:967)
Caused by: java.lang.InterruptedException
    ... 2 more

LogType:stdout
Log Upload Time:28-Aug-2019 00:46:31
LogLength:975
Log Contents:
2019-08-28 00:46:26 INFO SignalUtils:54 - Registered signal handler for TERM
2019-08-28 00:46:26 INFO SignalUtils:54 - Registered signal handler for HUP
2019-08-28 00:46:26 INFO SignalUtils:54 - Registered signal handler for INT
2019-08-28 00:46:27 INFO SecurityManager:54 - Changing view acls to: yarn,rhel
2019-08-28 00:46:27 INFO SecurityManager:54 - Changing modify acls to: yarn,rhel
2019-08-28 00:46:27 INFO SecurityManager:54 - Changing view acls groups to:
2019-08-28 00:46:27 INFO SecurityManager:54 - Changing modify acls groups to:
2019-08-28 00:46:27 INFO SecurityManager:54 - SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(yarn, rhel); groups with view permissions: Set(); users with modify permissions: Set(yarn, rhel); groups with modify permissions: Set()
2019-08-28 00:46:28 INFO ApplicationMaster:54 - Preparing Local resources
2019-08-28 00:46:28 ERROR ApplicationMaster:43 - RECEIVED SIGNAL TERM

Any suggestions on how to resolve this issue?

Sabyasachi Mitra
  • Not sure, but can you try manually cleaning the `/user/rhel/.sparkStaging/` directory? Sometimes, if an old application was not closed properly (`System.exit(0)`), we see this error. – SMaZ Aug 27 '19 at 20:45
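
For anyone who wants to try the cleanup suggested in the comment above, here is a minimal sketch of what it could look like. It only wraps the standard `hdfs dfs -ls` and `hdfs dfs -rm` commands. The staging path is the one that appears in the log; the helper names (`cleanup_commands`, `run`) and the dry-run default are my own assumptions, not anything from Spark or Hadoop. Whether stale staging data is actually the cause here is unconfirmed, and the delete should only be run while no other Spark application of that user is active, since running applications read from this directory.

```python
import shlex
import subprocess

# Staging root as it appears in the YARN log above; adjust for your user.
STAGING_ROOT = "/user/rhel/.sparkStaging"


def cleanup_commands(staging_root=STAGING_ROOT):
    """Build the hdfs CLI calls: list the staging root, then delete its children.

    The glob is passed as a literal argument (no local shell involved) and is
    expanded by the hdfs client itself. -f keeps -rm quiet if nothing matches.
    """
    return [
        ["hdfs", "dfs", "-ls", staging_root],
        ["hdfs", "dfs", "-rm", "-f", "-r", "-skipTrash", staging_root + "/*"],
    ]


def run(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for argv in commands:
        print("+ " + shlex.join(argv))
        if not dry_run:
            subprocess.run(argv, check=True)


if __name__ == "__main__":
    # Dry run by default: shows what would be executed without touching HDFS.
    run(cleanup_commands())
```

Calling `run(cleanup_commands(), dry_run=False)` as the user that owns the directory performs the actual delete; Spark creates a fresh `application_*` subdirectory on the next submit.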

0 Answers