
When my application runs on a Spark cluster, I know the following:

1) the execution plan

2) the DAG, whose nodes are RDDs or operations

3) all jobs/stages/executors/tasks

However, I cannot find a way to know, given a task ID, what kind of work (which RDD or operation) that task performs.
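To make the mapping I am after concrete: a `SparkListener` receives both stage submissions (which carry the stage's RDDs) and task starts (which carry the task ID and its stage ID), so in principle the two can be joined. A rough sketch of the kind of thing I mean, using the public listener API (the map and the `println` are only illustrative):

```scala
import org.apache.spark.scheduler.{SparkListener, SparkListenerStageSubmitted, SparkListenerTaskStart}

import scala.collection.mutable

// Sketch: remember, per stage, which RDDs/operations it contains, then
// print that description whenever a task of that stage starts.
class TaskWorkListener extends SparkListener {
  // stageId -> description of the stage's work (listener events arrive on a
  // single listener-bus thread, so a plain mutable Map is fine here)
  private val stageWork = mutable.Map.empty[Int, String]

  override def onStageSubmitted(event: SparkListenerStageSubmitted): Unit = {
    val info = event.stageInfo
    val rdds = info.rddInfos.map(r => s"RDD ${r.id} '${r.name}'").mkString(", ")
    stageWork(info.stageId) = s"${info.name} [$rdds]"
  }

  override def onTaskStart(event: SparkListenerTaskStart): Unit = {
    val work = stageWork.getOrElse(event.stageId, "<unknown stage>")
    println(s"task ${event.taskInfo.taskId} " +
      s"(executor ${event.taskInfo.executorId}, host ${event.taskInfo.host}) -> $work")
  }
}

// Register it right after the SparkContext is created:
//   sc.addSparkListener(new TaskWorkListener)
// or via --conf spark.extraListeners=TaskWorkListener
// (the latter needs a no-arg class on the driver classpath).
```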

From a task, I can get its executor ID and the machine it runs on. On that machine, if we grep the Java processes for that ID, we get:

/bin/bash -c /usr/lib/jvm/jdk1.8.0_192/bin/java -server -Xmx12288m '-XX:MaxMetaspaceSize=256M' '-Djava.library.path=/opt/hadoop/lib/native' '-Djava.util.logging.config.file=/opt/spark2/conf/parquet.logging.properties' -Djava.io.tmpdir=/tmp/hadoop-root/nmlocaldir/usercache/appcache/application_1549756402460_92964/container_1549756402460_92964_01_000012/tmp '-Dspark.driver.port=35617' '-Dspark.network.timeout=3000s' -Dspark.yarn.app.container.log.dir=/mnt/yarn-logs/userlogs/application_1549756402460_92964/container_1549756402460_92964_01_000012 -XX:OnOutOfMemoryError='kill %p' org.apache.spark.executor.CoarseGrainedExecutorBackend --driver-url spark://CoarseGrainedScheduler@10.0.72.160:35617 --executor-id 11 --hostname abc --cores 3 --app-id application_1549756402460_92964 --user-class-path file:/tmp/hadoop-root/nm-local-dir/usercache/appcache/application_1549756402460_92964/container_1549756402460_92964_01_000012/__app__.jar 1>/mnt/yarn-logs/userlogs/application_1549756402460_92964/container_1549756402460_92964_01_000012/stdout 2> /mnt/yarn-logs/userlogs/application_1549756402460_92964/container_1549756402460_92964_01_000012/stderr

But that only identifies the executor process; it does not tell me what the task does. Does Spark expose this information?
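The closest thing I am aware of is the monitoring REST API: each stage entry exposes a `name` (the last operation of the stage) and `details` (the user-code call site), and the per-stage-attempt endpoint lists that stage's tasks keyed by task ID. A minimal sketch, assuming the default UI port 4040 and reusing the driver host and app ID from the process listing above:

```scala
import scala.io.Source

// Sketch: pull per-stage descriptions from the driver's REST API while the
// application is running (host, port, and app ID below are placeholders
// taken from the process listing above; adjust for your cluster).
object DumpStages {
  def main(args: Array[String]): Unit = {
    val appId = "application_1549756402460_92964"
    val base  = s"http://10.0.72.160:4040/api/v1/applications/$appId"
    // every element has "stageId", "name" (last operation of the stage)
    // and "details" (the user-code call site that created the stage)
    println(Source.fromURL(s"$base/stages").mkString)
    // per-task rows keyed by task ID live under
    //   s"$base/stages/<stageId>/<attemptId>"
  }
}
```

This still gives the work per stage rather than per task, which is why I am asking whether Spark records anything finer-grained.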

Joe C
  • Yes, I have the same issue; I just add some logs to the worker code and then I see what is going on. – Ehud Lev Mar 10 '19 at 13:04

0 Answers