3

When running Spark jobs on YARN (yarn-cluster mode), YARN creates the workers in containers whose names look something like this: container_e116_1495951495692_11203_01_000105

What is the naming convention for the containers?

Here is my educated guess:

  • container - Just a constant string, obviously
  • e116 - No idea what this is. Maybe something to do with the YARN version.
  • 1495951495692_11203 - The application ID
  • 01 - An attempt counter?
  • 000105 - This is probably just an incrementing integer.

If there is any concrete information about this (or even a reference to the right place in the code), I'd be glad to hear about it.

In light of the above, when running a Spark job on YARN, how can I know which containers belong to which executor?

summerbulb

2 Answers

4

See the ContainerId Javadoc: https://hadoop.apache.org/docs/current/api/org/apache/hadoop/yarn/api/records/ContainerId.html

It describes the string representation of a container ID. The format is container_e{epoch}_{clusterTimestamp}_{appId}_{attemptId}_{containerId} when epoch is larger than 0 (e.g. container_e17_1410901177871_0001_01_000005). The epoch is increased when the RM restarts or fails over. When epoch is 0, it is omitted (e.g. container_1410901177871_0001_01_000005).
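As a rough illustration of the format described above, here is a minimal Python sketch that splits a container ID string into its fields. The names parse_container_id and ParsedContainerId are hypothetical helpers written for this answer and are not part of any Hadoop or Spark API. The application_id property assumes the usual application_{clusterTimestamp}_{appId} form with the app ID padded to at least four digits.

```python
import re
from typing import NamedTuple

# Regex built from the format quoted from the ContainerId Javadoc.
# The "e{epoch}_" part is optional because epoch 0 is omitted.
_CONTAINER_ID_RE = re.compile(
    r"^container_(?:e(?P<epoch>\d+)_)?"
    r"(?P<cluster_ts>\d+)_(?P<app_id>\d+)_"
    r"(?P<attempt_id>\d+)_(?P<container_id>\d+)$"
)


class ParsedContainerId(NamedTuple):
    """Hypothetical holder for the fields of a container ID string."""
    epoch: int
    cluster_timestamp: int
    app_id: int
    attempt_id: int
    container_id: int

    @property
    def application_id(self) -> str:
        # Assumption: application IDs look like application_<ts>_<appId>.
        return f"application_{self.cluster_timestamp}_{self.app_id:04d}"


def parse_container_id(text: str) -> ParsedContainerId:
    """Split a container ID string into its numeric fields."""
    match = _CONTAINER_ID_RE.match(text)
    if match is None:
        raise ValueError(f"not a recognised container ID: {text!r}")
    return ParsedContainerId(
        epoch=int(match.group("epoch") or 0),  # missing epoch means 0
        cluster_timestamp=int(match.group("cluster_ts")),
        app_id=int(match.group("app_id")),
        attempt_id=int(match.group("attempt_id")),
        container_id=int(match.group("container_id")),
    )


if __name__ == "__main__":
    parsed = parse_container_id("container_e116_1495951495692_11203_01_000105")
    print(parsed)
    print(parsed.application_id)
```

Applied to the ID from the question, this yields epoch 116, application attempt 1 and container number 105 within that attempt.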

3

The containerId string format changes if the RM restarts with work-preserving recovery enabled. It used to be:
container_{clusterTimestamp}_{appId}_{attemptId}_{containerId}
e.g.: container_1410901177871_0001_01_000005.

It is now:
container_e{epoch}_{clusterTimestamp}_{appId}_{attemptId}_{containerId}
e.g.: container_e17_1410901177871_0001_01_000005.

Here, the additional epoch number is a monotonically increasing integer that starts from 0 and increases by 1 each time the RM restarts. If the epoch number is 0, it is omitted and the containerId string format stays the same as before.
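The omission rule for epoch 0 can be sketched in a few lines of Python. format_container_id is a hypothetical helper, not a Hadoop API. The zero-padding widths for the app ID, attempt ID and container ID are inferred from the examples above, and the two-digit minimum width for the epoch is an assumption.

```python
def format_container_id(cluster_timestamp: int, app_id: int,
                        attempt_id: int, container_id: int,
                        epoch: int = 0) -> str:
    """Build a container ID string; hypothetical helper for illustration."""
    # Epoch 0 is omitted entirely, so the old format is preserved.
    # Assumption: a non-zero epoch is padded to at least two digits.
    prefix = "container" if epoch == 0 else f"container_e{epoch:02d}"
    # Padding widths (4, 2, 6) are inferred from the examples in this answer.
    return (f"{prefix}_{cluster_timestamp}_{app_id:04d}"
            f"_{attempt_id:02d}_{container_id:06d}")


if __name__ == "__main__":
    # Before any RM restart: no epoch component.
    print(format_container_id(1410901177871, 1, 1, 5))
    # After 17 RM restarts or failovers: epoch component present.
    print(format_container_id(1410901177871, 1, 1, 5, epoch=17))
```

Calling it with the same fields and a different epoch shows that only the prefix changes, which is why IDs issued before the first RM restart look identical to the old format.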

Tim Visée
Akshay