What is the naming convention for YARN containers used by Spark?

Question

When running Spark jobs on top of YARN (yarn-cluster mode), YARN runs the workers in containers whose names look something like this: container_e116_1495951495692_11203_01_000105

What is the naming convention for the containers?

Here is my educated guess:

container - Just a constant string, obviously
e116 - No idea what this is. Maybe something to do with the YARN version.
1495951495692_11203 - The application ID
01 - An attempt counter?
000105 - This is probably just an incrementing integer.

If there is any concrete information about this (or even a reference to the right place in the code), I'd be glad to hear about it.

In light of the above, when running a Spark job on YARN, how can I know which containers belong to which executor?

score 4 · Accepted Answer · answered Jun 04 '17 at 11:09

You can look at https://hadoop.apache.org/docs/current/api/org/apache/hadoop/yarn/api/records/ContainerId.html

A string representation of containerId. The format is container_e{epoch}_{clusterTimestamp}_{appId}_{attemptId}_{containerId} when epoch is larger than 0 (e.g. container_e17_1410901177871_0001_01_000005). epoch is increased when RM restarts or fails over. When epoch is 0, epoch is omitted (e.g. container_1410901177871_0001_01_000005).
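To make the field layout concrete, here is a minimal sketch of a parser for that string format. It is not part of Hadoop or Spark; the names `ContainerIdParts` and `parse_container_id` are hypothetical, and the regular expression is only derived from the format description quoted above.

```python
import re
from typing import NamedTuple

# Pattern derived from the documented string format; the "e{epoch}_" part is optional.
CONTAINER_ID_RE = re.compile(
    r"^container_(?:e(?P<epoch>\d+)_)?"
    r"(?P<cluster_ts>\d+)_(?P<app_id>\d+)_(?P<attempt_id>\d+)_(?P<container_id>\d+)$"
)


class ContainerIdParts(NamedTuple):
    """Hypothetical holder for the fields of a container ID string."""
    epoch: int
    cluster_timestamp: int
    app_id: int
    attempt_id: int
    container_id: int


def parse_container_id(text: str) -> ContainerIdParts:
    """Split a container ID string into its numeric fields."""
    match = CONTAINER_ID_RE.match(text)
    if match is None:
        raise ValueError(f"not a recognised container ID: {text!r}")
    return ContainerIdParts(
        # A missing epoch field means epoch 0, per the quoted documentation.
        epoch=int(match.group("epoch") or 0),
        cluster_timestamp=int(match.group("cluster_ts")),
        app_id=int(match.group("app_id")),
        attempt_id=int(match.group("attempt_id")),
        container_id=int(match.group("container_id")),
    )


if __name__ == "__main__":
    print(parse_container_id("container_e116_1495951495692_11203_01_000105"))
```

Applied to the ID from the question, this reads e116 as epoch 116, 1495951495692_11203 as the cluster timestamp plus application number, 01 as the attempt, and 000105 as the container number.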

score 3 · Answer 2 · edited Aug 21 '17 at 16:24

The containerId string format changes if the RM restarts with work-preserving recovery enabled. The format used to be:
container_{clusterTimestamp}_{appId}_{attemptId}_{containerId}
e.g.: container_1410901177871_0001_01_000005.

It is now changed to:
container_e{epoch}_{clusterTimestamp}_{appId}_{attemptId}_{containerId}
e.g.: container_e17_1410901177871_0001_01_000005.

Here, the additional epoch number is a monotonically increasing integer that starts at 0 and is increased by 1 each time the RM restarts. If the epoch number is 0, it is omitted and the containerId string format stays the same as before.
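The epoch rule can be sketched as a small formatting function. This is an illustration only, not Hadoop's implementation: `format_container_id` is a hypothetical name, the zero-padding widths (4, 2 and 6 digits) are inferred from the example IDs in this thread, and writing the epoch as a plain unpadded integer is an assumption.

```python
def format_container_id(cluster_timestamp: int, app_id: int, attempt_id: int,
                        container_id: int, epoch: int = 0) -> str:
    """Build a container ID string, omitting the epoch field when it is 0."""
    # Epoch 0 keeps the old format; any later epoch adds the "e{epoch}_" field.
    prefix = "container_" if epoch == 0 else f"container_e{epoch}_"
    # Padding widths are assumptions taken from the example IDs, not from Hadoop's source.
    return f"{prefix}{cluster_timestamp}_{app_id:04d}_{attempt_id:02d}_{container_id:06d}"


if __name__ == "__main__":
    # Before any RM restart (epoch 0): old-style ID.
    print(format_container_id(1410901177871, 1, 1, 5))
    # After RM restarts have raised the epoch to 17: new-style ID.
    print(format_container_id(1410901177871, 1, 1, 5, epoch=17))
```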
