Can someone explain how Samza generates samza.container.id / SAMZA_CONTAINER_ID when the application is deployed on YARN? I looked through the Samza code base but was not able to locate the logic that generates samza.container.id.

tuk

1 Answer

In a YARN environment, Samza uses the YARN-generated container IDs, passed as environment variables, to set each container process's samza.container.id. When the Samza AM process requests containers, the YARN RM replies with a set of allocated container objects of class org.apache.hadoop.yarn.api.records.Container. This is the resource class that uniquely identifies a container in YARN, and Container#getId().toString() is the container ID string we set as samza.container.id.

The code that gets the container ID from the YARN RM's response is in YarnClusterResourceManager#onContainersAllocated().
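To make the shape of that string concrete, here is a minimal sketch that splits a YARN container ID string into its numeric fields. It is only an illustration: the class name YarnContainerIdSketch and the parse helper are hypothetical and are not part of Samza or Hadoop, and the field layout (optional epoch, cluster timestamp, application number, attempt number, container number) is assumed from the usual `container_e<epoch>_<timestamp>_<app>_<attempt>_<container>` form. It does not show how Samza assigns any ID of its own.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Samza or Hadoop: it only illustrates
// the textual layout of a YARN container ID string.
public class YarnContainerIdSketch {
    // Assumed layout: container_[e<epoch>_]<clusterTimestamp>_<appId>_<attemptId>_<containerNum>
    private static final Pattern LAYOUT =
        Pattern.compile("container_(?:e(\\d+)_)?(\\d+)_(\\d+)_(\\d+)_(\\d+)");

    // Returns {epoch, clusterTimestamp, appId, attemptId, containerNum}.
    // The epoch defaults to 0 when the "e<epoch>_" part is absent.
    static long[] parse(String containerId) {
        Matcher m = LAYOUT.matcher(containerId);
        if (!m.matches()) {
            throw new IllegalArgumentException("Unexpected container ID: " + containerId);
        }
        long epoch = m.group(1) == null ? 0L : Long.parseLong(m.group(1));
        return new long[] {
            epoch,
            Long.parseLong(m.group(2)),
            Long.parseLong(m.group(3)),
            Long.parseLong(m.group(4)),
            Long.parseLong(m.group(5))
        };
    }

    public static void main(String[] args) {
        long[] parts = parse("container_e02_1619095810959_0006_10_000004");
        System.out.println("epoch=" + parts[0]
            + " clusterTimestamp=" + parts[1]
            + " appId=" + parts[2]
            + " attemptId=" + parts[3]
            + " containerNum=" + parts[4]);
    }
}
```

The trailing field is a per-application container counter assigned by YARN, so it should not be assumed to match any small sequential ID used elsewhere.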

Yi Pan
`Container#getId().toString()` returns a string like `container_e02_1619095810959_0006_10_000004`, but `samza.container.id` is a durable sequential integer like 0, 1. How is the YARN container ID converted to an integer? – tuk Apr 24 '21 at 07:40