My question is more related to the actual code. I am working on Hadoop 2.7.7.

As I understand it, after a client submits an application, the Resource Manager has to assign an application master to process that application.

At this point, when the Resource Manager communicates with the Name Node to get the metadata of the files required for processing, in which package and class does this communication take place?

Edit: Currently I am looking at FifoScheduler.java in the package org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo. The method private void assignContainers(FiCaSchedulerNode node) chooses the application from a FIFO list of applications that have a pending request to launch containers on a node. Logically, I believe that by the time the FifoScheduler picks an application from that list, the RM already has the information about the files and file locations required for any application on the list. I looked at all the methods in the call hierarchy of assignContainers() in FifoScheduler.java and could find no clues as to where or when the RM receives information about the files (or their metadata) required for an application.

This is the best I could describe where I am looking in the code. If you need any additional information I would gladly provide that. I apologize if this is unclear.

aquaman
  • Asking for off-site resources is considered off-topic for StackOverflow [help]. Besides, the Hadoop source code around this is quite complex, and enabling remote Java debugging on the ResourceManager process might get you there faster. – OneCricketeer Dec 07 '18 at 08:24
  • Thank you for the response. I now understand that what I asked is considered off-topic, my bad. I am a PhD student working on Hadoop, so I needed this information for my research. I spent many days straight trying to narrow down this point in the code and thought this question could be one of my last resorts. I have tried debugging the processes linked to the resource manager to no avail, though I could be doing something wrong. – aquaman Dec 07 '18 at 11:18
  • You're welcome to [edit] your question to include the steps you've tried so far and what classes you've looked at. – OneCricketeer Dec 07 '18 at 13:29
  • Thank you, I have edited the question and added more information. Additionally, I would like to point out that I have been debugging the ResourceManager process itself (the main file is located at org.apache.hadoop.yarn.server.resourcemanager). This led me to the schedulers, and Fifo seemed easier to debug, so I went down that path, looking up the call hierarchy of its methods. – aquaman Dec 08 '18 at 17:24
  • I don't think the RM ever contacts the NN. It makes an AM, then pushes out jobs to NM via calculating InputSplits, which read from the NN and DNs – OneCricketeer Dec 08 '18 at 22:27
  • I see, that makes sense. I'll look into the NN and DN processes a little deeper. Thank you! – aquaman Dec 11 '18 at 17:57
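
To illustrate the point in the last two comments: in MapReduce, as far as I can tell, input splits are computed on the client side at job submission (JobSubmitter.writeSplits calling InputFormat.getSplits), and FileInputFormat obtains block locations through FileSystem.getFileBlockLocations, which is where the NameNode gets contacted rather than anywhere in the RM scheduler. The sketch below is a simplified, Hadoop-free model of the split-sizing arithmetic only. SplitSketch, Split, and computeSplits are hypothetical names, and the file length and block size are made-up inputs, not values read from a NameNode.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified model of how FileInputFormat sizes input splits.
// It does not talk to HDFS; all class and method names except
// computeSplitSize and SPLIT_SLOP are hypothetical.
class SplitSketch {
    // Same tolerance FileInputFormat uses: the final split may be up to 10% larger.
    static final double SPLIT_SLOP = 1.1;

    // A byte range of one file that would be handed to one map task.
    static final class Split {
        final long offset;
        final long length;

        Split(long offset, long length) {
            this.offset = offset;
            this.length = length;
        }
    }

    // Mirrors FileInputFormat.computeSplitSize: clamp the block size into [minSize, maxSize].
    static long computeSplitSize(long blockSize, long minSize, long maxSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    // Cuts a file of fileLength bytes into splits of roughly one block each.
    static List<Split> computeSplits(long fileLength, long blockSize, long minSize, long maxSize) {
        long splitSize = computeSplitSize(blockSize, minSize, maxSize);
        List<Split> splits = new ArrayList<>();
        long remaining = fileLength;
        while ((double) remaining / splitSize > SPLIT_SLOP) {
            splits.add(new Split(fileLength - remaining, splitSize));
            remaining -= splitSize;
        }
        if (remaining != 0) {
            // Whatever is left (at most 1.1 * splitSize) becomes the last split.
            splits.add(new Split(fileLength - remaining, remaining));
        }
        return splits;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        for (Split s : computeSplits(300 * mb, 128 * mb, 1L, Long.MAX_VALUE)) {
            System.out.println("offset=" + s.offset + " length=" + s.length);
        }
    }
}
```

With these assumed inputs, a 300 MB file on 128 MB blocks yields three splits (128 MB, 128 MB, 44 MB), while a 140 MB file stays a single split because 140/128 is below the 1.1 tolerance. In the real code path each split also carries the hosts returned with the block locations, which is the locality information the scheduler later sees in container requests.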

0 Answers