
For example, I run the Hive query "SELECT COUNT(1) FROM DB.TABLE_NAME;".

Hive translates it into a MapReduce job and submits it to the ResourceManager. Hadoop's concept is that the application should be deployed on the nodes where the data exists, but the ResourceManager doesn't know where the data is. How does the ResourceManager deploy the tasks?

Thank you very much.

user1371662
  • The ResourceManager deploys tasks to NodeManagers, which do know where the data exists by talking to the NameNode – OneCricketeer Dec 26 '18 at 21:45
  • Yes, thank you. I didn't know that the RM communicates with the NN when a job is deployed. Where can I read about the job deployment sequence – a doc, blog, or article? – user1371662 Dec 27 '18 at 00:31
  • The Hadoop Apache site talks about it. So does the "Hadoop - Definitive Guide" book, within the first few chapters. – OneCricketeer Dec 27 '18 at 16:05

1 Answer


You don't need to worry about the location of your data. Hadoop takes care of data locality while scheduling the tasks for your job.

The NameNode knows where the blocks of the files to be processed are stored in HDFS. Hadoop uses this information to start tasks on the machines that hold those blocks and process the data locally. As a developer, you are abstracted from this detail.
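If you want to see this block placement yourself, here is a minimal sketch using the HDFS Java API. The warehouse path is just an assumed example – point it at the actual HDFS directory of your table:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class BlockLocations {
    public static void main(String[] args) throws Exception {
        // Uses the cluster configuration on the classpath (core-site.xml, hdfs-site.xml)
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        // Hypothetical table location - replace with your table's HDFS path
        Path tableDir = new Path("/user/hive/warehouse/db.db/table_name");

        for (FileStatus file : fs.listStatus(tableDir)) {
            // Ask the NameNode which DataNodes hold each block of this file
            BlockLocation[] blocks = fs.getFileBlockLocations(file, 0, file.getLen());
            for (BlockLocation block : blocks) {
                System.out.printf("%s offset=%d hosts=%s%n",
                        file.getPath().getName(),
                        block.getOffset(),
                        String.join(",", block.getHosts()));
            }
        }
    }
}
```

The MapReduce client and ApplicationMaster obtain this same kind of split-location information and include it in their container requests, which is how the scheduler can favour the nodes that actually hold the data.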

Harjeet Kumar