I'm new to Hadoop, so can you please describe what exactly I'm doing here? P.S. I received these steps from a friend of mine.

(1) hduser@soham-Inspiron-3521:/usr/local/hadoop/etc/hadoop$ /usr/local/hadoop/bin/hadoop namenode -format
Que 1) Why do we need to format the namenode each time and not the datanode or others?
Que 2) Why are we using two different paths each time?

(2) hduser@soham-Inspiron-3521:/usr/local/hadoop/etc/hadoop$ /usr/local/hadoop/sbin/start-all.sh
Que 1) Do all processes need to be started from the "sbin" folder?

(3) jps displays:

hduser@soham-Inspiron-3521:/usr/local/hadoop/etc/hadoop$ jps
7344 ResourceManager
15019 Jps
7187 SecondaryNameNode
6851 NameNode
7659 NodeManager

Que 1) What about TaskTracker and JobTracker?

Even the web UI at localhost is not displaying any DataNode (http://localhost:50070/dfshealth.html#tab-startup-progress).

P.S. I know these are naive problems, but I could not find any solution whatsoever for them. A fast reply would be greatly appreciated. Thanks in advance.

user6119874

1 Answer

This is what I could say from the information you have provided:

(1) You don't have to format the namenode each time you start Hadoop. It's a one-time activity. Once you do it, then whenever you start Hadoop, you just need to start the HDFS (start-dfs.sh) and YARN (start-yarn.sh) services. [P.S. Don't use start-all.sh, as it is deprecated.]
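For example, assuming the same install path as in your question, the typical lifecycle looks like this (a sketch, not the only valid order):

# one-time activity when setting up a fresh cluster (it wipes existing HDFS metadata):
/usr/local/hadoop/bin/hadoop namenode -format

# every subsequent startup:
/usr/local/hadoop/sbin/start-dfs.sh    # starts NameNode, SecondaryNameNode, DataNode
/usr/local/hadoop/sbin/start-yarn.sh   # starts ResourceManager, NodeManager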

About the second part of your question, "why are we using two different paths each time": which two paths are you referring to?

(2) Yes, the scripts that start all the processes live in the "sbin" folder of your Hadoop installation (e.g. /usr/local/hadoop/sbin/).

(3) From the jps output, it's clear that you are using Hadoop 2.0, in which JobTracker and TaskTracker have rough (but not exact) equivalents in ResourceManager and NodeManager respectively.
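For comparison, on a Hadoop 1.x installation jps would list the old MRv1 daemons instead, something like this (illustrative only; the PIDs are placeholders):

1234 JobTracker
2345 TaskTracker
3456 NameNode
4567 SecondaryNameNode
5678 DataNode
6789 Jps

In Hadoop 2.x (YARN), ResourceManager took over cluster-level scheduling and NodeManager took over per-node task execution, which is why you see those names in your output instead.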

Your DataNode is not running. Check the log messages while starting the Hadoop services to find out what's going wrong.
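A quick way to inspect the DataNode log (assuming the default log directory under your install; the exact file name contains your user and host names):

ls /usr/local/hadoop/logs/
tail -n 50 /usr/local/hadoop/logs/hadoop-hduser-datanode-*.log

One common culprit after running namenode -format more than once is a clusterID mismatch between the NameNode and the DataNode's data directory; if that's the case, the DataNode log will say so explicitly.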

PradeepKumbhar
  • Thanks for clearing those doubts. By two paths, I meant that we first move into the directory "/usr/local/hadoop/etc/hadoop" and then use the command "/usr/local/hadoop/sbin/start-dfs.sh". Why do we have to move to that particular directory? – user6119874 May 04 '16 at 06:25
  • You don't have to move to the directory "/usr/local/hadoop/etc/hadoop" to run that command. It's not required at all! You can fire `/usr/local/hadoop/sbin/start-dfs.sh` from anywhere on your system. – PradeepKumbhar May 04 '16 at 07:06
  • Thanks @daemon12. I'm doing a small project (for myself) and facing a problem developing the mapper and reducer functions (to be honest, it's taking too much of my time to learn it from the beginning, and it's a small part of the project). Can you help me with that? Here is the link: http://stackoverflow.com/q/37004413/6119874 – user6119874 May 04 '16 at 07:27
  • As mentioned by @Serhiy in his answer, you'll find sample code online for your use case. You need to derive your solution from it. If you get stuck, we are happy to help. So ask for help, not for the code :) – PradeepKumbhar May 04 '16 at 08:39
  • Also, please upvote the answer if you are satisfied with it :-) – PradeepKumbhar May 04 '16 at 08:41