As I set up Hadoop, one question keeps popping into my mind, and I can't find the answer.

Which Hadoop configuration files need to be copied to which nodes? For example, I'm making changes to the following files:

hadoop-env.sh, core-site.xml, mapred-site.xml, hdfs-site.xml, masters, slaves

Do I need to copy these files to ALL my Hadoop nodes (which is kind of a pain every time I update one file)? Do only certain files need to be copied? Or do I only need to make the changes on my master nodes?

I can't seem to find the answer anywhere, so I wanted to ask here. (Up to this point, I have been mirroring all the files across every node, which works but seems inefficient.)

JasCav

1 Answer

In terms of what reads which files:

  • hadoop-env.sh: Everything
  • core-site.xml: Everything
  • hdfs-site.xml: HDFS (NameNode, SecondaryNameNode, DataNode)
  • mapred-site.xml: MapReduce (JobTracker, TaskTracker)
  • masters and slaves: I don't think these are read by the applications directly; they are used by the management scripts instead.
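To make the list above concrete, here is a minimal Python sketch that answers "which files does this node read?" from the daemons it runs. The `READERS` table and the `files_needed` helper are illustrative names, not part of Hadoop, and "Everything" is interpreted here as the five daemons named in the list, which is an assumption. masters and slaves are left out because they are used by the management scripts rather than the daemons.

```python
# Illustrative only: READERS and files_needed are hypothetical names, not a Hadoop API.
# Assumption: "Everything" means the five daemons named in the list above.
ALL_DAEMONS = {"NameNode", "SecondaryNameNode", "DataNode", "JobTracker", "TaskTracker"}

READERS = {
    "hadoop-env.sh": ALL_DAEMONS,
    "core-site.xml": ALL_DAEMONS,
    "hdfs-site.xml": {"NameNode", "SecondaryNameNode", "DataNode"},
    "mapred-site.xml": {"JobTracker", "TaskTracker"},
}


def files_needed(daemons):
    """Return the sorted config files read by at least one daemon on a node."""
    running = set(daemons)
    return sorted(name for name, readers in READERS.items() if readers & running)


if __name__ == "__main__":
    # A typical worker node runs a DataNode and a TaskTracker.
    print(files_needed(["DataNode", "TaskTracker"]))
```

A node that runs both an HDFS daemon and a MapReduce daemon ends up needing all four files, which is part of why tracking this per node is rarely worth the effort.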

I would, however, suggest that you set up a deployment system so you can easily distribute all of these files to all nodes, instead of trying to figure out which node needs which file. This could just be a script which calls ssh with public key authentication, or it could be something like Puppet or Chef.
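As a rough sketch of the script option, the Python below builds one `scp` command per host listed in a slaves-style file and pushes every config file to the same directory on each host. It assumes passwordless public key authentication is already set up, that the config directory has the same path on every node, and that `#` starts a comment in the host file. The function names and the `/opt/hadoop/conf` path are hypothetical; adjust them to your installation.

```python
import subprocess

# The files named in the question; all of them are pushed to every node.
CONF_FILES = [
    "hadoop-env.sh", "core-site.xml", "hdfs-site.xml",
    "mapred-site.xml", "masters", "slaves",
]


def parse_hosts(text):
    """Return host names from slaves-style text: one per line, blanks skipped."""
    hosts = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # assumption: '#' starts a comment
        if line:
            hosts.append(line)
    return hosts


def build_commands(hosts, conf_dir, files=CONF_FILES):
    """Build one scp argument list per host, copying files into conf_dir."""
    sources = [f"{conf_dir}/{name}" for name in files]
    # BatchMode=yes makes scp fail instead of prompting for a password.
    return [
        ["scp", "-o", "BatchMode=yes", *sources, f"{host}:{conf_dir}/"]
        for host in hosts
    ]


def push(hosts, conf_dir, dry_run=True):
    """Print the commands, or run them when dry_run is False."""
    for cmd in build_commands(hosts, conf_dir):
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)  # raises if a copy fails


if __name__ == "__main__":
    conf_dir = "/opt/hadoop/conf"  # hypothetical path
    with open(f"{conf_dir}/slaves") as fh:
        push(parse_hosts(fh.read()), conf_dir, dry_run=True)
```

Running it with `dry_run=True` first shows exactly what would be copied where. Once this grows beyond a handful of nodes, a tool such as Puppet or Chef is likely the better fit.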

mgorven