When editing Hadoop .xml config files (e.g. hdfs-site.xml), which node of the Hadoop cluster should the edits be made on? I.e. in a cluster of many nodes, each with a hadoop folder containing .xml and .properties files, which set of files should be edited to make config changes? I could not tell from the docs.

E.g. I am trying to configure Hadoop to work with Hue following the config changes found here, which require adding lines to hdfs-site.xml, but this file exists on all nodes of the cluster. Do I need to manually edit for every node? Does it depend on whether the node is running a certain service (e.g. do I only ever need to change the config files on nodes running the ResourceManager service)?

I don't use Hadoop often, so detailed explanations would be appreciated. Thanks.

Full disclosure (for clarification): I am using a commercial version of Hadoop called MapR.

lampShadesDrifter

2 Answers

this file exists on all nodes of the cluster. Do I need to manually edit for every node?

Short answer: yes, but see the bottom of this answer.

If you are setting up Hue, you really only need to change the values on the Hue server, though. For the most part, all the other nodes should have already defined the settings that you are configuring Hue with.

Those include settings for

  • HDFS (or MapR-FS)
  • YARN
  • Hive / Impala
  • HBase
  • Oozie

still don't understand how this works below the surface

Hadoop and its components run in a distributed fashion. There are clients on each host that read those files. If you don't have an ApplicationMaster or ResourceManager on a given machine, it doesn't need a yarn-site.xml or a mapred-site.xml. The same goes for hive-site.xml with Hive and Impala. It's really that simple.
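To make "clients read those files" a bit more concrete, here is a minimal Python sketch of what reading a `*-site.xml` file amounts to: the file is a flat list of name/value properties. The sample content and the `parse_site_xml` helper are illustrative assumptions, not Hadoop's own loader, and `dfs.webhdfs.enabled` is used only as an example of the kind of property a Hue setup might add.

```python
import xml.etree.ElementTree as ET

# Illustrative sample only; a real hdfs-site.xml will hold many more properties.
SAMPLE_HDFS_SITE = """<?xml version="1.0"?>
<configuration>
  <property>
    <name>dfs.webhdfs.enabled</name>
    <value>true</value>
  </property>
</configuration>
"""


def parse_site_xml(xml_text):
    """Return {name: value} for every <property> in a *-site.xml document.

    Hypothetical helper for illustration; it is not part of Hadoop.
    """
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.findall("property"):
        name = prop.findtext("name")
        value = prop.findtext("value")
        if name is not None:
            props[name.strip()] = (value or "").strip()
    return props


settings = parse_site_xml(SAMPLE_HDFS_SITE)
print(settings)
```

Each process only sees the copy of the file on its own host, which is why a value changed on one node is invisible to processes running on the others.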

using a commercial version of hadoop called mapr

I haven't used MapR, but I would be very surprised if it didn't offer a GUI to sync the configurations. Hortonworks uses Apache Ambari; Cloudera uses Cloudera Manager.

OneCricketeer

I think I have found the answer (from user ProfVersaggi in the comments of another post). It appears that the file changes must be copied to all nodes in the cluster for the changes to take effect.

This answers my initial question, but I still don't understand how this works below the surface and would still appreciate any explanation.
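For reference, the copy step can be sketched in Python as below. This is only a rough illustration under stated assumptions: the host names and the remote directory are hypothetical placeholders, `build_copy_commands` is a made-up helper, and the script only prints the `scp` commands rather than running them.

```python
import shlex


def build_copy_commands(hosts, local_file, remote_dir):
    """Return one scp argument list per host; nothing is executed here.

    Hypothetical helper: hosts and remote_dir must be adapted to the cluster.
    """
    return [["scp", local_file, f"{host}:{remote_dir}/"] for host in hosts]


# Placeholder host names and config directory, not taken from any real cluster.
HOSTS = ["node1.example.com", "node2.example.com"]
commands = build_copy_commands(HOSTS, "hdfs-site.xml", "/path/to/hadoop/conf")

for argv in commands:
    # shlex.join (Python 3.8+) quotes arguments safely for display.
    print(shlex.join(argv))
```

Services generally read these files at startup, so after copying, the affected services usually need a restart before the new values are picked up.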

lampShadesDrifter