
We have a small Hadoop cluster where the JobTracker is configured as dynamic (it can move from node to node). We'd like to make the data, log files, and job interactions available through a common WebUI (the Hadoop MapReduce JobTracker page) rather than through the command line.
The idea is to open the web UI ports on all nodes in the Hadoop cluster for inbound access and create a common DNS alias covering all nodes, so there's a constant reference to whichever node is currently the JobTracker. Is this a best practice? We're also interested in installing a front-end like Apache Hue (http://www.gethue.com) that end users can access.

I know the JobTracker can be made static, which would solve this problem but probably introduce others; pinning it to a dedicated node seems to give up some of the flexibility and power that clustered nodes are meant to provide.

I'd appreciate any insight on how best to deploy a consistent, accessible URL for both admins and end users.

Kara

1 Answer


Hue supports JobTracker HA, so you can list the possible JobTracker host/port pairs and Hue will pick whichever one is currently active.

Failing that, the simplest approach is probably to update hue.ini with the new hostname each time you reconfigure the cluster, then restart Hue.

And if Cloudera Manager is used to reconfigure the cluster, it will update Hue automatically as well.
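As a rough sketch of what listing multiple JobTrackers looks like in hue.ini (hostnames and the `jtha` section name below are illustrative placeholders, not values from this thread), each candidate gets its own subsection under `[[mapred_clusters]]`:

```ini
[hadoop]
  [[mapred_clusters]]
    # One subsection per candidate JobTracker host.
    # Hue probes these and talks to whichever is active.
    [[[default]]]
      # Placeholder hostname -- substitute your actual master nodes.
      jobtracker_host=master1.example.com
      jobtracker_port=8021
      submit_to=True

    [[[jtha]]]
      jobtracker_host=master2.example.com
      jobtracker_port=8021
      submit_to=True
```

After editing, restart Hue so the new cluster list is picked up.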

Romain
  • Thank you for the feedback. Since the JobTracker is fully dynamic, it could be on any one of many master servers. Do I just pick a few, install it, and then list all the host/port combos? – Adam Westrich Nov 19 '13 at 21:45
  • Yes, I would create a new section inside [[mapred_clusters]] for each of those. Notice that Hue 3 is recommended for JobTracker High Availability. – Romain Nov 20 '13 at 20:56