
I have a fresh install of Hortonworks Data Platform 2.2 on a small cluster (4 machines), but when I log in to the Ambari GUI, the majority of the dashboard stats boxes (HDFS Disk Usage, Network Usage, Memory Usage, etc.) are not populated with any statistics; instead they show the message:

No data There was no data available.  Possible reasons include inaccessible Ganglia service

Clicking on the HDFS service link gives the following summary:

NameNode    Started
SNameNode   Started
DataNodes   4/4 DataNodes Live
NameNode Uptime     Not Running
NameNode Heap   n/a / n/a (0.0% used)
DataNodes Status    4 live / 0 dead / 0 decommissioning
Disk Usage (DFS Used)   n/a / n/a (0%)
Disk Usage (Non DFS Used)   n/a / n/a (0%)
Disk Usage (Remaining)  n/a / n/a (0%)
Blocks (total)  n/a
Block Errors    n/a corrupt / n/a missing / n/a under replicated
Total Files + Directories   n/a
Upgrade Status  Upgrade not finalized
Safe Mode Status    n/a

The Alerts and Health Checks box to the right of the screen is not displaying any information either, but if I click on its settings icon it opens the Nagios frontend and, again, everything looks healthy there!

The install went smoothly (CentOS 6.5) and everything looks good as far as all services are concerned (all started, with a green tick next to the service name). There are some stats displayed on the dashboard: 4/4 DataNodes are live, 1/1 NodeManagers live & 1/1 Supervisors are live. I can write files to HDFS, so it looks like it's a Ganglia issue?
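
For reference, the write test I mean is just a round trip through the HDFS client, something along these lines (paths are only examples):

echo "hello" > /tmp/ambari-test.txt                     # small local test file
hdfs dfs -mkdir -p /tmp/ambari-test                     # create a scratch directory in HDFS
hdfs dfs -put /tmp/ambari-test.txt /tmp/ambari-test/    # write it to HDFS
hdfs dfs -cat /tmp/ambari-test/ambari-test.txt          # read it back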

The Ganglia gmond daemons seem to be working OK:

ps -ef | grep gmond
nobody    1720     1  0 12:54 ?        00:00:44 /usr/sbin/gmond --conf=/etc/ganglia/hdp/HDPHistoryServer/gmond.core.conf --pid-file=/var/run/ganglia/hdp/HDPHistoryServer/gmond.pid
nobody    1753     1  0 12:54 ?        00:00:44 /usr/sbin/gmond --conf=/etc/ganglia/hdp/HDPFlumeServer/gmond.core.conf --pid-file=/var/run/ganglia/hdp/HDPFlumeServer/gmond.pid
nobody    1790     1  0 12:54 ?        00:00:48 /usr/sbin/gmond --conf=/etc/ganglia/hdp/HDPHBaseMaster/gmond.core.conf --pid-file=/var/run/ganglia/hdp/HDPHBaseMaster/gmond.pid
nobody    1821     1  1 12:54 ?        00:00:57 /usr/sbin/gmond --conf=/etc/ganglia/hdp/HDPKafka/gmond.core.conf --pid-file=/var/run/ganglia/hdp/HDPKafka/gmond.pid
nobody    1850     1  0 12:54 ?        00:00:44 /usr/sbin/gmond --conf=/etc/ganglia/hdp/HDPSupervisor/gmond.core.conf --pid-file=/var/run/ganglia/hdp/HDPSupervisor/gmond.pid
nobody    1879     1  0 12:54 ?        00:00:45 /usr/sbin/gmond --conf=/etc/ganglia/hdp/HDPSlaves/gmond.core.conf --pid-file=/var/run/ganglia/hdp/HDPSlaves/gmond.pid
nobody    1909     1  0 12:54 ?        00:00:48 /usr/sbin/gmond --conf=/etc/ganglia/hdp/HDPResourceManager/gmond.core.conf --pid-file=/var/run/ganglia/hdp/HDPResourceManager/gmond.pid
nobody    1938     1  0 12:54 ?        00:00:50 /usr/sbin/gmond --conf=/etc/ganglia/hdp/HDPNameNode/gmond.core.conf --pid-file=/var/run/ganglia/hdp/HDPNameNode/gmond.pid
nobody    1967     1  0 12:54 ?        00:00:47 /usr/sbin/gmond --conf=/etc/ganglia/hdp/HDPNodeManager/gmond.core.conf --pid-file=/var/run/ganglia/hdp/HDPNodeManager/gmond.pid
nobody    1996     1  0 12:54 ?        00:00:44 /usr/sbin/gmond --conf=/etc/ganglia/hdp/HDPNimbus/gmond.core.conf --pid-file=/var/run/ganglia/hdp/HDPNimbus/gmond.pid
nobody    2028     1  1 12:54 ?        00:00:58 /usr/sbin/gmond --conf=/etc/ganglia/hdp/HDPDataNode/gmond.core.conf --pid-file=/var/run/ganglia/hdp/HDPDataNode/gmond.pid
nobody    2057     1  0 12:54 ?        00:00:51 /usr/sbin/gmond --conf=/etc/ganglia/hdp/HDPHBaseRegionServer/gmond.core.conf --pid-file=/var/run/ganglia/hdp/HDPHBaseRegionServer/gmond.pid
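
Beyond ps, each gmond can also be asked for its metric XML on its tcp_accept_channel port, which confirms it is actually collecting data rather than just running. The port varies per HDP cluster definition, so 8660 below is only an example; the real value is in each gmond.core.conf (and this assumes nc is installed):

grep -A 2 tcp_accept_channel /etc/ganglia/hdp/HDPSlaves/gmond.core.conf   # find the TCP port for this gmond cluster
nc localhost 8660 | head -20                                              # example port; a healthy gmond dumps <GANGLIA_XML ...> metric data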

I have checked the Ganglia service on each node and the processes are running as expected; gmetad is also up on the Ganglia server:

ps -ef | grep gmetad
nobody    2807     1  2 12:55 ?        00:01:59 /usr/sbin/gmetad --conf=/etc/ganglia/hdp/gmetad.conf --pid-file=/var/run/ganglia/hdp/gmetad.pid
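
A couple of further quick checks (sketches only: 8651 is the stock gmetad xml_port, and <ganglia-web-host> is a placeholder for whichever node runs the Ganglia web UI) are to pull the aggregated XML from gmetad and to hit the web frontend with the same kind of graph URL Ambari requests:

nc localhost 8651 | head -20                                                        # aggregated cluster XML from gmetad (default xml_port; check gmetad.conf)
curl -s "http://<ganglia-web-host>/ganglia/graph.php?g=mem_report&json=1" | head    # the graph endpoint Ambari polls; should return JSON, not an error page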

I have tried restarting the Ganglia services with no luck, and have restarted all services, but it is still the same. Does anyone have any ideas on how I can get the dashboard to work properly? Thank you.

ScottFree
    It looks like it is not hearing from the name node either. You have probably covered this, but, are all your nodes in /etc/hosts for one another, iptables off, selinux off, and all registered with DNS? We had some weirdness with Hue when there were DNS issues between nodes. Also, have you looked into logging for Ambari? There might be some RPC connections failing in there somewhere, might illuminate things... – suiterdev Jan 02 '15 at 16:20
  • Hi thanks for the reply, I've checked and double checked all you suggest and all seems in order, hosts files ok, iptables off and selinux off but still no stats. I've trawled through the logs and there is nothing obvious. It looks like HDFS is working as expected, all that's not working is the stats on the Ambari dashboard grrrr, I'll keep digging, thanks again. – ScottFree Jan 05 '15 at 09:45
  • I figured it out, it was proxy related (and I never mentioned in my original post I was behind a proxy so a little unfair!). I'll document below, I noticed this in the ambari logs: java.io.FileNotFoundException: http://node1.dms/ganglia/graph.php?g=mem_report&json=1 thanks again – ScottFree Jan 05 '15 at 10:35
  • Good stuff! Yes, the networking for a cluster is crucial. Our cluster was put in our DMZ (not an idea anyone on technical staff endorsed, BTW) and that led to all sorts of problems. We had to hack around Sqoop talking to MSSQL the way it wants to, and Oozie never worked in Hue, because somewhere in the stack it was unable to write workflow metadata to the correct place...in short, don't do that. :-) – suiterdev Jan 05 '15 at 15:56

1 Answer


It turned out to be a proxy issue. To access the internet I had to add my proxy details to the file /var/lib/ambari-server/ambari-env.sh:

export AMBARI_JVM_ARGS=$AMBARI_JVM_ARGS' -Xms512m -Xmx2048m -Dhttp.proxyHost=theproxy -Dhttp.proxyPort=80 -Djava.security.auth.login.config=/etc/ambari-server/conf/krb5JAASLogin.conf -Djava.security.krb5.conf=/etc/krb5.conf -Djavax.security.auth.useSubjectCredsOnly=false'

When Ambari tried to fetch the Ganglia metrics for each node in the cluster, the request went via the proxy and never resolved. To overcome the issue I added my nodes to the proxy exclusion list (the -Dhttp.nonProxyHosts flag), like so:

export AMBARI_JVM_ARGS=$AMBARI_JVM_ARGS' -Xms512m -Xmx2048m -Dhttp.proxyHost=theproxy -Dhttp.proxyPort=80 -Dhttp.nonProxyHosts="localhost|node1.dms|node2.dms|node3.dms|etc" -Djava.security.auth.login.config=/etc/ambari-server/conf/krb5JAASLogin.conf -Djava.security.krb5.conf=/etc/krb5.conf -Djavax.security.auth.useSubjectCredsOnly=false'
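
For the new JVM arguments to be picked up, the Ambari server has to be restarted (ambari-env.sh is only read at startup); a quick ps afterwards shows whether the flag made it onto the running process:

ambari-server restart            # relaunch the Ambari server JVM with the updated ambari-env.sh settings
ps -ef | grep ambari-serve[r]    # the -Dhttp.nonProxyHosts flag should now appear in the java command line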

After adding the exclusion list, the stats were retrieved as expected!

ScottFree