
I am having a problem with the NameNode status that Ambari shows. The following is happening:

  • The NameNode goes down a few seconds after I start it through Ambari (it looks like it never really comes up, although the start process completes successfully);

  • Despite being DOWN according to Ambari, if I run `jps` on the server that hosts the NameNode, it shows that the service is running:

    [hdfs@NNVM ~]$ jps
    39395 NameNode
    4463 Jps
    

and I can access the NameNode UI without problems;

  • I have already restarted both the NameNode and the ambari-agent manually, but the behavior stays the same;

  • This problem started after some heavy HBase/Phoenix queries that caused the NameNode to go down (I am not sure whether this is actually related, but the exact same configuration was working well before this episode);

  • I've been digging for some hours and have not been able to find any error details, in either the NameNode logs or the ambari-agent logs, that would help me understand the problem.

I am using HDP 2.4.0, Ambari 2.2.1.1 and no HA options.

Can someone help with this?

Thanks in advance

Edited: to add ambari version.

ssobreiro
  • Are you able to run any HDFS commands? `hdfs dfs -ls /` for example? – tk421 Jun 02 '17 at 08:04
  • Hi, I can. Answer:

        [nosuser@NNVM ~]$ hdfs dfs -ls /
        Found 16 items
        drwxrwxrwx - yarn hadoop 0 2017-05-22 15:47 /app-logs
        drwxrwx--- - hdfs hdfs 0 2017-03-29 14:38 /apps
        drwxrwx--- - hdfs hdfs 0 2017-05-03 15:10 /home
        ...

    – ssobreiro Jun 02 '17 at 10:06
  • Seems like the NameNode is fine. More likely a problem with Ambari. – tk421 Jun 02 '17 at 17:04
  • Yes, it looks like it. I have even stopped the NameNode manually and started it from Ambari, but the NameNode keeps going down in Ambari only. Any recommendation on how I can troubleshoot Ambari services? – ssobreiro Jun 03 '17 at 16:35
  • Did you look at [https://stackoverflow.com/questions/34590134/ambari-shows-service-as-stopped](https://stackoverflow.com/questions/34590134/ambari-shows-service-as-stopped)? – tk421 Jun 04 '17 at 05:48
  • I have already tried killing the NameNode, checking through `jps` that the service is no longer running, and then starting it again through Ambari. The behaviour persists. I have also compared the permissions of "hadoop-hdfs-namenode.pid" against those of the other pid files, before and after this procedure, and they appear to be correct. – ssobreiro Jun 05 '17 at 11:12
  • At this point, you have to look at the Ambari logs. You could be seeing [AMBARI-16448](https://issues.apache.org/jira/browse/AMBARI-16448) (Ambari show namenode is stop but actually namenode is still working). – tk421 Jun 05 '17 at 23:35
  • In the linked JIRA, the transition means that the NN failed to start and exited. To find the reason, you need to provide the NN logs. – Reishin Jun 06 '17 at 14:58
  • First of all, no Ambari version is given here; second, there are no logs either (neither from the start command inside Ambari nor from the NN). The problem could be that the NameNode running on the host was started outside Ambari (with local configs), so when you try to start the NameNode via Ambari the two conflict and the managed one crashes. – Reishin Jun 06 '17 at 15:03
  • Hi Reishin, I've just edited above to add ambari version (2.2.1.1). – ssobreiro Jun 06 '17 at 15:22
  • Regarding the logs, could you tell me what to look for inside the NameNode or ambari-server|agent logs? I've been looking for any relevant information, but I was not able to find any error. – ssobreiro Jun 06 '17 at 16:06
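
The pid-file check mentioned in the comments can be made a bit more systematic. The following is only a minimal sketch, not a confirmed diagnosis: it assumes that the Ambari agent decides whether a component is up by looking at its pid file, and that the NameNode pid file lives at `/var/run/hadoop/hdfs/hadoop-hdfs-namenode.pid` (a common HDP default, but an assumption here; only the file name comes from the thread). The helper name `pid_file_status` is hypothetical. Run it as the `hdfs` user or root so the process check is allowed.

```shell
#!/bin/sh
# Report whether a pid file points at a live process.
# Prints one of: missing, invalid, stale, alive
pid_file_status() {
    pid_file="$1"
    [ -f "$pid_file" ] || { echo "missing"; return 0; }

    # Strip whitespace/newlines so the content can be validated as a number.
    pid=$(tr -d '[:space:]' < "$pid_file")
    case "$pid" in
        ''|*[!0-9]*) echo "invalid"; return 0 ;;
    esac

    # kill -0 sends no signal; it only tests that the process exists
    # and that the caller is permitted to signal it.
    if kill -0 "$pid" 2>/dev/null; then
        echo "alive"
    else
        echo "stale"
    fi
}

# Assumed default location; override with PID_FILE=... if the pid
# directory configured for HDFS is different on your cluster.
PID_FILE="${PID_FILE:-/var/run/hadoop/hdfs/hadoop-hdfs-namenode.pid}"
echo "pid file: $PID_FILE -> $(pid_file_status "$PID_FILE")"

# If jps is available, print the pid of the running NameNode for comparison.
if command -v jps >/dev/null 2>&1; then
    echo "jps NameNode pid: $(jps | awk '$2 == "NameNode" {print $1}')"
fi
```

If the script reports `missing` or `stale` while `jps` still lists a NameNode, or the pid inside the file differs from the one `jps` prints, the running process is probably not the one the pid file describes, which would be consistent with Ambari showing the component as stopped.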
