I have a simple two node cluster setup that has been working fine for the last couple of weeks. I haven't made any changes on my nodes, but a few days ago data metrics stopped showing up. By all indications everything else is working fine and OpsCenter is able to see if my nodes are running or not without any problems. Also no errors are reported in the GUI.

I've seen a couple of other posts, but their solutions don't apply to my scenario. I do not have a heavy load on the server. I have fewer than 10 column families, as it's just for testing, and there is no Thrift password configured.

When I look in the opscenterd.log I see the following:

2015-06-09 00:16:40+0000 [] ERROR: Error fetching metric data:  Traceback (most recent call last):
      File "/usr/lib/python2.7/dist-packages/opscenterd/MetricFetcher.py", line 470, in _fetch_through_cache
    UnavailableException: UnavailableException()

2015-06-09 00:16:40+0000 [] ERROR: Problem while calling NewMetricsController (IndexError): list index out of range
      File "/usr/share/opscenter/lib/py-debian/2.7/amd64/twisted/internet/defer.py", line 1020, in _inlineCallbacks
        result = g.send(result)

      File "/usr/lib/python2.7/dist-packages/opscenterd/MetricFetcher.py", line 612, in fetchMetrics

And in agent.log I see this:

ERROR [os-metrics-5] 2015-06-09 17:47:41,161 Long os-stats collector failed: Cannot run program "iostat": error=2, No such file or directory
 ERROR [os-metrics-4] 2015-06-09 17:47:41,162 Long os-stats collector failed: Cannot run program "iostat": error=2, No such file or directory
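
For context, `error=2, No such file or directory` in the agent log means the process could not find an `iostat` executable on its search path. A minimal sketch of that lookup, assuming Python 3 is available on the node; the helper names `find_program` and `describe` are hypothetical and are not part of OpsCenter or the agent:

```python
import shutil


def find_program(name, search_path=None):
    """Return the full path of `name` if it is an executable on the
    search path, otherwise None. With search_path=None the PATH of the
    current process is used."""
    return shutil.which(name, path=search_path)


def describe(name, search_path=None):
    """Summarize whether `name` can be resolved on the search path."""
    location = find_program(name, search_path)
    if location is None:
        # A process launching `name` with this PATH would get an
        # "error=2, No such file or directory" style failure.
        return f"{name}: not found on the search path"
    return f"{name}: found at {location}"


if __name__ == "__main__":
    # Run this as the same user, and with the same environment, as the
    # agent; a different shell may see a different PATH.
    print(describe("iostat"))
```

This only reports what the current process can see. A long-running agent keeps the environment it was started with, so its view may differ from an interactive shell.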

Any ideas on how to resolve this?

  • Often I'm able to restart the opscenterd process and/or the agents and I get data back. – phact Jun 09 '15 at 01:59
  • I tried that. I even tried rebooting the machine but no luck. – KingOfHypocrites Jun 09 '15 at 02:11
  • Is there network connectivity between the agents and daemon? What do the agent logs say? – phact Jun 09 '15 at 02:11
  • I updated my question with what I could find in the agent.log. – KingOfHypocrites Jun 09 '15 at 17:52
  • I noticed there was no iostat on the machine that has opscenter. Not sure if something removed it for some reason (the stats used to work). I set up both machines exactly the same. The only difference is one has opscenter and the other doesn't. Also, this worked fine for a few weeks after setting up the machines. I updated to DSE 4.7 recently but I think it stopped working after that. DSE is the only thing I run on the machines. I tried putting iostat on the opscenter node but it didn't help. – KingOfHypocrites Jun 09 '15 at 22:02
  • apt-get install iostat? – phact Jun 09 '15 at 22:19
  • I tried that already per my previous comment... I used: apt-get install sysstat – KingOfHypocrites Jun 09 '15 at 22:20
  • To clarify... I can now run iostat from the command line but it didn't have any effect on opscenter – KingOfHypocrites Jun 09 '15 at 22:24
  • The iostat error will prevent the agent from storing some disk metrics. The UnavailableException from the OpsCenter daemon is reporting that the replicas required by the replication factor you configured for the OpsCenter keyspace aren't available when trying to read the metrics from it. Can you describe your opscenter keyspace and the output from "nodetool status"? – Chris Lohfink Jun 29 '15 at 14:33
  • I had to take the nodes down and start from scratch but will follow up again if I see the error. If DSE does use iostat, it doesn't install it, which seems odd. Nor have I seen any docs saying that it should be installed manually as part of the installation. I will say that I do use a replication factor of 2. I only have two nodes total ... It's a pretty simple setup. – KingOfHypocrites Jun 29 '15 at 14:37
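
The point about replication factor and UnavailableException in the comments above can be made concrete with a little arithmetic. This is a rough sketch for a single-datacenter keyspace, using the standard Cassandra rule that QUORUM needs floor(RF / 2) + 1 replicas. Which consistency level OpsCenter actually uses for metric reads is not stated in this thread, so the levels below are assumptions for illustration, and the function names are hypothetical:

```python
def required_replicas(consistency_level, replication_factor):
    """Number of live replicas a request needs at a given consistency
    level (simple single-datacenter case only)."""
    level = consistency_level.upper()
    if level == "ONE":
        return 1
    if level == "QUORUM":
        return replication_factor // 2 + 1
    if level == "ALL":
        return replication_factor
    raise ValueError(f"unsupported consistency level: {consistency_level}")


def is_available(consistency_level, replication_factor, live_replicas):
    """True if enough replicas are up; otherwise the coordinator would
    reject the request with an unavailable error."""
    return live_replicas >= required_replicas(consistency_level, replication_factor)


if __name__ == "__main__":
    # Two-node cluster with replication factor 2, one node seen as down.
    for level in ("ONE", "QUORUM", "ALL"):
        print(level, is_available(level, replication_factor=2, live_replicas=1))
```

With a replication factor of 2, QUORUM requires both replicas, so a single node being down (or merely marked down in "nodetool status") is enough to make QUORUM and ALL requests unavailable, while ONE still succeeds.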
