I have Graphite set up on three EC2 instances:

  • carbon-relay - relay1.graphite.prod.example.ec2
  • carbon-cache + webapp - cache3.graphite.prod.example.ec2
  • carbon-cache + webapp - cache4.graphite.prod.example.ec2

The relay is working perfectly with consistent hashing. The problem is that the two webapps are not communicating with each other, so I only see the metrics from one server.

I spent a lot of time looking at https://answers.launchpad.net/graphite/+question/114206 and I can't figure out what I have set up incorrectly. I can run a wget from cache3 against cache4, get data back, and see the request in the Apache logs, so I don't think it's a firewall issue. I tried setting suppressError = False in remote_storage.py and turned on DEBUG in local_settings.py, but I don't see any errors in Firebug.

cache3 - local_settings.py

CLUSTER_SERVERS = [ 'cache4.graphite.prod.example.ec2', 'localhost' ]

cache4 - local_settings.py

CLUSTER_SERVERS = [ 'cache3.graphite.prod.example.ec2', 'localhost' ]

I have tried using IP addresses as well and that had no impact.

I did a little more debugging and modified storage.py to hard-code my remote hosts directly:

STORE = Store(settings.DATA_DIRS, remote_hosts=["cache4.graphite.prod.example.ec2", "127.0.0.1"])

That worked. So, somehow my CLUSTER_SERVERS value isn't getting pulled in from local_settings.py correctly.
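
A setting that works when hard-coded but not when set in local_settings.py is consistent with the override file never being loaded at all. The sketch below illustrates that failure mode, assuming the webapp loads its overrides through a guarded import and falls back to defaults when the import fails. `load_settings`, `DEFAULTS`, and the module name passed in are hypothetical and are not Graphite's actual code.

```python
import importlib

# Hypothetical defaults; the real webapp's defaults may differ.
DEFAULTS = {"CLUSTER_SERVERS": []}


def load_settings(override_module):
    """Return the defaults merged with UPPERCASE names from override_module.

    If the override module cannot be imported (missing, unreadable, ...),
    the defaults are returned unchanged and a message is printed.
    """
    settings = dict(DEFAULTS)
    try:
        module = importlib.import_module(override_module)
    except (ImportError, OSError) as exc:
        # The override is skipped, so only the defaults remain in effect.
        print("Could not import %s (%s), using defaults" % (override_module, exc))
        return settings
    for name in dir(module):
        if name.isupper():
            settings[name] = getattr(module, name)
    return settings
```

With a pattern like this, a failed import leaves `CLUSTER_SERVERS` empty without any error reaching the browser, so the place to look is the web server's error log rather than Firebug.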

Any suggestions?

organicveggie

1 Answer

Turns out the permissions on local_settings.py were too restrictive and Apache was unable to read it:

-rw------- 1 root root  4006 May  4 13:40 local_settings.py

Fixing the permissions to 644 (instead of 600) resolved the problem.
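
A minimal sketch of the same check and fix in Python, assuming the Apache user is neither the file's owner nor in its group, so the "other" read bit is what decides whether it can read the file. The helper names `is_world_readable` and `make_world_readable` are made up for illustration.

```python
import os
import stat


def is_world_readable(path):
    """Return True if the 'other' read bit is set on path."""
    mode = os.stat(path).st_mode
    return bool(mode & stat.S_IROTH)


def make_world_readable(path):
    """Set the file mode to 644 (rw-r--r--), the equivalent of chmod 644."""
    os.chmod(path, 0o644)
```

Calling `is_world_readable` on a file with mode 600 returns False; after `make_world_readable` it returns True. The check reads the mode bits rather than calling `os.access`, because a check run as root would report the file as readable regardless of its mode.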

organicveggie