
I am having a lot of trouble understanding how to scale Graphite. I have a production instance of Graphite (carbon-cache + whisper + graphite-web + grafana). I am running out of disk space and I think I need to add a second node. I can't seem to find any good guides on how people do this, and I am having a hard time understanding the documentation.

Can I just spin up carbon-cache + whisper on the second node and configure carbon-relay to relay the metrics to both the first and second nodes?

Will graphite-web be able to query both successfully?

I feel like I am missing something very important.

PS: I tried googling this, but my google-fu may be bad. I also searched Stack Overflow and Server Fault, but all I can find are posts about piping multiple servers' metrics into Graphite/StatsD.

EDIT

I think I need to clarify. I can set up the relay and cache just fine (they seem to work). It is graphite-web that I have trouble with. I set up a new graphite-web on a standalone VM (nothing but graphite-web, uwsgi, and nginx installed on it). From there I tried querying it with the find request below, and the result is always empty:

curl 'localhost:8543/metrics/find?query=*' 
[]

That said, on the original server it works just fine, apart from the disk being full:

root@original_server:/etc/nginx/sites-enabled# curl -s 'localhost:8080/metrics/find?query=*' | jq
[
  {
    "text": "bobstats",
    "expandable": 1,
    "leaf": 0,
    "id": "bobstats",
    "allowChildren": 1
  },
 ...
]

Does carbon-relay need to run alongside graphite-web? Do I also need to install graphite-web on each of the cache servers, or on each of the relays?

Lookcrabs

1 Answer


You need to run carbon-relay with consistent hashing enabled. The [relay] section of carbon.conf would look something like this:

[relay]
LINE_RECEIVER_INTERFACE = 0.0.0.0
LINE_RECEIVER_PORT = 2003
PICKLE_RECEIVER_INTERFACE = 0.0.0.0
PICKLE_RECEIVER_PORT = 2004
RELAY_METHOD = consistent-hashing 
DESTINATIONS = 10.0.1.10:2004, 10.0.1.11:2004

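Your DESTINATIONS are your backend carbon-cache instances (their pickle receiver ports). The relay hashes each metric name and always sends that metric to the same destination. You'll also have to point graphite-web at the multiple backends (via the CLUSTER_SERVERS setting in its local_settings.py).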
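A minimal sketch of the web side, assuming each cache node also runs its own graphite-web (serving its local whisper files) on port 80, at the addresses from the DESTINATIONS line. The front-end graphite-web fans find and render requests out to those instances; a standalone front end with no local whisper data and no CLUSTER_SERVERS has nothing to search, which would likely explain the empty [] result. The port 80 and the addresses are assumptions; 7002 is carbon's default cache query port.

```python
# --- local_settings.py on the front-end graphite-web (the standalone VM) ---
# Hypothetical addresses of the graphite-web instances running on each cache
# node; the front end forwards /metrics/find and /render requests to them.
CLUSTER_SERVERS = ["10.0.1.10:80", "10.0.1.11:80"]

# --- local_settings.py on each cache node's own graphite-web ---
# That instance reads whisper files from local disk and asks the local
# carbon-cache for datapoints not yet flushed (this is the default value).
CARBONLINK_HOSTS = ["127.0.0.1:7002"]
```

In other words, graphite-web goes on every cache node plus the front end; the relays don't need it.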

Mike
It's worth noting that consistent hashing means adding a new node to the cluster does *not* cause a complete reshuffle of metrics among the nodes. E.g., if you have a 2-node cluster and add a 3rd node, roughly 1/3 of the metrics on each existing node will move to the new node.
* [General explanation of consistent hashing](https://akshatm.svbtle.com/consistent-hash-rings-theory-and-implementation)
* [A good workflow for rebalancing a cluster and backfilling data](https://phabricator.wikimedia.org/T86316)
– Sammitch Aug 27 '18 at 23:10
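To see why only about a third of the metrics move, here is a simplified consistent-hash ring in Python, compared against naive "hash modulo node count" placement. It is an illustration, not carbon's own implementation: the MD5 positions, the 100 virtual points per node, the metric names, and the third node 10.0.1.12:2004 are all assumptions.

```python
import bisect
import hashlib


class HashRing:
    """Simplified consistent-hash ring (illustrative; not carbon's actual code)."""

    def __init__(self, nodes, replicas=100):
        self.replicas = replicas  # virtual points per node, to even out the split
        self._positions = []      # sorted ring positions
        self._owners = {}         # ring position -> node
        for node in nodes:
            self.add_node(node)

    @staticmethod
    def position(value):
        # Map a string to an integer position on the ring via MD5.
        return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)

    def add_node(self, node):
        for i in range(self.replicas):
            pos = self.position(f"{node}:{i}")
            self._owners[pos] = node
            bisect.insort(self._positions, pos)

    def get_node(self, metric):
        # The first ring point clockwise from the metric's hash owns the metric.
        idx = bisect.bisect_left(self._positions, self.position(metric))
        return self._owners[self._positions[idx % len(self._positions)]]


def modulo_node(metric, nodes):
    # Naive alternative for comparison: hash modulo the node count.
    return nodes[HashRing.position(metric) % len(nodes)]


# Hypothetical metric names and node addresses (the third node is an assumption).
metrics = [f"bobstats.host{i}.cpu" for i in range(10000)]
old_nodes = ["10.0.1.10:2004", "10.0.1.11:2004"]
new_nodes = old_nodes + ["10.0.1.12:2004"]

old_ring, new_ring = HashRing(old_nodes), HashRing(new_nodes)
ring_moves = [m for m in metrics if old_ring.get_node(m) != new_ring.get_node(m)]
modulo_moves = [m for m in metrics
                if modulo_node(m, old_nodes) != modulo_node(m, new_nodes)]

ring_moved_fraction = len(ring_moves) / len(metrics)
modulo_moved_fraction = len(modulo_moves) / len(metrics)
print(f"consistent hashing moved {ring_moved_fraction:.1%} of metrics")
print(f"modulo hashing moved {modulo_moved_fraction:.1%} of metrics")
```

With the ring, every metric that moves lands on the new node, and the moved share is close to 1/3. With modulo placement, about 2/3 of all metrics change nodes, which is the "complete shuffle" consistent hashing avoids.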