
Let me first introduce the cluster:

  • 3 nodes (8 cores + 30Gi each)
  • A ReadWriteMany volume backed by an NFS controller
  • PHP-FPM Deployment: dispatches Symfony messages
  • RabbitMQ (CRD) handling the queue
  • Postgres (CRD, 1 replica, 3 cores + 5Gi)
  • Elasticsearch (CRD, 1 master + 2 data nodes, 2 cores + 2Gi each)
  • Deployment + HPA (1 min, 5 max) for a supervisor running 5 Symfony consumers each, so at most 5 × 5 = 25 consumers running (roughly the HPA sketched below)
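
For reference, a minimal sketch of the kind of HPA in play (the names and the CPU-utilization target are placeholders, assuming plain resource-based scaling rather than a queue-length metric):

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: consumer-supervisor          # placeholder name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: consumer-supervisor        # the supervisor Deployment running 5 consumers
  minReplicas: 1
  maxReplicas: 5
  metrics:
    - type: Resource
      resource:
        name: cpu                    # assuming CPU-based scaling, not a custom queue metric
        target:
          type: Utilization
          averageUtilization: 70     # placeholder threshold
```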

The context: from PHP-FPM we can trigger scripts that dispatch tens of thousands of messages. The messages are sent to RabbitMQ and then handled by the consumers. Each consumer inserts data into Postgres and Elasticsearch.
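
For context, the Messenger wiring is roughly the standard Symfony setup, sketched below with placeholder transport and message names (assuming a single AMQP transport pointing at RabbitMQ):

```yaml
# config/packages/messenger.yaml -- illustrative sketch, names are placeholders
framework:
    messenger:
        transports:
            async:
                # e.g. amqp://user:pass@rabbitmq:5672/%2f/messages, injected via env
                dsn: '%env(MESSENGER_TRANSPORT_DSN)%'
        routing:
            # every dispatched message of this (placeholder) class goes to RabbitMQ
            'App\Message\ImportItem': async
```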

**The goal**: dynamically increase the number of consumers (HPA + node autoscaling) in order to drain the queue as fast as possible.

The issue: the consumption rate is very poor (60 msg/s at most). In local dev (no Kubernetes, only Docker) with the same number of consumers we reach ~500 msg/s.

So far I've tried:

  • increasing the resources allocated to Postgres, Elasticsearch, and the consumers (requests/limits, as in the snippet after this list)
  • running Postgres with 2 replicas
  • increasing the number of consumers, which does not help
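
By "increasing the resources" I mean the usual requests/limits on the containers, along these lines (values are illustrative, not the exact ones used):

```yaml
# Shape of the resource changes on the consumer container (illustrative values)
resources:
  requests:
    cpu: "1"
    memory: 1Gi
  limits:
    cpu: "2"
    memory: 2Gi
```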

When monitoring, I can see no pod reaching its limits (not even close).

I suspect:

  • Postgres or Elasticsearch not handling the 25 simultaneous connections well
  • the "network": could passing messages between the nodes hurt performance?

Help greatly appreciated
