
I'm using DataStax Enterprise Graph 5.1.

My backend service (Node.js-based) interacts with the DSE graph using the DataStax Node.js driver.

My DSE Graph database is deployed on a cluster consisting of two datacenters, each with two nodes. One datacenter is dedicated to Gremlin graph queries, the other to Gremlin queries with Solr support (textual search).
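For reference, the client is created more or less like this, relying on the driver defaults (the contact points, graph name and traversal below are placeholders, not my real values):

const dse = require('dse-driver');

// Created once at service startup; no explicit load-balancing policy,
// so the driver defaults apply. Contact points and graph name are placeholders.
const client = new dse.Client({
  contactPoints: ['10.0.0.1', '10.0.0.2'],
  graphOptions: { name: 'my_graph' }
});

// A typical Gremlin query (placeholder traversal)
client.executeGraph('g.V().limit(10)', (err, result) => {
  if (err) {
    return console.error(err);
  }
  console.log(result.toArray());
});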

Each node is an m4.xlarge EC2 instance with an 800 GB EBS volume (SSD).

So far, so good.

Recently I started to perform load tests on the backend.

When I monitor the load on my nodes, I can see that only one node is hit by the Gremlin queries (around 90% CPU), while the others are barely loaded at all (2-3% CPU).

That is strange because, according to the documentation, load balancing is performed by the Node.js driver, so I would expect that when I launch a Gremlin query, at least the two nodes of the Gremlin datacenter get hit.
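To check whether the driver actually sees both nodes of the Gremlin datacenter, the hosts it has discovered can be listed after connecting; a small sketch reusing the client from the snippet above (host.address, host.datacenter and host.isUp() are, as far as I know, part of the driver's host metadata):

// 'client' is the dse.Client instance from the sketch above
client.connect(err => {
  if (err) {
    return console.error(err);
  }
  // Print every host the driver discovered, with its datacenter and status
  client.hosts.forEach(host => {
    console.log(host.address, host.datacenter, host.isUp() ? 'up' : 'down');
  });
});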

That's not the case, and as a consequence I cannot use the full capacity of my cluster, which is a waste of money!

What am I missing?

Thanks in advance!

Toufic Zayed

2 Answers


You should check what's happening under the hood by enabling logging at the driver level. Instances of Client are EventEmitters that emit 'log' events:

client.on('log', (level, className, message) => {
  // Skip the very chatty 'verbose' messages and print everything else
  if (level === 'verbose') {
    return;
  }
  console.log(level, className, message);
});
jorgebg
  • Thanks Jorge, you're always there to help! It appears that the problem may be linked to the heap size; with enough heap (8 GB) I can see that the load seems to be evenly distributed. – Toufic Zayed Jun 17 '17 at 13:58
  • OK, sorry to reopen this issue. I've done further testing, and now with 4 nodes and a heap of 14 GB, the load is not distributed evenly at all. One of the nodes shows CPU usage of 50%, another 30%, and the other two 5% (not loaded). I've activated the log as you said, but nothing special is written. My read and write consistency are set to "one". – Toufic Zayed Jun 22 '17 at 13:46
  • OK, some more details and a possible cause... One important piece of information I didn't provide is that my front end is a bunch of AWS Lambda functions. It appears that under load, AWS Lambda spins up a lot of instances of the Lambda functions in parallel, and each Lambda function opens connections to Cassandra through the Node.js driver. So maybe I'm exhausting the connections, I don't know for sure. But when I add more load to the system, I can see that the Node.js driver cannot get any connection to my cluster (see the client-reuse sketch after these comments)... – Toufic Zayed Jun 22 '17 at 14:22
  • OK, additional info: when I spread the load to avoid too many connections being open at once, I can see that the load spreads to all nodes of the cluster, but not evenly: one is at 50% CPU, another at 30%, another at 15%, and the last at 10%. Maybe that's normal? But in that case the first node (the coordinator) appears to become a bottleneck as the stress increases... which is a bad thing. – Toufic Zayed Jun 22 '17 at 15:46
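A pattern commonly suggested for the Lambda scenario described in the comments above is to create the driver Client once, outside the handler, so that warm invocations reuse the same connection pool instead of each invocation opening new connections; a rough sketch (contact points, graph name and traversal are placeholders):

const dse = require('dse-driver');

// Created once per Lambda container (module scope), so that warm
// invocations reuse the same client and its connection pool.
const client = new dse.Client({
  contactPoints: ['10.0.0.1', '10.0.0.2'],   // placeholder contact points
  graphOptions: { name: 'my_graph' }         // placeholder graph name
});

exports.handler = (event, context, callback) => {
  // Placeholder traversal; the shared client is reused on every invocation.
  client.executeGraph('g.V().limit(1)', (err, result) => {
    if (err) {
      return callback(err);
    }
    callback(null, result.toArray());
  });
};

This does not change how the coordinator is chosen, but it keeps the total number of open connections proportional to the number of warm containers rather than to the number of invocations.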

To answer my own question: it appears that with a large enough heap (8 GB) the load is distributed evenly.

In my case, at least, the problem disappeared. I hope this helps others.

Toufic Zayed