
My cluster configuration is as follows:

  1. 3-node cluster.
  2. 128 GB RAM per cluster node.
  3. Processor: 16-core, hyper-threaded, per cluster node.
  4. All 3 nodes run a Kudu master, a Kudu tablet server (T-Server), and an Impala server; one of the nodes also runs the Impala Catalog Server and the Impala StateStore.

My issues are as follows:

1) I'm having a hard time configuring dynamic resource pools in Impala for concurrent queries. I've tried setting mem_limit, with no luck. I've also tried static service pools, but with those I couldn't achieve the required concurrency either. Even with admission control, the required concurrency was not achieved.

 I) A single query takes 500-800 ms.

 II) With 10 concurrent queries, the time grows to 3-6 s per query.

 III) With more than 20 concurrent queries, each query takes more than 10 s.

2) One of my cluster nodes does not take any load after I submit a query; I checked this in the query summary. I've tried setting NUM_NODES to 0 and to 1 on the node that is not taking the load, but the summary still shows that it is not taking any load.
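The latencies reported in 1) can be turned into rough throughput figures with Little's law (throughput ≈ queries in flight ÷ seconds per query). This sketch treats each scenario as a steady state, which is an assumption, and uses only the numbers from the question; the 20-query line assumes exactly 20 queries at 10 s each, since the question only says "more than".

```python
def throughput_qps(in_flight: int, latency_s: float) -> float:
    """Little's law: throughput = queries in flight / seconds per query."""
    return in_flight / latency_s


# Latency ranges (seconds per query) taken from the question.
scenarios = [
    ("1 query", 1, 0.5, 0.8),
    ("10 concurrent", 10, 3.0, 6.0),
]

for name, n, fastest, slowest in scenarios:
    low = throughput_qps(n, slowest)   # slowest latency -> lowest throughput
    high = throughput_qps(n, fastest)  # fastest latency -> highest throughput
    print(f"{name}: {low:.2f}-{high:.2f} queries/s")

# Assumed boundary case for "more than 20 queries, more than 10 s each".
print(f"20 concurrent: {throughput_qps(20, 10.0):.2f} queries/s")
```

Under these assumptions, throughput stays in roughly the same 1.25-3.33 queries/s band as concurrency grows, so the extra concurrent queries mostly show up as longer per-query latency. That pattern is what you would expect if some shared resource (CPU, memory, or Kudu scan capacity) is saturated; it is an interpretation of the numbers, not a diagnosis.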

tk421
Prog_G
  • Is the machine that is not taking the load accessible from the other 2? – Saif Ahmad Sep 21 '18 at 06:27
  • @SaifAhmad yes. It is accessible from the other nodes. – Prog_G Sep 21 '18 at 06:35
  • Are you executing the same query in parallel? In that case you may be experiencing "hot-spotting" if the replication factor for your Kudu table is 1, for example. – mazaneicha Sep 24 '18 at 11:29
  • No, replication factor is set to 3 and we are not running the same query in parallel. There are more than 100 different queries which are executed. – Prog_G Sep 24 '18 at 12:02
  • I wouldn't read too much into your performance numbers. Impala is MPP and not designed to run on a 3 node cluster. Most benchmarks run on at least a 10 node cluster with a total of 1TB memory or more. – tk421 Sep 28 '18 at 20:57
  • @tk421 thank you for your reply but can you tell me how can I manage resource pool in impala so that I can execute concurrent queries in Impala? – Prog_G Oct 08 '18 at 05:25
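On the resource-pool question: as I understand Impala's admission control documentation, when a dynamic resource pool has a Max Memory limit, each query is accounted as its memory limit (the pool's Default Query Memory Limit or the MEM_LIMIT query option) multiplied by the number of hosts it runs on, so the memory cap also caps concurrency. The helper below is a hypothetical sizing sketch of that arithmetic, not an Impala API; the 300 GB pool limit and the 2 GB / 8 GB per-query limits are assumed values for a 3-node cluster with 128 GB per node.

```python
import math
from typing import Optional


def max_admitted_queries(pool_max_mem_gb: float,
                         query_mem_limit_gb: float,
                         num_hosts: int,
                         max_running_queries: Optional[int] = None) -> int:
    """Hypothetical helper: how many queries fit in a memory-capped pool.

    Assumes every query reserves query_mem_limit_gb on each host it runs on,
    so its cluster-wide footprint is query_mem_limit_gb * num_hosts.
    """
    per_query_gb = query_mem_limit_gb * num_hosts
    by_memory = math.floor(pool_max_mem_gb / per_query_gb)
    if max_running_queries is not None:
        # A separate "max running queries" limit can cap concurrency further.
        return min(by_memory, max_running_queries)
    return by_memory


# Assumed values for a 3-node cluster with 128 GB per node.
print(max_admitted_queries(300, 2, 3))                          # 50
print(max_admitted_queries(300, 8, 3))                          # 12
print(max_admitted_queries(300, 2, 3, max_running_queries=20))  # 20
```

Queries beyond these limits wait in the pool's admission queue instead of running, so admission control bounds contention but does not by itself make 20 concurrent queries faster than the hardware allows.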

1 Answer


What is the table size? How many rows are there in the tables? Are the tables partitioned? It would be helpful to compare your configuration with the Impala benchmarks.

As mentioned above, Impala is designed to run on massively parallel processing (MPP) infrastructure. When we had a setup of 10 nodes with 80 cores (160 virtual cores) and 12 TB of SAN storage, we got a computation time of 60 seconds with 5 concurrent users.

Gokul Alex