
Background: We are evaluating Hazelcast as a possible alternative to Redis.

Setup:

  1. 3 members in a cluster under a single subnet (production boxes). Each member has ~1.4GB of data.
  2. Near caching is off.
  3. Each member has 1 backup.
  4. The application is deployed as a Spring Boot jar, with the Hazelcast cache embedded in the application.
  5. VM config: 8 cores, 31 GB RAM.
  6. The code uses an IMap to get and put keys in the cache (a minimal sketch follows this list).
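
For reference, a minimal sketch of how the embedded member and the IMap are used (the class name CacheService and the map name "my-cache" are placeholders, and the IMap import path assumes Hazelcast 4.x; the actual code may differ):

    import com.hazelcast.config.Config;
    import com.hazelcast.core.Hazelcast;
    import com.hazelcast.core.HazelcastInstance;
    import com.hazelcast.map.IMap;

    public class CacheService {

        private final IMap<String, String> cache;

        public CacheService(Config config) {
            // Starting Hazelcast with the config shown below makes this JVM a cluster member
            HazelcastInstance hz = Hazelcast.newHazelcastInstance(config);
            this.cache = hz.getMap("my-cache");
        }

        public String get(String key) {
            // With 3 members and no near cache, roughly 2/3 of gets hit a remote partition
            return cache.get(key);
        }

        public void put(String key, String value) {
            cache.put(key, value);
        }
    }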

Load test: I attempted 18K/s REST API calls to read the data, but Hazelcast shows an average get latency of around 3-4 ms. I feel this should be in microseconds, since that is the level of get-command latency we already see with our Redis setup. CPU load was ~95% during this test.

The member that showed this latency has heap usage of ~60% (committed: 7.85 GB, used: 4.68 GB). The same is true for all the members in the cluster.

I need help understanding whether my configuration is wrong somewhere, which is why I am NOT able to achieve get latency in microseconds.

Config for starting embedded cache:

    import java.util.Arrays;

    import com.hazelcast.config.Config;
    import com.hazelcast.config.JoinConfig;
    import com.hazelcast.config.NetworkConfig;

    public Config hazelcastConfig() {
        Config config = new Config();
        config.addMapConfig(mapConfig());

        // Disable multicast discovery; join the cluster via a static TCP/IP member list
        NetworkConfig networkConfig = config.getNetworkConfig();
        JoinConfig join = networkConfig.getJoin();
        join.getMulticastConfig().setEnabled(false);
        join.getTcpIpConfig().setEnabled(true).setMembers(
                Arrays.asList(
                        "ip1:5701",
                        "ip2:5701",
                        "ip3:5701"
                )
        );
        return config;
    }
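
The mapConfig() helper referenced above is not shown; a minimal sketch of what it might look like for this setup (one sync backup per setup item 3, no near cache per setup item 2, and "my-cache" as a placeholder map name) is:

    import com.hazelcast.config.MapConfig;

    private MapConfig mapConfig() {
        // Placeholder map name; one synchronous backup, no NearCacheConfig set
        return new MapConfig("my-cache")
                .setBackupCount(1)
                .setAsyncBackupCount(0);
    }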


  • Redis does not work in embedded mode; it only works client-server. So there is already a clear difference between the Hazelcast setup and Redis, which would automatically lead to different results than expected. Hence, some clarity here on deployment would be helpful. – wildnez Jan 11 '21 at 12:08
  • 95% CPU usage sounds like the first thing to investigate. It makes it likely that there was some buildup of requests in a queue, increasing latency. What exactly do you include in the measured latency, what starts the clock and what stops it? – Marko Topolnik Jan 11 '21 at 12:08
  • Also, you have mentioned using Hazelcast in embedded mode but are still making REST calls, which basically means a client-server setup. How did you measure latency with the Redis setup? Microsecond response times through a REST service do not sound realistic, as REST itself means web traffic. Maybe I'll be in a better position to help once we have some clarity on the question. – wildnez Jan 11 '21 at 12:11
  • @wildnez I used JMeter to measure the performance; the latency I mentioned is from the Hazelcast Management Center (not the REST API latency, so network latency is excluded). JMeter config: numThreads: 200 (might be the cause of the full CPU utilization), ramp-up time: 2 mins, expected throughput: 30k/s, total execution time: 15 mins – plug Jan 11 '21 at 14:06
  • @MarkoTopolnik as I am using JMeter with config: desired throughput: 30k/s, numThreads: 200, ramp-up time: 2 mins, total execution time: 15 mins – plug Jan 11 '21 at 14:10
  • So your clock starts when you send the REST call, and stops when you receive a REST response? – Marko Topolnik Jan 11 '21 at 14:22
  • yes @MarkoTopolnik – plug Jan 11 '21 at 15:18
  • So far it sounds to me like you have: 1. a network hop to send the REST request; 2. a hop for the IMDG to access the correct partition; 3. a hop for the data to return to the REST-serving node; 4. a hop to send the REST response. Do you know the cost of these network hops? – Marko Topolnik Jan 11 '21 at 18:01
  • @MarkoTopolnik I am not sure how to identify the cost of the network hops. Please let me know if you know of any tool to gauge the same, and I will try to publish it accordingly. I believe Hazelcast is showing latency that combines the network hops between the partition owner and the REST-serving node. As I am looking at latency only for the get operations performed via the REST API, we can exclude the other hops (1 and 4) you mentioned. – plug Jan 11 '21 at 18:51
  • @plug can you also answer how you measure latency with Redis? JMeter will measure end-to-end latency, i.e. the latency of one REST call leaving and coming back in, and you have also mentioned Management Center, so it's really confusing. It is still not clear how you are benchmarking Hazelcast (end-to-end or only Hazelcast latency), how you are measuring Redis performance, what the deployment strategy of Redis and Hazelcast is, etc. – wildnez Jan 12 '21 at 01:23
  • @wildnez I already mentioned that I am considering the get latency published by Hazelcast Management Center, and as per my understanding that should provide the actual latency of the Hazelcast operation. I am not even looking at the results published by JMeter; I used JMeter only for the load test. – plug Jan 12 '21 at 06:10
  • I'm afraid that properly answering benchmarking results like this one is not possible based on just a few text messages on Stack Overflow. I suggest you contact Hazelcast on our community Slack channel: hazelcastcommunity.slack.com. Finding the precise cause of some latency measurement takes a lot of detailed work. – Marko Topolnik Jan 12 '21 at 09:28

0 Answers