
Hello, Stack Overflow community.

I have a Spring Boot application that uses JCache with Hazelcast as the underlying cache provider.

Each Hazelcast node has 5 caches with 50000 entries each, and 4 Hazelcast instances form a cluster.

The problem that I face is the following:

I have a very heavy call that reads data from all five caches. On the initial start, when all caches are still empty, this call takes up to 600 seconds.

When a single Hazelcast instance is running and all 5 caches are filled with data, the call is relatively fast: only about 4 seconds on average.

When I start 2 Hazelcast instances and they form a cluster, the response time gets worse: the same call already takes 25 seconds on average.

And the more Hazelcast instances I add to the cluster, the longer the response time gets. Of course, I expected some slowdown once the data is partitioned among the Hazelcast nodes in a cluster, but I did not expect that adding just one more Hazelcast instance would make the response 6-7 times slower...

Please note that, for simplicity and testing purposes, I just start four Spring Boot instances, each with an embedded Hazelcast node, on one machine. Therefore, such poor performance cannot be explained by network delays. I assume that this API call is so slow even with Hazelcast because a lot of data needs to be serialized/deserialized when it is sent between Hazelcast cluster nodes. Please correct me if I am wrong.

The cache data is partitioned evenly among all nodes. I was thinking about adding a Near Cache to reduce latency; however, according to the Hazelcast documentation, a Near Cache is not available for JCache members. In my case, because of some project requirements, I cannot switch to JCache clients to make use of a Near Cache. Is there any advice on how to reduce latency in such a scenario?

Thank you in advance.


DUMMY CODE SAMPLES TO DEMONSTRATE THE PROBLEM:

  1. Hazelcast Config: stays default, nothing is changed
  2. Caches:
private void createCaches() {
      CacheConfiguration<?, ?> cacheConfig = new CacheConfig<>()
              .setEvictionConfig(
                      new EvictionConfig()
                              .setEvictionPolicy(EvictionPolicy.LRU)
                              .setSize(150000)
                              .setMaxSizePolicy(MaxSizePolicy.ENTRY_COUNT)
              )
              .setBackupCount(5)
              .setInMemoryFormat(InMemoryFormat.OBJECT)
              .setManagementEnabled(true)
              .setStatisticsEnabled(true);
      cacheManager.createCache("books", cacheConfig);
      cacheManager.createCache("bottles", cacheConfig);
      cacheManager.createCache("chairs", cacheConfig);
      cacheManager.createCache("tables", cacheConfig);
      cacheManager.createCache("windows", cacheConfig);
  }

  3. Dummy Controller:
@GetMapping("/dummy_call")
    public String getExampleObjects() { // simulates a situation where one call needs to fetch data from multiple cached sources
        Instant start = Instant.now();
        for (int i = 0; i < 50000; i++) {
            exampleService.getBook(i);
            exampleService.getBottle(i);
            exampleService.getChair(i);
            exampleService.getTable(i);
            exampleService.getWindow(i);
        }
        Instant end = Instant.now();
        return String.format("The heavy call took: %d seconds", Duration.between(start, end).getSeconds());
    }

  4. Dummy service:
@Service
public class ExampleService {

    @CacheResult(cacheName = "books")
    public Book getBook(int i) {
        try {
            Thread.sleep(1); // just to simulate slow service here!
        } catch (InterruptedException e) {
            e.printStackTrace();
        }
        return new Book(Integer.toString(i), Integer.toString(i));
    }

    @CacheResult(cacheName = "bottles")
    public Bottle getBottle(int i) {
        try {
            Thread.sleep(1);
        } catch (InterruptedException e) {
            e.printStackTrace();
        }
        return new Bottle(Integer.toString(i), Integer.toString(i));
    }

    @CacheResult(cacheName = "chairs")
    public Chair getChair(int i) {
        try {
            Thread.sleep(1);
        } catch (InterruptedException e) {
            e.printStackTrace();
        }
        return new Chair(Integer.toString(i), Integer.toString(i));
    }

    @CacheResult(cacheName = "tables")
    public Table getTable(int i) {
        try {
            Thread.sleep(1);
        } catch (InterruptedException e) {
            e.printStackTrace();
        }
        return new Table(Integer.toString(i), Integer.toString(i));
    }

    @CacheResult(cacheName = "windows")
    public Window getWindow(int i) {
        try {
            Thread.sleep(1);
        } catch (InterruptedException e) {
            e.printStackTrace();
        }
        return new Window(Integer.toString(i), Integer.toString(i));
    }
}
  • How many cache lookups do you make in that _heavy call_? What API are you using exactly? Are you doing concurrent calls or are you measuring a single call only? – František Hartman Feb 22 '22 at 22:16
  • 1) No, in this scenario there are no concurrent calls, just one simple GET call to the backend that runs that heavy call. 2) This call makes 5 * 50000 = 250000 lookups. However, the call delivers a deterministic result: on the first API call all missed entries are placed in the caches, and on the second call all 250000 lookups are cache hits. 3) I updated the main text with a test source code example that simulates the problem. – Anton Skripin Feb 22 '22 at 22:25
  • @FrantišekHartman This code is just an example. However, in the real project, the situation with embedded JCache Hazelcast is quite similar. There are some API calls that need to fetch data from several caches, and there might be several thousand lookups from each cache within one call. The situation is exactly the same: when there is no cluster and only one Hazelcast instance is running, gets happen fast. However, as soon as a cluster of 2 or more nodes is formed, we experience significant delays. – Anton Skripin Feb 22 '22 at 22:43

1 Answer


If you do the math:

4 s / 250 000 lookups = 0.016 ms per local lookup. That seems rather high, but let's take it.

When you add a single node, the data gets partitioned and half of the requests will be served from the other node. If you add 2 more nodes (4 total), then 25% of the requests will be served locally and 75% will go over the network. This explains why the response time grows as you add more nodes.

Even a simple ping on localhost takes twice that or more. On a real network, the read latency we see in benchmarks is 0.3-0.4 ms per read call. That makes:

0.25 * 250 000 * 0.016 ms + 0.75 * 250 000 * 0.3 ms ≈ 57 s
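
The same estimate as a quick sanity check in code (a sketch; the per-lookup latencies are the assumed figures from above):

    public class LatencyEstimate {
        public static void main(String[] args) {
            double localMs = 0.016;  // local lookup latency from the measurement above
            double remoteMs = 0.3;   // assumed network read latency per call
            int lookups = 250_000;
            // In a 4-node cluster, ~25% of lookups stay local, ~75% go remote.
            double seconds = (0.25 * lookups * localMs + 0.75 * lookups * remoteMs) / 1000;
            System.out.printf("estimated: %.1f s%n", seconds); // ~57 s, matching the figure above
        }
    }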

You simply won't be able to make that many calls serially over the network (even a local one); you need to either:

  • parallelize the calls and/or use javax.cache.Cache#getAll to batch lookups and reduce the number of network round trips (see the sketch after this list)
  • try enabling reads from local backups via com.hazelcast.config.MapConfig#setReadBackupData, so fewer requests go over the network.
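
A minimal sketch of the batching idea, assuming the javax.cache.Cache instances are accessed directly and keyed by Integer (the @CacheResult annotations above generate their own key type, so this bypasses them; the chunk size of 1000 is an arbitrary tuning value):

    import java.util.Map;
    import java.util.Set;
    import java.util.stream.Collectors;
    import java.util.stream.IntStream;

    import javax.cache.Cache;

    public class BatchedLookup {

        private static final int CHUNK = 1_000; // keys per getAll call, tune as needed

        // Reads keys 0..total-1 in batches instead of one round trip per key.
        public static <V> void readAll(Cache<Integer, V> cache, int total) {
            for (int from = 0; from < total; from += CHUNK) {
                Set<Integer> keys = IntStream.range(from, Math.min(from + CHUNK, total))
                        .boxed()
                        .collect(Collectors.toSet());
                Map<Integer, V> chunk = cache.getAll(keys); // one batched network call
                // ... consume chunk ...
            }
        }
    }

Each getAll call replaces up to 1000 individual round trips with one, and the chunks can additionally be fetched from several threads in parallel.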

The read-backup-data feature is only available for IMap, so you would need to switch to Spring caching with the hazelcast-spring module and its com.hazelcast.spring.cache.HazelcastCacheManager:

    @Bean
    HazelcastCacheManager cacheManager(HazelcastInstance hazelcastInstance) {
        return new HazelcastCacheManager(hazelcastInstance);
    }
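
With the IMap-backed caches in place, read-backup-data can then be enabled on the map configs. A hedged sketch, reusing the cache names from the question (the backup count of 3 is a made-up value; with 4 members it places a replica of every partition on every node, so any get can be served locally):

    import java.util.List;

    import org.springframework.context.annotation.Bean;

    import com.hazelcast.config.Config;
    import com.hazelcast.config.MapConfig;

    @Bean
    Config hazelcastConfig() {
        Config config = new Config();
        for (String name : List.of("books", "bottles", "chairs", "tables", "windows")) {
            config.addMapConfig(new MapConfig(name)
                    .setBackupCount(3)         // a local backup replica must exist...
                    .setReadBackupData(true)); // ...for a get to be served from it
        }
        return config;
    }

Note the trade-off: reading from backups can return slightly stale values, and a higher backup count increases memory usage and write cost.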

See the documentation for more details.

František Hartman
  • Thank you very much. Your explanation is reasonable and makes sense. However, if I try enabling reading from backups, it's not supported by JCache, since this configuration is only relevant for the Map data structure, if I'm not mistaken. – Anton Skripin Feb 22 '22 at 23:29
  • Right, you are correct. You can use IMap as the cache-backing data structure if you switch to Spring `@Cacheable` annotations and add the `hazelcast-spring` dependency. – František Hartman Feb 22 '22 at 23:48
  • Is it possible, however, that serialization/deserialization takes that much time? If I perform 5000 lookups over 7 caches where the average get time is 0.009 milliseconds, then in total it takes 45 milliseconds. If there are 2 Hazelcast instances forming a cluster, the response time for such a call is 25500 milliseconds. However, if we consider only network traffic, it should be: (0.5 * 5000 * 0.009) + (0.5 * 5000 * 0.3 (my localhost ping delay)) = 772 milliseconds. The difference, 25500 - 772 = 24728 ms, would then be the time spent de/serializing. Isn't that too slow? – Anton Skripin Feb 23 '22 at 11:57
  • You can try profiling it to see where it spends the time, or share full example code somewhere and I'll have a look when I have time. – František Hartman Feb 23 '22 at 15:00