Improving Get operation performance

Question

I am running some comparison tests (Ignite vs. Cassandra) to see how to improve the performance of the 'get' operation. The data is fairly straightforward: a simple Employee object (about 10 fields), stored as a BinaryObject in the cache:

IgniteCache<String, BinaryObject> empCache;

The cache is configured with write synchronization mode FULL_SYNC, atomicity mode TRANSACTIONAL, 1 backup, and persistence enabled.
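For reference, those settings map onto Ignite's Java configuration roughly as follows. This is a minimal sketch assuming Ignite 2.x on the classpath; the cache name `employees` and the class and method names are hypothetical, not taken from the question.

```java
import org.apache.ignite.binary.BinaryObject;
import org.apache.ignite.cache.CacheAtomicityMode;
import org.apache.ignite.cache.CacheWriteSynchronizationMode;
import org.apache.ignite.configuration.CacheConfiguration;
import org.apache.ignite.configuration.DataRegionConfiguration;
import org.apache.ignite.configuration.DataStorageConfiguration;
import org.apache.ignite.configuration.IgniteConfiguration;

public class EmployeeCacheConfig {
    /** Cache settings from the question: FULL_SYNC, TRANSACTIONAL, 1 backup. */
    public static CacheConfiguration<String, BinaryObject> employeeCache() {
        CacheConfiguration<String, BinaryObject> cfg = new CacheConfiguration<>("employees"); // hypothetical name
        cfg.setWriteSynchronizationMode(CacheWriteSynchronizationMode.FULL_SYNC);
        cfg.setAtomicityMode(CacheAtomicityMode.TRANSACTIONAL);
        cfg.setBackups(1);
        return cfg;
    }

    /** Server-node settings; persistence is toggled on the default data region. */
    public static IgniteConfiguration serverConfig(boolean persistence) {
        DataRegionConfiguration region = new DataRegionConfiguration();
        region.setName("default");
        // A cluster with persistence enabled starts inactive and must be activated before use.
        region.setPersistenceEnabled(persistence);
        DataStorageConfiguration storage = new DataStorageConfiguration();
        storage.setDefaultDataRegionConfiguration(region);
        IgniteConfiguration cfg = new IgniteConfiguration();
        cfg.setDataStorageConfiguration(storage);
        cfg.setCacheConfiguration(employeeCache());
        return cfg;
    }
}
```

Keeping persistence as a parameter means the persistence-on and persistence-off runs differ only in that one flag.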

Cluster: 3 server nodes + 1 client node.

The client runs multiple (configurable) threads making concurrent get calls.

For about 500k requests, I am getting a throughput of about 1,500 gets/sec, even though all of the data is off-heap and the cache hit percentage is 100%. Interestingly, Cassandra gives similar performance with the key cache and a limited row cache.
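A client-side harness of the kind described (N threads, each doing one blocking get at a time) can be sketched in plain Java. This is a hypothetical sketch: the class name, the `emp-<n>` key format and the local map standing in for the cache are assumptions; to measure the cluster, pass `empCache::get` instead of the map.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.atomic.LongAdder;
import java.util.function.Function;

public class GetThroughput {
    /** Result of one run: achieved rate and how many lookups returned a value. */
    public static final class Result {
        public final double opsPerSec;
        public final long hits;
        Result(double opsPerSec, long hits) { this.opsPerSec = opsPerSec; this.hits = hits; }
    }

    /** Issues totalRequests single-key lookups from `threads` workers, one blocking get at a time each. */
    public static Result measure(Function<String, ?> get, int threads, int totalRequests, int keySpace) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        AtomicLong remaining = new AtomicLong(totalRequests); // shared request budget
        LongAdder completed = new LongAdder();                // low-contention counters
        LongAdder hits = new LongAdder();
        CountDownLatch done = new CountDownLatch(threads);
        long start = System.nanoTime();
        for (int t = 0; t < threads; t++) {
            pool.execute(() -> {
                try {
                    ThreadLocalRandom rnd = ThreadLocalRandom.current();
                    // Claim one request at a time until the budget is exhausted.
                    while (remaining.getAndDecrement() > 0) {
                        Object value = get.apply("emp-" + rnd.nextInt(keySpace)); // hypothetical key format
                        completed.increment();
                        if (value != null) hits.increment();
                    }
                } finally {
                    done.countDown();
                }
            });
        }
        try {
            done.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        } finally {
            pool.shutdown();
        }
        double seconds = (System.nanoTime() - start) / 1e9;
        return new Result(completed.sum() / seconds, hits.sum());
    }

    public static void main(String[] args) {
        // Stand-in for empCache::get: a local map, so this measures the harness, not Ignite.
        Map<String, String> local = new ConcurrentHashMap<>();
        for (int i = 0; i < 500_000; i++) local.put("emp-" + i, "employee-" + i);
        for (int threads : new int[] {10, 20, 30}) {
            Result r = measure(local::get, threads, 500_000, 500_000);
            System.out.printf("%d threads: %.0f gets/sec (%d hits)%n", threads, r.opsPerSec, r.hits);
        }
    }
}
```

Running it first against the local map gives a ceiling for the harness itself, which separates client-side overhead from cluster latency.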

I left most of the data configuration at the defaults. For this test I turned persistence off; ideally it shouldn't matter for gets, and indeed the performance is the same.

Data Regions Configured:
[19:35:58]   ^-- default [initSize=256.0 MiB, maxSize=14.1 GiB, persistence=false]

Topology snapshot [ver=4, locNode=038f99b3, servers=3, clients=1, state=ACTIVE, CPUs=40, offheap=42.0GB, heap=63.0GB]

Frankly, I was expecting Ignite gets to be pretty fast, given that all the data is in the cache, at least judging by this benchmark: https://www.gridgain.com/resources/blog/apacher-ignitetm-and-apacher-cassandratm-benchmarks-power-in-memory-computing

I'm planning to run one more test tomorrow with persistence off and a near cache (on heap) to see if it helps.
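A minimal sketch of that near-cache test, assuming Ignite 2.x and an already-started client node; the class and method names are hypothetical. A near cache keeps entries the client has read on the client's own heap, so repeated gets of the same key can be served locally without a network hop; first-time reads still go to the server nodes.

```java
import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.binary.BinaryObject;
import org.apache.ignite.configuration.NearCacheConfiguration;

public class NearCacheClient {
    /** Returns the existing server-side cache fronted by an on-heap near cache on this client node. */
    public static IgniteCache<String, BinaryObject> nearEmployees(Ignite client, String cacheName) {
        // No eviction policy is configured here, so every key read stays on the client heap.
        NearCacheConfiguration<String, BinaryObject> nearCfg = new NearCacheConfiguration<>();
        IgniteCache<String, BinaryObject> cache = client.getOrCreateNearCache(cacheName, nearCfg);
        // Keep values as BinaryObject, matching IgniteCache<String, BinaryObject> in the question.
        return cache.withKeepBinary();
    }
}
```

With 500k small records an unbounded near cache is affordable; for larger data sets an eviction policy factory would normally be set on the `NearCacheConfiguration`.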

Let me know if you guys see any obvious configurations that should be set.

Have you tried having more threads & querying in parallel? Have you tried getAll? :) — alamar, Nov 26 '19 at 08:09
Yeah, I've tried 10, 20, and 30 threads, with only marginal changes. My use case is a single 'get', so I deliberately haven't used getAll. — Victor, Nov 26 '19 at 19:36
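Little's law relates the number of requests in flight L, the throughput λ and the mean latency per request W. Applying it here is a rough cross-check that assumes each client thread keeps exactly one blocking get in flight and that throughput stays near the reported ~1,500 gets/sec at every thread count:

```latex
W = \frac{L}{\lambda}:\qquad
W_{10} = \frac{10}{1500\,\mathrm{s}^{-1}} \approx 6.7\,\mathrm{ms},\qquad
W_{30} = \frac{30}{1500\,\mathrm{s}^{-1}} = 20\,\mathrm{ms}
```

Under that assumption, tripling the thread count tripled the time each get spends in flight instead of raising throughput. That pattern suggests requests are queuing behind some shared limit (for example a busy client thread or the network path) rather than being bound by the in-memory lookup itself.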
Also, the benchmark seems to use a single 'get' too, so I am now wondering whether those numbers are real or were obtained in a more controlled environment with additional configuration. — Victor, Nov 26 '19 at 19:43
How much data is in the cluster, and is any caching enabled for Cassandra? If Cassandra caches all the data in RAM, the performance of simple gets might be comparable to a certain extent. — dmagda, Nov 27 '19 at 03:22
The data and volume are exactly the same, i.e., the same employee object with 500k records. As mentioned in my description, I'm using key cache 'ALL' and a limited row cache of about 10k. My primary concern is why I don't see performance anywhere remotely close to the published benchmark numbers, specifically for 'get'. — Victor, Nov 27 '19 at 06:47
What's your CPU usage? Please note that benchmarks are usually heavily tuned, so you should not expect to see the same numbers in an arbitrary scenario. — alamar, Nov 27 '19 at 09:25
The benchmark page does not mention any tuning details. In any case, I've used whatever configuration it does mention, and the benchmark uses 'get's too. I am using fairly powerful boxes (Intel(R) Xeon(R) CPU X5675 @ 3.07GHz) with about 40/60 CPUs and about 75 GB of RAM. During the run, top shows CPU usage of roughly 8-10% for each server node and about 130% for the client (a little over one CPU), so there is plenty of headroom. Plus, each Employee record I am adding is barely 50-75 bytes, so 500k records come to about 40-50 MB, plus another 40-50 MB for the backup, about 100 MB in total. So not much data. — Victor, Nov 28 '19 at 02:23
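Checking the raw payload arithmetic against the off-heap capacity in the topology snapshot, using the 75-byte upper bound and ignoring per-entry key and metadata overhead:

```latex
500\,000 \times 75\,\mathrm{B} = 37.5\,\mathrm{MB},\qquad
2 \times 37.5\,\mathrm{MB} = 75\,\mathrm{MB}\ \text{(primary + 1 backup)},\qquad
\frac{75\,\mathrm{MB}}{42\,\mathrm{GB}} \approx 0.18\%
```

Even allowing generous per-entry overhead, the data set fills a tiny fraction of the 42 GB off-heap, so memory capacity or eviction should not be a factor in the get rate.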
