
I have a Reliable Dictionary partitioned across a cluster of 7 nodes (60 partitions). I've set up the remoting listener like this:

    protected override IEnumerable<ServiceReplicaListener> CreateServiceReplicaListeners()
    {
        var settings = new FabricTransportRemotingListenerSettings
        {
            MaxMessageSize = Common.ServiceFabricGlobalConstants.MaxMessageSize,
            MaxConcurrentCalls = 200
        };

        return new[]
        {
            new ServiceReplicaListener((c) => new FabricTransportServiceRemotingListener(c, this, settings))
        };
    }

I am running a load test to prove that Reliable Dictionary "read" performance does not degrade under load. My "read" method looks like this:

    // intermediatePrice is declared outside this snippet in the original; a concurrent map is assumed here.
    var intermediatePrice = new ConcurrentDictionary<PriceKey, Price>();

    using (ITransaction tx = this.StateManager.CreateTransaction())
    {
        IAsyncEnumerable<KeyValuePair<PriceKey, Price>> items =
            await priceDictionary.CreateEnumerableAsync(tx,
                (item) => item.Id == id, EnumerationMode.Unordered);

        IAsyncEnumerator<KeyValuePair<PriceKey, Price>> e = items.GetAsyncEnumerator();

        while (await e.MoveNextAsync(CancellationToken.None))
        {
            var p = new Price(
                e.Current.Key.Id,
                e.Current.Key.Version, e.Current.Key.Id, e.Current.Key.Date,
                e.Current.Value.Source, e.Current.Value.Price, e.Current.Value.Type,
                e.Current.Value.Status);

            intermediatePrice.TryAdd(
                new PriceKey(e.Current.Key.Id, e.Current.Key.Version, id, e.Current.Key.Date), p);
        }
    }

    return intermediatePrice;

Each partition has around 500,000 records. Each "key" in dictionary is around 200 bytes and "Value" is around 600 bytes. When I call this "read" directly from a browser [calling the REST API which in turn calls the stateful service], it takes 200 milliseconds. If I run this via a load test with, let's say, 16 parallel threads hitting the same partition and same record, it takes around 600 milliseconds on average per call. If I increase the load test parallel thread count to 24 or 30, it takes around 1 second for each call. My question is, can a Service Fabric Reliable Dictionary handle parallel "read" operations, just like SQL Server can handle parallel concurrent reads, without affecting throughput?

teeboy

2 Answers


Based on the code, all your reads are executed on primary replicas, so with 7 nodes and 60 partitions there are 60 primary replicas processing requests.

You have 7 nodes and 60 replicas, so if we imagine they are distributed more or less evenly between nodes, that is roughly 8 replicas per node.

I am not sure about the physical configuration of each node, but if we assume for a moment that each node has 4 vCPUs, then when you make 8 concurrent requests on the same node, all of these requests have to be executed on those 4 vCPUs. This makes the worker threads fight for resources and, to keep it simple, significantly slows down the processing.

The reason this effect is so visible here is that you are scanning the IReliableDictionary instead of getting items by key with TryGetValueAsync, which is how it is supposed to be used.

If you change your code to use TryGetValueAsync, the difference will be very noticeable.
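
For illustration, here is a minimal sketch of what a key-based lookup could look like, assuming the caller can construct the exact PriceKey it needs (priceDictionary, PriceKey and Price are the names from the question; GetPriceAsync is a hypothetical wrapper):

    // Point lookup instead of a scan. ConditionalValue<T> comes from Microsoft.ServiceFabric.Data.
    public async Task<Price> GetPriceAsync(PriceKey key)
    {
        using (ITransaction tx = this.StateManager.CreateTransaction())
        {
            // Goes straight to the requested key instead of enumerating ~500,000
            // entries under a snapshot and filtering them one by one.
            ConditionalValue<Price> result = await priceDictionary.TryGetValueAsync(tx, key);
            return result.HasValue ? result.Value : null;
        }
    }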

Oleg Karasik
  • I mentioned in my question that all the "reads" are going to the *same* partition and therefore the same node. In fact, I am requesting the same record every time. I want it this way to make sure that this single partition can handle concurrent requests. I also mentioned that if I do this query manually from a browser, it takes 200 milliseconds. TryGetValueAsync will not work in my case as I need to filter the key based on a predicate. You can see that my "Key" is a C# object that implements IComparable and IEquatable – teeboy Aug 01 '18 at 12:13
  • @teeboy What are the hardware capabilities of the node? How many parallel requests do you make through the browser? – Oleg Karasik Aug 01 '18 at 13:16
  • 8 cores, 16 GB RAM. from the browser just 1 request. – teeboy Aug 01 '18 at 15:23
  • @teeboy That is exactly the thing I was talking about. When you make only one request from the browser there is no concurrent processing involved on the node - your request is the only thing it is busy with. When you make more than one request, those requests now concurrently use the HDD (SSD), memory and CPU, which is why each of them becomes slower. To try it yourself you can modify your load test to do 2, 4, 8 and then 16 and 32 parallel requests. There shouldn't be a very significant difference when you have < 8 parallel requests. – Oleg Karasik Aug 02 '18 at 06:51
  • Of course, I get that. The point is, Service Fabric is advertised as a low-latency platform for reads and writes. You don't talk about low latency without parallel processing. Even with 16 parallel threads from the client load test, the response time goes from 200 milliseconds to 600 milliseconds. – teeboy Aug 02 '18 at 10:41
  • @teeboy I may be wrong because only profiling can show the real reason. But my understanding of the situation is that when you do many parallel requests, each of them is processed slower not because of latency in accessing the reliable state, but rather because it needs CPU + memory to perform deserialization, comparison etc. side by side with 15 more requests that also consume CPU + memory, i.e. running two calculation threads on the same core is slower than running one calculation thread on one core. – Oleg Karasik Aug 02 '18 at 14:00

If you check the Remarks on the Reliable Dictionary CreateEnumerableAsync method, you can see that it was designed to work concurrently, so concurrency is not an issue.

The returned enumerator is safe to use concurrently with reads and writes to the Reliable Dictionary. It represents a snapshot consistent view

The problem is that "concurrently" does not mean "fast".

When you make your query this way, it will:

  1. take a snapshot of the collection before it starts processing, otherwise you wouldn't be able to write to it while it is being processed.
  2. navigate through all the values in the collection to find the items you are looking for, and take note of these values, before anything is returned.
  3. load the data from disk if it is not in memory yet; only the keys are kept in memory, while the values are kept on disk when not required and may get paged out to release memory.
  4. probably (I am not sure, but I assume) not reuse the results of the previous query, because your collection might have changed since the last query.

When you have a huge number of queries running this way, many factors come into play:

  • Disk: loading the data into memory
  • CPU: comparing the values and scheduling threads
  • Memory: storing the snapshot to be processed

The best way to work with a Reliable Dictionary is to retrieve values by key, because the dictionary knows exactly where the data for a specific key is stored and does not add this extra overhead to find it.

If you really want to query it this way, I would recommend you design it like an index table: store the data keyed by id in one dictionary, and keep another dictionary whose key is the searched value and whose value is the key into the main dictionary. This would be much faster.
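
For illustration only, a hedged sketch of that index-table layout. The names (priceKeysById, AddPriceAsync, GetPricesByIdAsync) and the assumption that Id is an int are made up for the example; the write path keeps both dictionaries consistent in a single transaction, and the read path becomes two key lookups instead of a partition-wide scan:

    // Assumed fields (hypothetical names):
    //   IReliableDictionary<PriceKey, Price> priceDictionary;     // main data
    //   IReliableDictionary<int, List<PriceKey>> priceKeysById;   // index: Id -> matching keys

    public async Task AddPriceAsync(PriceKey key, Price price)
    {
        using (ITransaction tx = this.StateManager.CreateTransaction())
        {
            await priceDictionary.AddAsync(tx, key, price);

            // Maintain the index entry for this Id; a new list is returned on
            // update because stored values should be treated as immutable.
            await priceKeysById.AddOrUpdateAsync(
                tx,
                key.Id,
                new List<PriceKey> { key },
                (id, existing) => new List<PriceKey>(existing) { key });

            await tx.CommitAsync();
        }
    }

    public async Task<List<Price>> GetPricesByIdAsync(int id)
    {
        var prices = new List<Price>();

        using (ITransaction tx = this.StateManager.CreateTransaction())
        {
            ConditionalValue<List<PriceKey>> keys = await priceKeysById.TryGetValueAsync(tx, id);
            if (!keys.HasValue)
            {
                return prices;
            }

            foreach (PriceKey key in keys.Value)
            {
                ConditionalValue<Price> price = await priceDictionary.TryGetValueAsync(tx, key);
                if (price.HasValue)
                {
                    prices.Add(price.Value);
                }
            }
        }

        return prices;
    }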

Diego Mendes
  • Having dual dictionaries defeats the purpose, as I now have to do one query to get all the keys that match my predicate and another loop of TryGetValueAsync calls on the retrieved keys, and I am sure the performance will be slower than retrieving in a single query. I am considering keeping the dictionary in memory as a ConcurrentDictionary and using Reliable Dictionary notifications to add/update/delete the in-memory ConcurrentDictionary when the underlying dictionary is updated. Hopefully, in-memory parallel queries against ConcurrentDictionaries will be faster than async queries against Reliable Dictionaries. – teeboy Aug 01 '18 at 14:28
  • Key lookups are in-memory operations; they do not require you to load the entire set to find an item, and once you have the key of the item you just do one more lookup on the other dictionary. You would be surprised by the difference. – Diego Mendes Aug 01 '18 at 14:31
  • A ConcurrentDictionary will be faster for sure, because the entire set is in memory, and it also avoids the overhead of reliable collections. – Diego Mendes Aug 01 '18 at 14:35
  • I went with Reliable Dictionary notifications and built in-memory secondary indexes with key-based lookups of the dictionary, rather than enumerating with CreateEnumerableAsync. This made the latency go away, as key-based lookups are much faster. The downside is that every time the partition is rebuilt, the in-memory indexes have to be rebuilt, and we need LOTS of RAM. – teeboy Oct 04 '18 at 18:23
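
For illustration, a hedged sketch of that notification-based approach. The field and handler names are made up, only the change notifications are shown, and repopulating the index after a partition rebuild (via the dictionary's RebuildNotificationAsyncCallback) is where the extra RAM and rebuild time mentioned above come in:

    using System.Collections.Concurrent;
    using Microsoft.ServiceFabric.Data.Notifications;

    // In-memory index maintained from notifications (hypothetical field name).
    private readonly ConcurrentDictionary<PriceKey, Price> inMemoryPrices =
        new ConcurrentDictionary<PriceKey, Price>();

    // Subscribed once the dictionary is available, e.g.:
    //   priceDictionary.DictionaryChanged += this.OnPriceDictionaryChanged;
    private void OnPriceDictionaryChanged(
        object sender, NotifyDictionaryChangedEventArgs<PriceKey, Price> e)
    {
        switch (e.Action)
        {
            case NotifyDictionaryChangedAction.Add:
                var added = (NotifyDictionaryItemAddedEventArgs<PriceKey, Price>)e;
                this.inMemoryPrices[added.Key] = added.Value;
                break;

            case NotifyDictionaryChangedAction.Update:
                var updated = (NotifyDictionaryItemUpdatedEventArgs<PriceKey, Price>)e;
                this.inMemoryPrices[updated.Key] = updated.Value;
                break;

            case NotifyDictionaryChangedAction.Remove:
                var removed = (NotifyDictionaryItemRemovedEventArgs<PriceKey, Price>)e;
                this.inMemoryPrices.TryRemove(removed.Key, out _);
                break;

            case NotifyDictionaryChangedAction.Clear:
                this.inMemoryPrices.Clear();
                break;
        }
    }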