
Use case

Here is the topology we are working on:

Server 1 --> marketData cache, holding price information for different shares

Client 1 --> pushes data into the cache

Client 2 --> continuous query, listening for updates to the marketData cache on a per-key basis

I want the data to arrive in the order in which it was received and pushed, because Client 2 must not see stale data. For example, if the last price for an instrument moves from 100 to 101 and then 102, Client 2 should receive the updates in that order, not as 100, then 102, then 101. In short, for each key in my cache I want updates delivered in order.

Ways of pushing data on cache:

  1. put seems to be the safest way, but it looks slow: the full cache update completes synchronously before the thread moves to the next statement. That may not be suitable for pushing 20,000 updates per second.

  2. putAsync seems better; my understanding is that it uses the striped pool to push data into the cache. Since the striped pool has max(8, number of cores) threads, it should be faster. But with multiple threads working in parallel, does it still guarantee that the data stays in the order in which it was pushed? https://apacheignite.readme.io/docs/thread-pools#section-striped-pool

  3. DataStreamer seems to be the best way, as it also processes data in parallel, but again the concern is the order in which data is put into the cache. The API documentation also mentions that the ordering of data is not guaranteed. https://ignite.apache.org/releases/latest/dotnetdoc/api/Apache.Ignite.Core.Datastream.IDataStreamer-2.html

Can someone please clarify the above? I could not find documentation giving a deeper understanding of these approaches. What is the best way among them to push continuously updating data such as market data?

Magnum23

1 Answer


It depends on the structure of your data. I'm going to assume that the key is the symbol (or symbol and venue) and, specifically, that time is not a factor.

Your three options are really two: the first two are identical except that put waits for an acknowledgement while putAsync returns a future. Option two might be slightly more resource-intensive, but it's not going to be faster.
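As a sketch of that equivalence (assuming Ignite's Java API, ignite-core on the classpath, and a running node; the cache name "marketData" is taken from the question's description and is illustrative):

```java
// Sketch only: needs ignite-core on the classpath and a running Ignite node.
import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.Ignition;
import org.apache.ignite.lang.IgniteFuture;

public class PutVsPutAsync {
    public static void main(String[] args) {
        try (Ignite ignite = Ignition.start()) {
            IgniteCache<String, Double> cache = ignite.getOrCreateCache("marketData");

            cache.put("AAPL", 100.0); // blocks until the update is acknowledged

            IgniteFuture<Void> fut = cache.putAsync("AAPL", 101.0); // returns immediately
            fut.get(); // same underlying work; you just choose when to wait
        }
    }
}
```

Either way the same update path is taken; the future only moves the wait.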

The idea of the striped thread pool is that the same thread is always used to update the same key, so ordering is maintained as long as you have one queue per node. You can get that by setting perNodeParallelOperations() to 1.
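For example (a minimal sketch assuming Ignite's Java API and a running node; the cache name is illustrative):

```java
// Sketch only: needs ignite-core on the classpath and a running Ignite node.
import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteDataStreamer;
import org.apache.ignite.Ignition;

public class OrderedStreamer {
    public static void main(String[] args) {
        try (Ignite ignite = Ignition.start()) {
            ignite.getOrCreateCache("marketData");
            try (IgniteDataStreamer<String, Double> streamer = ignite.dataStreamer("marketData")) {
                streamer.allowOverwrite(true);          // overwrite existing entries with the latest price
                streamer.perNodeParallelOperations(1);  // one in-flight batch per node -> preserves order
                streamer.addData("AAPL", 100.0);
                streamer.addData("AAPL", 101.0);
                streamer.addData("AAPL", 102.0);
                streamer.flush();
            }
        }
    }
}
```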

You might also want to experiment with the size of the striped thread pool.
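The ordering argument can also be modeled outside Ignite with plain Java executors: route every key to a fixed single-threaded "stripe", so updates to one key apply in submission order while different keys still run in parallel. This is a simplified model of the idea, not Ignite's implementation:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class StripedModel {
    static final int STRIPES = 8; // analogous to max(8, number of cores)
    static final ExecutorService[] stripes = new ExecutorService[STRIPES];
    static {
        for (int i = 0; i < STRIPES; i++) {
            stripes[i] = Executors.newSingleThreadExecutor(r -> {
                Thread t = new Thread(r);
                t.setDaemon(true); // let the JVM exit without explicit shutdown
                return t;
            });
        }
    }

    // The same key always hashes to the same stripe, so its updates are serialized.
    static Future<?> put(String key, double price, Map<String, List<Double>> cache) {
        int stripe = Math.floorMod(key.hashCode(), STRIPES);
        return stripes[stripe].submit(() ->
                cache.computeIfAbsent(key, k -> new ArrayList<>()).add(price));
    }

    public static void main(String[] args) throws Exception {
        Map<String, List<Double>> cache = new ConcurrentHashMap<>();
        Future<?> last = null;
        for (double p : new double[]{100.0, 101.0, 102.0}) {
            last = put("AAPL", p, cache);
        }
        last.get(); // a stripe is FIFO, so waiting on the last task means all are applied
        System.out.println(cache.get("AAPL")); // [100.0, 101.0, 102.0]
    }
}
```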

Stephen Darlington
  • Thanks Stephen. Yes, the key is the symbol. I assume the perNodeParallelOperations you mention here is on IgniteDataStreamer. If it is set to one, would that prevent updates for different keys being processed in parallel, or is it per key? We want at least 8 (the number of cores) keys being updated in parallel. Also, I am setting allowOverwrite to true so the value is always overwritten with the latest one. – Magnum23 Jan 17 '20 at 06:55
  • Yes -- I should have been clearer -- that's a parameter for the IgniteDataStreamer. The overhead of maintaining the order is going to have some impact on throughput, but keys on different striped threads should still update in parallel. – Stephen Darlington Jan 17 '20 at 09:37
  • Can you please confirm whether putAsync also uses the striped pool? I could not find a document confirming this; if you have one, can you please share the link? – Magnum23 Jan 22 '20 at 03:57
  • @StephenDarlington In this case, with perNodeParallelOperations() set to 1, how is delayed data handled? – Vishal Jan 22 '20 at 07:15
  • @Trans Pretty much all reads and writes use the striped pool. That probably should be documented somewhere, but I'm not sure where. – Stephen Darlington Jan 22 '20 at 09:26
  • @Vishal It's all based on the order the data arrives in the client application rather than a timestamp. – Stephen Darlington Jan 22 '20 at 09:27