
I have to read some data (which arrives very fast, up to 5000 messages per second) from a multicast (UDP) stream. Because the stream is multicast (and the data is quite critical), the data provider offers two streams that send identical data (their logic being that the probability of the same packet being dropped on both streams is very close to zero). All data packets are tagged with a sequence number to keep track.

Also, the application is so time critical that I am forced to listen to both streams in parallel and pick up the next sequence number from whichever multicast stream delivers it first. When the same packet arrives on the mirror stream, I simply drop it.

I am planning to implement this drop feature using a common "sequence_number" variable between the two functions - which by the way run in different threads. The sequence number is atomic as it is going to be read and updated from two different threads.

The obvious algorithm that comes to mind is

if (sequence number received from the stream > sequence_number)
{
   process packet;
   sequence_number = sequence number received from the stream;
}

(The above algorithm needs to be modified for times when sequence numbers arrive out of order, which can happen as it is a UDP stream, but let's forget about that for the time being.)

My question is this:

Between the time I `load()` my `sequence_number`, check that it is smaller than the sequence number received from the stream, accept the packet, and finally `store()` the new sequence number, the other stream may receive the same packet (with the same sequence number) and perform the same operations before the first stream finishes its `store()`. I would then end up with the same packet twice in my system. What is a way to overcome this situation?

Chani
  • I think you need to look into using [locking mutexes](http://en.wikipedia.org/wiki/Mutual_exclusion). This is exactly what they are designed to do. Atomic variables simply ensure that no two threads will perform an operation on that variable at the same time, but that only applies to a single read/write. You are clearly doing more than a single read/write. You're doing a [transaction](http://en.wikipedia.org/wiki/Transaction_processing). – aruisdante Jul 19 '14 at 17:21
  • @aruisdante I have considered that. But you know, the cost of locking it can be quite a bit. I am essentially looking for a lock free solution here. – Chani Jul 19 '14 at 17:25
  • There are many, many ways to perform [lock-free programming](https://www.google.com/webhp?sourceid=chrome-instant&ion=1&espv=2&ie=UTF-8#q=lockless%20transaction%20c%2B%2B). What method is best for your application may vary. I would try the simple mutex solution first, and only if that *doesn't work*, try and implement a more complex lockless method. Generally speaking, lock-free programming is focused around preventing deadlock, not improving transaction performance. – aruisdante Jul 19 '14 at 17:26
  • What about not dropping any packets at all: receive all packets and just put each one atomically into its proper sequence slot? –  Jul 19 '14 at 17:33
  • @DieterLücking this is probably the better solution. If you have a thread-safe `set`-like structure based on sequence number, you don't have to worry about dealing with detecting when to ignore packets at all; the data structure will do it for you. – aruisdante Jul 19 '14 at 17:34
  • @DieterLücking Then I would be essentially running the risk of inserting duplicates in my sequence slot, right ? – Chani Jul 19 '14 at 17:35
  • @Wildling I assume these duplicates are identical, hence replacing one with the other does no harm. –  Jul 19 '14 at 17:39
  • @DieterLücking True, but that again removes the advantage of being able to process the earliest packet and moving on. However, as my question is already an edge case, this is definitely a way to go about it. Thanks ! – Chani Jul 19 '14 at 17:41
  • You can also use `compare_exchange` to avoid replacing a slot that isn't empty. – Ben Voigt Jul 19 '14 at 17:54
  • Since the parallel work is light (checking the sequence number) and the exclusive work is heavy (handling the packet), I would not make it multi-threaded but use select/epoll/kqueue on multiple sockets. – Non-maskable Interrupt Jul 19 '14 at 18:29
  • @Calvin Link to a select/epoll/kqueue tutorial please !!! :)) – Chani Jul 19 '14 at 18:36

3 Answers


Don't put off worrying about handling out of order packets until later, because solving that also provides the most elegant solution to synchronizing threads.

Elements of an array are unique memory locations for the purposes of data races. If you put each packet (atomically via pointer write) into a different array element according to its sequence number, you'll get rid of most of the contention. Also use compare-exchange to detect whether the other thread (other stream) has already seen that packet.

Note that you won't have the retry loop normally associated with compare-exchange: either you have the first copy of the packet and compare-exchange succeeds, or the packet already exists and your copy can be discarded. So this approach is not only lock-free but also wait-free :)
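A minimal sketch of the slot idea described above, not the answerer's actual code. The `Packet` type, the `kSlots` bound, and the name `publish_first_copy` are assumptions made up for illustration, and the array is indexed directly by sequence number, so it only works while sequence numbers stay below `kSlots`.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical packet type; the real payload is omitted.
struct Packet {
    std::uint64_t seq;
};

// Assumed bound for this sketch: sequence numbers must be < kSlots.
constexpr std::size_t kSlots = 1024;

// One atomic pointer per sequence number, all starting out as nullptr.
std::atomic<Packet*> g_slots[kSlots] = {};

// Tries to publish `p` into the slot for its sequence number.
// Returns true if this was the first copy (caller should process it),
// false if the other stream already stored a packet there (caller discards).
bool publish_first_copy(Packet* p) {
    assert(p != nullptr && p->seq < kSlots);
    Packet* expected = nullptr;  // the slot must still be empty
    // Single attempt, no retry loop: failure means the slot is already taken.
    return g_slots[p->seq].compare_exchange_strong(expected, p);
}
```

Each receiving thread would call `publish_first_copy` once per packet; whichever thread gets `true` owns the packet, and the other simply frees its copy.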

Ben Voigt
  • using an array to store unique sequence numbers might not work as I am to receive up to eighteen million messages. However thumbs up for the compare-exchange idea. – Chani Jul 19 '14 at 18:26
  • Wilding, use just the low bits of the sequence number as the array index, which creates a circular buffer – Ben Voigt Jul 19 '14 at 18:27
  • Yeah, that would be great. Another thing, if you don't mind me asking; from cppreference, "std::atomic::compare_exchange_strong Atomically compares the value stored in *this with the value of expected, and if those are equal, replaces the former with desired (performs read-modify-write operation). Otherwise, loads the actual value stored in *this into expected (performs load operation)." I did not understand the "and if those are equal, replaces the former with desired (performs read-modify-write operation)" part. Why doesn't it leave it alone if the values are same ? – Chani Jul 19 '14 at 18:31
  • It succeeds when the current value in the atomic is the same as the "old" value, termed "expected". In this use case, use `nullptr` for the expected value. – Ben Voigt Jul 19 '14 at 18:38
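
The circular-buffer suggestion in these comments could look roughly like the sketch below. Note that it is a variation, not what the answer describes: each slot stores the highest sequence number claimed for it instead of a packet pointer, so nobody has to reset slots to `nullptr` when the index wraps. It also uses a short retry loop, so it is lock-free but not strictly wait-free. `kRingSize`, `g_claimed`, and `claim` are hypothetical names, and the sketch assumes sequence numbers start at 1.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assumed ring size; it must be a power of two so masking gives the low bits.
constexpr std::size_t kRingSize = 4096;
static_assert((kRingSize & (kRingSize - 1)) == 0, "ring size must be a power of two");

// Highest sequence number claimed per slot; 0 means "never used".
std::atomic<std::uint64_t> g_claimed[kRingSize] = {};

// Returns true if the caller is the first to claim `seq` and should process
// the packet, false if it is a duplicate or older than the slot's contents.
bool claim(std::uint64_t seq) {
    assert(seq != 0);  // 0 is reserved as the "empty" marker in this sketch
    std::atomic<std::uint64_t>& slot = g_claimed[seq & (kRingSize - 1)];
    std::uint64_t seen = slot.load();
    while (seen < seq) {
        // On failure, `seen` is refreshed with the slot's current value.
        if (slot.compare_exchange_weak(seen, seq)) {
            return true;
        }
    }
    return false;
}
```

Packets that arrive out of order by less than `kRingSize` land in different slots, so they are still accepted; a packet delayed by more than one full lap of the ring would be rejected as stale.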

Here is one option, if you are using std::atomic values, using compare_exchange.

Not shown is how to initialize last_processed_seqnum, as you'll need to set it to a valid value, namely, one less than the seqnum of the next packet to arrive.

It will need to be adapted for the case in which there are sequence number gaps. You mention as part of your premise that there will be no dropped seqnums; but the example below will stop processing packets (i.e. fail catastrophically) upon any seqnum gaps.

std::atomic<int> last_processed_seqnum;
// sync last_processed_seqnum to first message(s).

int seqnum_from_stream = ...;
int putative_last_processed_seqnum = seqnum_from_stream - 1;


if (last_processed_seqnum.compare_exchange_strong(putative_last_processed_seqnum,
                                                  seqnum_from_stream))
{
   // sequence number has been updated in compare_exchange_strong
   // process packet;
} 

Ideally, what we want is a compare_exchange function that uses greater than, not equals. I don't know of any way to achieve that behavior in one operation. The SO question I linked to links to an answer about iterating over all values less than a target to update.
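One common way to approximate a "greater than" compare-exchange is a small retry loop around `compare_exchange_weak`. The sketch below is an assumption-laden illustration (the name `advance_if_greater` is made up), and because it loops it takes more than one operation under contention. Unlike the snippet above, it tolerates gaps in the sequence, but it will reject any packet that arrives after a higher-numbered one.

```cpp
#include <atomic>

// Atomically raises `last` to `seqnum` if `seqnum` is greater.
// Returns true if this call performed the update (caller processes the
// packet), false if `last` was already >= seqnum (duplicate or late packet).
bool advance_if_greater(std::atomic<int>& last, int seqnum) {
    int current = last.load();
    while (current < seqnum) {
        // On failure, `current` is reloaded with the latest stored value,
        // and the loop condition is checked again.
        if (last.compare_exchange_weak(current, seqnum)) {
            return true;
        }
    }
    return false;
}
```

With two mirrored streams, both threads would call `advance_if_greater(last_processed_seqnum, seqnum_from_stream)` and only the one that gets `true` would process the packet.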

NicholasM

You are probably implementing a price feed handler. Which exchange is it, and what protocol? Is it ITCH or FIX Fast? I would not recommend two threads for the same feed, since you probably have to join several multicast groups for different market segments/boards.

  • Data coming from a particular multicast stream is guaranteed to be sent for only a particular set of orderbooks. So I guess the problem of merging orderbooks does not arise. Was that what you were worried about ? (It is a binary protocol .. neither ITCH nor FIX) – Chani Jul 19 '14 at 20:26