
I am trying to get my head around an issue I recently encountered, and I hope someone will be able to point me in the most reasonable direction for solving it.

I am using the Riak KV store and working with CRDT data, where each CRDT item stored in the database contains some sort of counter.

I have a RabbitMQ queue, where each message is a request to increment or decrement a certain number of the aforementioned counters.

Finally, I have a group of service workers that listen on the queue and, for each request, try to change the counters accordingly.

The issue I have is as follows: while a single worker is processing a request, it may get stuck for a while on a write operation to the database, let's say on the second change of counters out of three. Its connection with RabbitMQ gets lost (timeout), so the message-request goes back onto the queue (I cannot afford to miss one). It is then picked up by a second worker, which begins all processing anew. However, the first worker eventually finishes its work, and as a result a single message has been processed twice.

I can split those increments into single actions, but that still leaves me with a dilemma: the value of a counter can still be changed twice if some worker gets stuck on a write operation for a long period.

I have no way of making the Riak KV CRDT writes faster, nor can I accept missing a message-request. I need to implement some means of checking whether a request has already been processed. My initial thought was to use some alternative, fast KV store to record RabbitMQ message IDs while they are being processed. That way, other workers could tell whether they are about to process a message that is already being handled elsewhere. I could use any help and pointers to materials I can read.
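A minimal sketch of that idea, assuming Redis as the fast KV store (the source does not name one) and the `redis` Python client; key names and TTLs are illustrative only:

```python
import redis

# Assumption: Redis is the fast KV store used to track in-flight message IDs.
dedup = redis.Redis(host="localhost", port=6379)

PROCESSING_TTL = 300  # seconds; should exceed the worst-case Riak write time


def try_claim(message_id: str) -> bool:
    """Atomically claim a message for processing.

    SET with nx=True only succeeds if the key does not exist yet, so only one
    worker can claim a given message ID; ex= makes the claim expire in case the
    claiming worker dies without finishing.
    """
    return bool(dedup.set(f"processing:{message_id}", "1", nx=True, ex=PROCESSING_TTL))


def mark_done(message_id: str) -> None:
    """Record that the message was fully processed (kept longer than the claim)."""
    dedup.set(f"done:{message_id}", "1", ex=24 * 3600)


def already_done(message_id: str) -> bool:
    return dedup.exists(f"done:{message_id}") == 1
```

Note that this only narrows the duplicate window rather than closing it: if the first worker's claim expires while it is still stuck on the Riak write, a second worker can claim the same message, so the counter updates themselves should ideally be idempotent as well.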

actionjezus6

1 Answer


You can't have "exactly-once delivery" semantics. You can reduce either duplicated messages or missed deliveries, so it's up to you to decide which misbehavior is the least inconvenient.
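To illustrate the trade-off, here is a sketch with the `pika` Python client (the queue name and the `apply_counter_updates` placeholder are hypothetical): acknowledging only after the Riak writes succeed gives at-least-once delivery (duplicates possible, as in your scenario), while acknowledging on delivery via `auto_ack=True` gives at-most-once (messages can be lost).

```python
import pika

# Hypothetical connection and queue name, for illustration only.
connection = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
channel = connection.channel()
channel.queue_declare(queue="counter-updates", durable=True)


def apply_counter_updates(body: bytes) -> None:
    """Placeholder for the actual (slow) Riak CRDT writes."""
    ...


def on_message(ch, method, properties, body):
    apply_counter_updates(body)
    # Ack only after the writes succeed: at-least-once, duplicates possible
    # if the connection times out before this point.
    ch.basic_ack(delivery_tag=method.delivery_tag)


# auto_ack=True would instead ack on delivery: at-most-once, messages can be lost.
channel.basic_consume(queue="counter-updates", on_message_callback=on_message, auto_ack=False)
channel.start_consuming()
```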

First of all, are you sure it's the CRDTs that are too slow? Are you using simple counters or counters inside maps? In my experience they are quite fast, although slower than plain key/value operations. You could try:

- using simple CRDTs (no maps) and more CRDT objects, to spread the load (can you split the counters in two?);
- not using CRDTs, but using good old sibling resolution on the client side with simple key/values;
- accumulating the count update orders and applying them in batch (a minimal sketch follows this list), but then you're accepting an increase in latency, so it's equivalent to increasing the timeout.
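A rough sketch of the batching option, assuming a hypothetical `increment_counter(key, amount)` callable that wraps the real Riak counter write; how many messages you group per batch is up to you:

```python
from collections import Counter


def apply_in_batch(messages, increment_counter):
    """Collapse many small update orders into one increment per counter.

    `messages` is an iterable of (counter_key, delta) pairs taken off the queue;
    `increment_counter(key, amount)` is a placeholder for the actual Riak write.
    """
    totals = Counter()
    for key, delta in messages:
        totals[key] += delta

    # One Riak write per counter instead of one write per queued message.
    for key, amount in totals.items():
        if amount != 0:
            increment_counter(key, amount)
```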

Can you provide some metrics? For example, how long the updates take, what numbers you'd expect, and whether it's as slow with few updates as with many.

dams