
I have a distributed platform which allows customers to make purchases, and the items which are purchased are stored in an inventory:

Sales app -> PurchaseEvent -> Inventory app

The Sales app raises the PurchaseEvent onto a message bus, which is asynchronously consumed by the Inventory app. This all works great.

There's one piece of functionality which makes it possible for two customers to be merged into one. When this happens, a CustomerMergedEvent is raised, and the Inventory app consumes this to update its data (so that all inventory for those two customers is now under one merged customer).

All is smooth when everything works fine. The challenge arises when there is a backlog of PurchaseEvents waiting to be consumed. Any purchase that Inventory consumes after it has already consumed the CustomerMergedEvent will not reflect the merge, because it still references the pre-merge customer. We won't even be alerted that this has happened.

We could make it so every customer merge results in a new customer, and have the Inventory app alert us if it receives information about a customer which no longer exists. But are there solutions which solve this time-related issue with events on a higher level?

FBryant87

1 Answer


Why can't your inventory service store the fact that Customer A has been merged into Customer B (by a CustomerMergedEvent)? Then all your purchase event processor has to do is check for a previous merge of the customer (potentially recursively: A could be merged into B which could merge into C and so on if there's enough lag) and use the "effective customer" for the purchase.
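The "effective customer" lookup described above could be sketched roughly as follows. This is a minimal in-memory illustration, not anyone's actual implementation: the `Inventory` class, its method names, and the dictionary-based storage are all hypothetical, and a real Inventory app would presumably keep the merge table in its own datastore.

```python
from dataclasses import dataclass, field


@dataclass
class Inventory:
    """Hypothetical in-memory stand-in for the Inventory app's state."""

    # merged_into[a] == b means "customer a was merged into customer b".
    merged_into: dict = field(default_factory=dict)
    # Maps a customer id to the list of items that customer owns.
    items_by_customer: dict = field(default_factory=dict)

    def effective_customer(self, customer_id):
        """Follow the merge chain (A -> B -> C ...) to the surviving customer."""
        seen = set()
        while customer_id in self.merged_into:
            if customer_id in seen:
                raise ValueError(f"merge cycle detected at {customer_id!r}")
            seen.add(customer_id)
            customer_id = self.merged_into[customer_id]
        return customer_id

    def on_customer_merged(self, source_id, target_id):
        """Record the merge and move existing inventory to the survivor."""
        source_id = self.effective_customer(source_id)
        target_id = self.effective_customer(target_id)
        if source_id == target_id:
            return  # already merged, so a redelivered event is a no-op
        self.merged_into[source_id] = target_id
        moved = self.items_by_customer.pop(source_id, [])
        self.items_by_customer.setdefault(target_id, []).extend(moved)

    def on_purchase(self, customer_id, item):
        """Attribute a (possibly late) purchase to the effective customer."""
        owner = self.effective_customer(customer_id)
        self.items_by_customer.setdefault(owner, []).append(item)


inv = Inventory()
inv.on_purchase("A", "book")
inv.on_customer_merged("A", "B")
inv.on_customer_merged("B", "C")
inv.on_purchase("A", "lamp")  # late purchase for a pre-merge customer
```

In this sketch the late purchase for customer A ends up under customer C, because the lookup follows A to B to C before storing the item.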

An alternative approach (if for some reason you can't record the merge in the inventory app to inform future processing) is to model a period during which a merge is in progress, and to declare that period over once you're sufficiently sure that no more purchase events for the pre-merge customers will arrive. If the events carry a timestamp, watermarking might be sufficient. Alternatively, if your message bus is partitioned such that all events concerning a given customer land in the same partition (e.g. Kafka/Pulsar/Azure Event Hubs), you can write the CustomerMergedEvent denoting that customer A merged into customer B twice: once to customer A's partition and once to customer B's partition (each copy intended for its respective customer).
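The double-write idea might look something like the sketch below. It uses a toy in-memory bus rather than a real Kafka, Pulsar, or Event Hubs client, and every name in it (`InMemoryBus`, `partition_for`, `publish_merge`, `MergeTracker`, the `addressed_to` field, the partition count) is a hypothetical chosen for illustration. Treating the merge as finished once both copies have been consumed is one possible reading of the "merge in progress" period, not something the approach strictly prescribes.

```python
import zlib
from collections import defaultdict

NUM_PARTITIONS = 4  # hypothetical partition count


def partition_for(customer_id: str) -> int:
    """Stable hash so every event keyed by a customer hits the same partition."""
    return zlib.crc32(customer_id.encode("utf-8")) % NUM_PARTITIONS


class InMemoryBus:
    """Toy stand-in for a partitioned log; each partition is an ordered list."""

    def __init__(self):
        self.partitions = defaultdict(list)

    def publish(self, key: str, event: dict) -> None:
        self.partitions[partition_for(key)].append(event)


def publish_merge(bus: InMemoryBus, source_id: str, target_id: str) -> None:
    """Write the merge event twice, keyed once per affected customer."""
    for addressed_to in (source_id, target_id):
        bus.publish(addressed_to, {
            "type": "CustomerMerged",
            "source": source_id,
            "target": target_id,
            "addressed_to": addressed_to,
        })


class MergeTracker:
    """Considers a merge in progress until both copies have been consumed."""

    def __init__(self):
        self.seen = defaultdict(set)

    def on_merge_copy(self, event: dict) -> bool:
        """Return True once both the source and target copies were seen."""
        key = (event["source"], event["target"])
        self.seen[key].add(event["addressed_to"])
        return self.seen[key] == {event["source"], event["target"]}


bus = InMemoryBus()
bus.publish("A", {"type": "Purchase", "customer": "A", "item": "book"})
publish_merge(bus, "A", "B")
```

Because a partition preserves order, the copy keyed by customer A sits behind every purchase for A that was published before the merge, so a consumer reading that partition reaches the merge copy only after those earlier purchases.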

Levi Ramsey
  • These solutions are likely the most suitable for our case. The only downside to the first is that for every type of consumption which references a customer, we need to check for merge events and update accordingly. Quick check on the 2nd suggestion: can Event Hub etc. guarantee ordering for a partition even if there are events from multiple topics on that partition? If we're able to order events for the same customer across multiple topics, most of this problem goes away (I know this is impossible with Service Bus) – FBryant87 Apr 11 '23 at 07:36
  • AFAIK in Event Hub, a partition only belongs to one topic. Consuming from multiple topics implies multiple partitions, and consumption order between partitions is nondeterministic. If you're talking about having multiple input topics getting combined into one firehose topic, then whatever's combining those topics and producing to the firehose is going to be responsible for imposing order in the output topic. – Levi Ramsey Apr 11 '23 at 14:29
  • Joining two streams while preserving a domain ordering is a fundamentally stateful process (spoiler alert: you'll almost certainly end up using a datastore that's not a "fancy/persistent queue" as an ersatz message queue...), so try to use approaches that embrace statefulness. – Levi Ramsey Apr 11 '23 at 14:34
  • Appreciate the responses. I'm actually considering combining both events into one Customer entity topic as a couple of articles suggest, as this makes it easy to consume any events relating to a Customer in the intended sequence without the racing complexity. But you may be correct about needing stateful solutions on the consumer side if that presents major issues – FBryant87 Apr 11 '23 at 15:24
  • Note that if customer entity events can come from asynchronous processes (meaning processes with no definite happens-before relation between them, such as one imposed by pessimistic concurrency control; either process may still be implemented internally in a blocking, synchronous way), the ordering can be non-deterministic, though in this scenario at least, the window of uncertainty can be a little more tractable. – Levi Ramsey Apr 11 '23 at 17:38