I think this is a general problem that does not depend on the specific technologies involved, so please consider the problem itself.
I store documents in Couchbase in the format below.
productId is the document ID.
{
"size",
"colour",
"category",
"updatedDate"
}
I consume partial update events from a Kafka topic.
A partial update event can contain any combination of the fields:
{
"size",
"colour"
}
or
{
"size"
}
or
{
"category",
"colour"
}
etc.
So, let's look at the following problematic case:
Suppose this document exists in Couchbase:
{
"size" : "M",
"colour" : "Black",
"category" : "Sweat",
"updatedDate" : "2022-11-11T12:12:12"
}
Suppose an update event arrives at 2022-11-11T13:13:13, like below:
{
"category" : "Jean",
"colour" : "Brown",
"eventTimeStamp" : "2022-11-11T13:13:13"
}
Suppose we cannot write that message to Couchbase because Couchbase is unavailable at that moment, so the document is not updated and we move the event message to a retry topic.
The event message now sits in the retry topic, waiting to be consumed.
In the meantime, a new update arrives at 2022-11-11T14:14:14, like below:
{
"colour" : "Yellow",
"eventTimeStamp" : "2022-11-11T14:14:14"
}
That event is written to Couchbase successfully, and the document now looks like this:
{
"size" : "M",
"colour" : "Yellow",
"category" : "Sweat",
"updatedDate" : "2022-11-11T14:14:14"
}
After that, suppose we consume the retry topic and get the message below (the one that could not be written to Couchbase earlier):
{
"category" : "Jean",
"colour" : "Brown",
"eventTimeStamp" : "2022-11-11T13:13:13"
}
When we consume this event, its eventTimeStamp is earlier than the document's updatedDate, so it seems we should ignore the message. However, if we ignore it, the category stays Sweat, which is stale: it should be Jean. If we write it, the colour becomes stale: it should be Yellow, not Brown.
Either way (ignoring the message or writing it to Couchbase) leaves stale data. Ignoring leaves an old category; writing leaves an old colour.
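To make the dilemma concrete, here is a minimal sketch in plain Python dicts, with no Couchbase or Kafka involved. The function name apply_event and the ignore_stale flag are hypothetical and exist only for this illustration; the logic assumes a single document-level updatedDate, as in my current design.

```python
from datetime import datetime

def apply_event(doc, event, ignore_stale):
    """Apply a partial update using one document-level timestamp.

    Returns a new document; the input is not modified.
    """
    event_time = datetime.fromisoformat(event["eventTimeStamp"])
    doc_time = datetime.fromisoformat(doc["updatedDate"])
    if ignore_stale and event_time < doc_time:
        # Strategy A: drop any event older than the document.
        return dict(doc)
    # Strategy B: write every field carried by the event.
    updated = dict(doc)
    for field, value in event.items():
        if field != "eventTimeStamp":
            updated[field] = value
    updated["updatedDate"] = max(doc_time, event_time).isoformat()
    return updated

# Document state after the 14:14:14 event was written.
current = {"size": "M", "colour": "Yellow", "category": "Sweat",
           "updatedDate": "2022-11-11T14:14:14"}
# The 13:13:13 event consumed later from the retry topic.
retried = {"category": "Jean", "colour": "Brown",
           "eventTimeStamp": "2022-11-11T13:13:13"}

ignored = apply_event(current, retried, ignore_stale=True)
written = apply_event(current, retried, ignore_stale=False)
print(ignored["category"], ignored["colour"])  # stale category
print(written["category"], written["colour"])  # stale colour
```

With ignore_stale=True the category stays Sweat; with ignore_stale=False the colour reverts to Brown. Neither result is the desired Jean and Yellow.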
What should we do in such cases?
(You might suggest storing an updated date for each field and comparing per field. I don't think that is a best practice, so if there is a better solution I would like to follow it.)
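For reference, this is roughly what I understand the per-field approach to look like. It is only a sketch under assumptions: the document layout, where every field stores its own value and updatedDate, and the name merge_per_field are hypothetical and not something I have implemented.

```python
def merge_per_field(doc, event):
    """Field-level last-write-wins: each field keeps its own update time.

    Returns a new document; the input is not modified.
    """
    event_time = event["eventTimeStamp"]
    merged = {field: dict(entry) for field, entry in doc.items()}
    for field, value in event.items():
        if field == "eventTimeStamp":
            continue
        entry = merged.get(field)
        # ISO-8601 strings in the same format compare chronologically.
        # A strict ">" means an event with an equal timestamp is ignored.
        if entry is None or event_time > entry["updatedDate"]:
            merged[field] = {"value": value, "updatedDate": event_time}
    return merged

# Hypothetical per-field layout after the 14:14:14 event was written.
stored = {
    "size": {"value": "M", "updatedDate": "2022-11-11T12:12:12"},
    "colour": {"value": "Yellow", "updatedDate": "2022-11-11T14:14:14"},
    "category": {"value": "Sweat", "updatedDate": "2022-11-11T12:12:12"},
}
# The 13:13:13 event consumed later from the retry topic.
late_event = {"category": "Jean", "colour": "Brown",
              "eventTimeStamp": "2022-11-11T13:13:13"}

result = merge_per_field(stored, late_event)
print(result["category"]["value"], result["colour"]["value"])
```

Here the late event updates category (13:13:13 is newer than 12:12:12) but leaves colour alone (13:13:13 is older than 14:14:14), at the cost of storing a timestamp next to every field.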