
I think this is a general problem and not specific to the technologies in use, so please consider the problem itself.

I am storing data in Couchbase in the format below.

productId is the document id.

{
    "size",
    "colour",
    "category",
    "updatedDate"
}

I am listening to a Kafka topic to receive partial update events.

Partial update events can contain any combination of fields:

{
    "size",
    "colour"
}

or

{
    "size"
}

or

{
    "category",
    "colour"
}

etc.

So, let's look at the problematic case below.

Suppose a document exists in Couchbase like this:

{
    "size" : "M",
    "colour" : "Black",
    "category" : "Sweat",
    "updatedDate" : "2022-11-11T12:12:12"
}

Suppose an update event arrives at 2022-11-11T13:13:13:

{
    "category" : "Jean",
    "colour" : "Brown",
    "eventTimeStamp" : "2022-11-11T13:13:13"
}

and suppose we could not write that message because Couchbase was unavailable at that moment, so we could not update the document. We move this event message to a retry topic.

The event message sits in the retry topic waiting to be consumed.

Meanwhile, a new update arrives at 2022-11-11T14:14:14:

{
    "colour" : "Yellow",
    "eventTimeStamp" : "2022-11-11T14:14:14"
}

That event is written to Couchbase successfully, and the document now looks like this:

{
    "size" : "M",
    "colour" : "Yellow",
    "category" : "Sweat",
    "updatedDate" : "2022-11-11T14:14:14"
}

After that, we consume the retry topic and receive the message below (the one that failed to be written earlier):

{
    "category" : "Jean",
    "colour" : "Brown",
    "eventTimeStamp" : "2022-11-11T13:13:13"
}

When we consume this event, its eventTimeStamp is before the document's updatedDate, so we should ignore it. But if we ignore it, the category stays Sweat, which is stale data; it should be Jean. If we write it, the colour becomes stale: it should be Yellow, not Brown.

Either way (ignoring the message or writing it to Couchbase) leaves stale data: ignoring it leaves a stale category, writing it leaves a stale colour.

What should we do in these cases?

(You could say: store an updated date for each field and compare timestamps per field. I don't think that is a best practice, so if there is a better solution I would like to follow it.)
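For what it's worth, the per-field timestamp idea you mention can be sketched roughly like this (the `Product` class and its field names are my own illustration, not from any library). Each field carries its own "last updated" instant, so a late-arriving retry can still fill in fields that no newer event has touched:

```java
import java.time.Instant;
import java.util.HashMap;
import java.util.Map;

// Sketch of field-level last-write-wins: every field remembers when it was
// last updated, and an incoming partial update only overwrites a field when
// the event is newer than that field's own timestamp.
class Product {
    final Map<String, String> values = new HashMap<>();
    final Map<String, Instant> fieldUpdatedAt = new HashMap<>();

    // Apply a partial update event, comparing timestamps per field.
    void apply(Map<String, String> event, Instant eventTimeStamp) {
        for (Map.Entry<String, String> e : event.entrySet()) {
            Instant current = fieldUpdatedAt.get(e.getKey());
            if (current == null || eventTimeStamp.isAfter(current)) {
                values.put(e.getKey(), e.getValue());
                fieldUpdatedAt.put(e.getKey(), eventTimeStamp);
            }
        }
    }
}
```

Replaying the scenario above: after applying the 14:14:14 event (colour = Yellow) and then the retried 13:13:13 event (category = Jean, colour = Brown), the colour stays Yellow (its field timestamp is newer) while the category becomes Jean (its field timestamp was older), so neither field ends up stale.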

javac
  • How many updates of a single document do you have? – Paweł Szymczyk Nov 15 '22 at 16:50
  • It can be n times; there is no limit. In this question's case there were two update events: the first failed and the second succeeded, then the first was retried, which is the problematic operation I want to ask about. – javac Nov 15 '22 at 19:08

1 Answer


I don't see how you could maintain the proper order after publishing to a "retry later" topic.

One solution is to use Kafka Connect and the Couchbase Sink connector.

The Couchbase Sink connector has an N1qlSinkHandler that merges fields into existing JSON documents (or creates the documents if they are absent). Select this sink handler by setting the couchbase.sink.handler config property.

Then set the couchbase.retry.timeout property to a value higher than your expected outage time. When Couchbase is unavailable, the connector retries automatically, and your writes happen in the proper order.
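As a rough sketch, the relevant connector properties would look something like this (the connector name, topic, bucket, and timeout value are placeholders; check the connector documentation for your version's exact property names and values):

```properties
name=couchbase-sink
connector.class=com.couchbase.connect.kafka.CouchbaseSinkConnector
topics=product-updates
couchbase.bucket=products
# Merge incoming fields into the existing JSON document instead of replacing it.
couchbase.sink.handler=com.couchbase.connect.kafka.handler.sink.N1qlSinkHandler
# Keep retrying failed writes for up to 10 minutes before failing the task.
couchbase.retry.timeout=10m
```

Because the connector itself retries in place rather than diverting failed writes to a separate topic, the original event order on the partition is preserved.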

dnault