
We have been working on a Kafka ecosystem. Let me go through the flow:

Source (SQL Server) -> Debezium (CDC) -> Kafka Broker -> Kafka Streams (processing, joins, etc.) -> Mongo sink connector -> MongoDB

Now we are at the last step: we are inserting processed data into MongoDB, but we now have a requirement to upsert data instead of just inserting it.

Can we get upsert (insert/update) functionality from the Mongo sink connector? As far as I understand, it can't be done.

– Owais Ajaz

2 Answers


Please follow the provided link; it has all the information about the Kafka MongoDB connector. I have successfully implemented the upsert functionality. You just need to read this document carefully.

Kafka Connector - Mongodb

– Owais Ajaz

Effectively this is an upsert: we want to insert if the ${uniqueFieldToUpdateOn} is not in Mongo, or update it if it exists, as follows.

There are two main ways of modelling data changes in a collection, depending on your use case: update or replace, as outlined below.

UPDATE

The following config states:

  1. Replace ${uniqueFieldToUpdateOn} with a field that is unique to the record, i.e. the field you want to model your update on.
  2. AllowList (whitelist) this field. For use with the PartialValueStrategy, this allows custom value fields to be projected for the id strategy.
  3. UpdateOneBusinessKeyTimestampStrategy means that only the one document referenced by the unique field declared above will be updated (latest timestamp wins).
"document.id.strategy":"com.mongodb.kafka.connect.sink.processor.id.strategy.PartialValueStrategy", 
"document.id.strategy.partial.value.projection.list":"${uniqueFieldToUpdateOn}",
"document.id.strategy.partial.value.projection.type":"AllowList",
"writemodel.strategy":"com.mongodb.kafka.connect.sink.writemodel.strategy.UpdateOneBusinessKeyTimestampStrategy" 

REPLACE

NB: this models a REPLACE, not an update, but may be useful nonetheless.

The following config states:

  1. Replace ${uniqueFieldToUpdateOn} with a field that is unique to the record, i.e. the field you want to model your replace on.
  2. AllowList (whitelist) this field. For use with the PartialValueStrategy, this allows custom value fields to be projected for the id strategy.
  3. ReplaceOneBusinessKeyStrategy means that only the one document referenced by the unique field declared above will be replaced.
"document.id.strategy":"com.mongodb.kafka.connect.sink.processor.id.strategy.PartialValueStrategy", 
"document.id.strategy.partial.value.projection.list":"${uniqueFieldToUpdateOn}",
"document.id.strategy.partial.value.projection.type":"AllowList",
"writemodel.strategy":"com.mongodb.kafka.connect.sink.writemodel.strategy.ReplaceOneBusinessKeyStrategy"
  • I've used the same configuration suggested, but I'm not able to successfully make the connector run; it's failing with the error `org.apache.kafka.connect.errors.DataException: Could not convert key 456 into a BsonDocument.\n\tat com.mongodb.kafka.connect.sink.converter.LazyBsonDocument.getUnwrapped(LazyBsonDocument.java:157)\n\tat com.mongodb.kafka.connect.sink.converter.LazyBsonDocument.clone(LazyBsonDocument.java:146)\n\tat com.mongodb.kafka.connect.sink.converter.SinkDocument.clone(SinkDocument.java:45)\n\tat `. Is there an example connector for this that I can use as a reference? – Harshith Yadav Nov 18 '20 at 04:58
  • Can you show me your connector configuration? Maybe I can help you. – LawrenceMouarkach Nov 18 '20 at 09:03
  • Since I can't post the full connector in one go, I'll put it in parts. The first part is: `"name":"tag-update2","config":{"connector.class":"com.mongodb.kafka.connect.MongoSinkConnector","tasks.max":"1","connection.uri":"mongodb://xx.xx.xx:27017","database":"staff","collection":"user","topics":"user_update_new1","key.converter":"org.apache.kafka.connect.storage.StringConverter","value.converter":"io.confluent.connect.json.JsonSchemaConverter"` – Harshith Yadav Nov 18 '20 at 09:44
  • The other half is: `"value.converter.schema.registry.url": "http://xx.xxx.xx.xx:8081", "document.id.strategy":"com.mongodb.kafka.connect.sink.processor.id.strategy.PartialValueStrategy", "document.id.strategy.partial.value.projection.type":"user_id", "document.id.strategy.partial.value.projection.type":"AllowList", "writemodel.strategy":"com.mongodb.kafka.connect.sink.writemodel.strategy.ReplaceOneBusinessKeyStrategy" } }` Apologies for not being able to send it as a whole. – Harshith Yadav Nov 18 '20 at 09:49
  • First things first: you have document.id.strategy.partial.value.projection.type twice; you should have document.id.strategy.partial.value.projection.list with "user_id". Also, can you show me the payload of the message you are sending, including the key? It looks like an issue converting the key. – LawrenceMouarkach Nov 18 '20 at 10:02
  • Also, you probably don't need the schema registry config if you are not using Avro; if you do intend to use it, then you will need to change your converter to io.confluent.connect.avro.AvroConverter. – LawrenceMouarkach Nov 18 '20 at 10:12
  • I'm actually using the JSON Schema serializer for Python Kafka to publish the data; the key is serialized with the StringSerializer function. And yeah, I updated the duplicate you pointed out, but no luck. – Harshith Yadav Nov 18 '20 at 10:51
  • I used this format to publish the data: https://github.com/confluentinc/confluent-kafka-python/blob/master/examples/json_producer.py – Harshith Yadav Nov 18 '20 at 10:53
  • Can you show me the key and value of the payload that is generated? Log them out somewhere, and once you have those we can see why the payload is not being converted properly; I'd imagine it's generating invalid JSON. – LawrenceMouarkach Nov 18 '20 at 10:58
  • I used kafkacat to pull the data from the topic; it looks like so: `{"topic":"tag-update","partition":0,"offset":6,"tstype":"create","ts":1605589866512,"broker":-1,"key":"456","payload":"\u0000\u0000\u0000\u0000\u0006{\"client_id\": 1, \"default\": true}"}` – Harshith Yadav Nov 18 '20 at 11:40
  • Two issues: 1. You are using a JSON converter for your key, yet you are passing a string as your key, which is not JSON. 2. It looks like the payload is a combination of a byte array and escaped JSON; look at how you are generating your payload. Fix the message and it will all work. – LawrenceMouarkach Nov 18 '20 at 12:55
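For later readers: stitching the two comment fragments together and applying the projection.list fix suggested above gives a config along these lines. This is a sketch only, not verified against the poster's environment; the masked xx hosts are left exactly as posted.

{
  "name": "tag-update2",
  "config": {
    "connector.class": "com.mongodb.kafka.connect.MongoSinkConnector",
    "tasks.max": "1",
    "connection.uri": "mongodb://xx.xx.xx:27017",
    "database": "staff",
    "collection": "user",
    "topics": "user_update_new1",
    "key.converter": "org.apache.kafka.connect.storage.StringConverter",
    "value.converter": "io.confluent.connect.json.JsonSchemaConverter",
    "value.converter.schema.registry.url": "http://xx.xxx.xx.xx:8081",
    "document.id.strategy": "com.mongodb.kafka.connect.sink.processor.id.strategy.PartialValueStrategy",
    "document.id.strategy.partial.value.projection.list": "user_id",
    "document.id.strategy.partial.value.projection.type": "AllowList",
    "writemodel.strategy": "com.mongodb.kafka.connect.sink.writemodel.strategy.ReplaceOneBusinessKeyStrategy"
  }
}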