
I want to tail the Mongo oplog and stream it through Kafka. But there are many databases and collections, and I only want the update data for one of them. Filtering the desired operation records out of all the operation records in the oplog could affect performance, so I would like to ask for a better solution. Please give me some suggestions.

DotWait

1 Answer


It's not clear what tool you are using, but Debezium supports these properties for applying filtering:

  • database.whitelist
  • collection.whitelist
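For example, a Kafka Connect configuration for Debezium's MongoDB connector restricted to a single collection might look like the following sketch (the connector name, hosts, logical name, and namespaces are placeholders, not values from the question):

```json
{
  "name": "mongo-source",
  "config": {
    "connector.class": "io.debezium.connector.mongodb.MongoDbConnector",
    "mongodb.hosts": "rs0/mongo1:27017",
    "mongodb.name": "mydeployment",
    "database.whitelist": "mydb",
    "collection.whitelist": "mydb.orders"
  }
}
```

Note that `collection.whitelist` takes fully-qualified `database.collection` names, so the filtering happens inside the connector rather than in your own code.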

It is also not clear what would "affect performance", since you are already reading the full oplog. Performing a filter (i.e., dropping all records that don't match a condition) should not have a significant impact, as boolean/regex checks usually finish very quickly.
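To illustrate why the per-record check is cheap: oplog documents carry an `ns` field of the form `database.collection`, so filtering reduces to a single string comparison per record. A minimal sketch (the namespace and entries are made up, not from the question; real entries come from `local.oplog.rs`):

```python
# Hypothetical target namespace to keep
TARGET_NS = "mydb.orders"

def wanted(entry):
    """Return True if this oplog-style entry touches the target collection."""
    return entry.get("ns") == TARGET_NS

# Simulated oplog entries; "op" is "u" for update, "i" for insert
oplog = [
    {"op": "u", "ns": "mydb.orders", "o": {"status": "paid"}},
    {"op": "i", "ns": "otherdb.logs", "o": {"msg": "noise"}},
    {"op": "u", "ns": "mydb.users", "o": {"name": "x"}},
]

# Keep only updates on the target collection
updates = [e for e in oplog if wanted(e) and e["op"] == "u"]
print(len(updates))  # 1
```

Each discarded record costs one dictionary lookup and one string equality check, which is negligible next to the I/O of reading the oplog itself.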

OneCricketeer
  • The reason for worrying about performance is that Mongo stores lots of data, and the other databases will have many operations updating data. If you use code to read the operation records from the oplog, every record needs to be checked by that code. Too many useless checks can affect performance. What do you think? – DotWait Oct 19 '18 at 07:54
  • Performance of Mongo? No, because the oplog isn't filtered otherwise, and reading it in full cannot be avoided, AFAIK. And you're not performing actual database lookups or writes against Mongo, so that also wouldn't have an impact... Basically, if using Kafka causes performance issues, then so would just having a replicated or sharded Mongo instance – OneCricketeer Oct 19 '18 at 13:53