I would like to ask some questions about Apache Kafka and compacted topics. We want to provide PII data over a compacted Kafka topic and delete that data from the topic via tombstones. We have several questions to verify our assumptions:
- Is there another company that fulfills the GDPR requirement (right to be forgotten) in Kafka through a compacted topic with tombstone generation, as KIP-354 proposes? https://cwiki.apache.org/confluence/display/KAFKA/KIP-354%3A+Add+a+Maximum+Log+Compaction+Lag
- Is our assumption right that compaction is only triggered if the record is not in the active segment file? If so, in our view section 4.8 of the Kafka documentation needs to be amended. It currently says: "The topic's max.compaction.lag.ms can be used to guarantee the maximum delay between the time a message is written and the time the message becomes eligible for compaction." This should add the condition that the message we want to compact must not be in the active segment file. Is this a bug in the max.compaction.lag.ms feature, or is it as designed? We are not sure at this point.
- Is compaction only triggered after a new message is inserted, or is there also an asynchronous process that compacts non-active segment files?
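
To make the assumption in the second question concrete, here is a toy model in Python. It is not Kafka code and does not use any Kafka API; the names Record and compact are made up for illustration. It only encodes what we assume: compaction looks at closed segments and leaves the active segment alone. Tombstone expiry (delete.retention.ms) is not modelled.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Record:
    key: str
    value: Optional[str]  # None models a tombstone


def compact(
    closed_segments: List[List[Record]], active_segment: List[Record]
) -> Tuple[List[Record], List[Record]]:
    """Toy compaction under OUR ASSUMPTION: only closed segments are cleaned.

    Keeps the latest record per key across the closed segments and returns
    the active segment unchanged. Tombstones are kept like any other record.
    """
    closed = [record for segment in closed_segments for record in segment]
    # Remember the position of the last record seen for each key.
    latest_index = {record.key: i for i, record in enumerate(closed)}
    cleaned = [r for i, r in enumerate(closed) if latest_index[r.key] == i]
    return cleaned, list(active_segment)


# The PII record sits in a closed segment, the tombstone in the active one.
closed_segments = [[Record("user-1", "alice@example.com"),
                    Record("user-2", "bob@example.com")]]
active_segment = [Record("user-1", None)]

# Before the active segment is rolled, the tombstone is not considered.
before_roll, _ = compact(closed_segments, active_segment)

# After a roll, the former active segment is closed and gets compacted.
after_roll, _ = compact(closed_segments + [active_segment], [])
```

In this model the PII value for user-1 is still present in before_roll and only disappears in after_roll, which is the behaviour we think we are observing and would like to have confirmed or corrected.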
Thanks for your answers ;-)