Pub/Sub Lite Delayed Consumer

Question

I am implementing Kafka delayed topic consumption with consumer.pause(<partitions>).

The Pub/Sub Lite Kafka shim turns pause into a no-op:

https://github.com/googleapis/java-pubsublite-kafka/blob/v0.6.7/src/main/java/com/google/cloud/pubsublite/kafka/PubsubLiteConsumer.java#L590-L600

Is there any documentation on how to delay consumption of a Pub/Sub Lite topic by a set duration?

That is, I want to consume all messages from a Pub/Sub Lite topic, but with a synthetic 4-minute lag.

Here is my algorithm with Kafka native:

1. call consumer.poll()
2. resume all assigned partitions: consumer.resume(consumer.assignment())
3. combine previously delayed records with recently polled records
4. separate records into
   - records that are old enough to process
   - records still too young to process
5. pause partitions for any records that are too young: consumer.pause(<partitions of too young>)
6. keep a buffer of too-young records, called delayed, to reconsider on the next pass
7. process records that are old enough
8. rinse, repeat
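
The split in the steps above could be sketched roughly as follows. This is only an illustration, not code from either client library: `Rec` is a hypothetical stand-in for the partition, offset, and timestamp of a consumer record, and `DelaySplit` is a made-up helper name. It assumes records arrive in offset order within each partition, with previously delayed records placed ahead of newly polled ones.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical stand-in for the record fields the delay logic needs. */
final class Rec {
    final int partition;
    final long offset;
    final long timestampMs;

    Rec(int partition, long offset, long timestampMs) {
        this.partition = partition;
        this.offset = offset;
        this.timestampMs = timestampMs;
    }
}

final class DelaySplit {
    final List<Rec> ready = new ArrayList<>();
    final List<Rec> tooYoung = new ArrayList<>();

    /**
     * Splits records into those at least delayMs old and those still too young.
     * Once a partition has one too-young record, every later record of that
     * partition is held back too, so processing never skips past an offset
     * that has not been committed yet.
     */
    static DelaySplit split(List<Rec> records, long nowMs, long delayMs) {
        DelaySplit result = new DelaySplit();
        Set<Integer> blocked = new HashSet<>();
        for (Rec r : records) {
            boolean oldEnough = nowMs - r.timestampMs >= delayMs;
            if (oldEnough && !blocked.contains(r.partition)) {
                result.ready.add(r);
            } else {
                blocked.add(r.partition);
                result.tooYoung.add(r);
            }
        }
        return result;
    }
}
```

The partitions to pause are then the distinct partitions that appear in `tooYoung`. Where pause is a no-op, the same split still applies; the `tooYoung` buffer simply keeps receiving newly polled records for those partitions.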

We only commit offsets of records that are old enough. If the process dies, any records in the "too young" buffer remain uncommitted and will be revisited by whichever consumer receives the partition in the ensuing rebalance.

Is there a more generalized form of this algorithm that will work with native Kafka and Pub/Sub Lite?

Edit: Cloud Tasks is a bad idea here, as it disconnects the offset commit chain. I need to ensure I only commit offsets for records that have gotten an ack from the downstream system.

score -1 · Answer 1 · answered Nov 18 '21 at 16:11

Something similar to the above would likely work fine if you remove the pause and resume stages. Note that with both systems, a given poll() call is not guaranteed to return every message that exists on the server at that moment, so a partition that returns no records in one poll() call may see some extra delay.

If you do the following with autocommit enabled, you should effectively delay processing by strictly more than 4 minutes.

1. call consumer.poll()
2. sleep until every record is at least 4 minutes old
3. process records
4. go to 1.
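
As a rough sketch of the sleep step, the wait is set by the youngest record in the batch, which is why older records in the same batch end up delayed by more than 4 minutes. `BatchDelay` is a hypothetical helper name, and timestamps are assumed to be epoch milliseconds taken from the polled records.

```java
import java.util.List;

final class BatchDelay {
    /**
     * Returns how long to sleep so that every record in the batch is at least
     * delayMs old. The youngest record decides the wait; the result is never
     * negative, and an empty batch needs no wait.
     */
    static long sleepMillis(List<Long> timestampsMs, long nowMs, long delayMs) {
        long wait = 0L;
        for (long ts : timestampsMs) {
            wait = Math.max(wait, ts + delayMs - nowMs);
        }
        return wait;
    }
}
```

The loop would then call `Thread.sleep(BatchDelay.sleepMillis(timestamps, System.currentTimeMillis(), 240_000L))` before processing. With native Kafka, the gap between poll() calls also has to stay below max.poll.interval.ms (300000 ms by default), which leaves little headroom for a 4-minute sleep plus processing time.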

If you use manual commits, you can make the sleeps closer to 4 minutes on a per-message basis, but with the downside of needing to manage offsets manually:

1. call consumer.poll()
2. put records into ordered per-partition buffers
3. sleep until the oldest record for any partition is 4 minutes in the past
4. process records which are more than 4 minutes in the past
5. commit offsets for processed records
6. go to 1.
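
A minimal sketch of the buffering and offset bookkeeping in the steps above, under the assumption that records are added in offset order per partition. `PartitionBuffers` is a hypothetical class, not part of the Kafka or Pub/Sub Lite clients, and the actual processing and commit calls are left out.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

final class PartitionBuffers {
    // Per partition: queue of {offset, timestampMs} pairs in offset order.
    private final Map<Integer, Deque<long[]>> buffers = new HashMap<>();

    void add(int partition, long offset, long timestampMs) {
        buffers.computeIfAbsent(partition, p -> new ArrayDeque<>())
               .addLast(new long[] {offset, timestampMs});
    }

    /**
     * Removes the records that are at least delayMs old from the head of each
     * buffer and returns, per partition, the offset to commit afterwards:
     * last drained offset + 1, following the Kafka convention that a committed
     * offset names the next record to read. Partitions with nothing ready are
     * absent from the result.
     */
    Map<Integer, Long> drainReady(long nowMs, long delayMs) {
        Map<Integer, Long> toCommit = new HashMap<>();
        for (Map.Entry<Integer, Deque<long[]>> e : buffers.entrySet()) {
            Deque<long[]> queue = e.getValue();
            while (!queue.isEmpty() && nowMs - queue.peekFirst()[1] >= delayMs) {
                long offset = queue.pollFirst()[0];
                toCommit.put(e.getKey(), offset + 1);
            }
        }
        return toCommit;
    }
}
```

In a real loop, each drained record would be processed and acknowledged downstream before its offset is included in the commit, so the returned map should only be committed once that has happened.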

Sleep is not an option for me. Only call that is allowed to block is the call to poll itself. This will only block if there are no new messages. We already manage offsets ourselves. — Gabriel, Nov 20 '21 at 02:07
"Only call that is allowed to block is the call to poll itself." This seems to be an arbitrary constraint, given that you are already blocking the thread in the poll() call, and you should reconsider it. An alternative, assuming that you have no control over this constraint, is to set the poll timeout to the amount of time remaining until the oldest buffered record is 4 minutes in the past. The poll will then either accumulate new messages or time out once the oldest record is 4 minutes old. This has memory overhead compared to the sleep approach, though. — Daniel Collins, Nov 22 '21 at 02:15

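The poll-timeout idea from the last comment might look roughly like this. `PollTimeout` and its `maxWaitMs` cap are assumptions made for illustration; the only client call relied on is `poll(Duration)`.

```java
import java.time.Duration;

final class PollTimeout {
    /**
     * Timeout for the next poll(): the time left until the oldest buffered
     * record becomes delayMs old, clamped to the range [0, maxWaitMs].
     * A negative oldestTimestampMs means the buffer is empty, in which case
     * the full maxWaitMs is used.
     */
    static Duration next(long oldestTimestampMs, long nowMs, long delayMs, long maxWaitMs) {
        if (oldestTimestampMs < 0) {
            return Duration.ofMillis(maxWaitMs);
        }
        long remaining = oldestTimestampMs + delayMs - nowMs;
        return Duration.ofMillis(Math.max(0L, Math.min(remaining, maxWaitMs)));
    }
}
```

The loop would pass the result to `consumer.poll(...)`. Since poll() can return as soon as records are available, it may come back before the timeout elapses; the loop then buffers the new records, processes whatever has become old enough, and recomputes the timeout.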