
The Azure Cosmos DB docs state:

With a push model, the change feed processor pushes work to a client that has business logic for processing this work. However, the complexity in checking for work and storing state for the last processed work is handled within the change feed processor.

But when I look at the lifecycle of the change feed processor library, it is as follows:


1. Read the change feed.
2. If there are no changes, sleep for a predefined amount of time (customizable with WithPollInterval in the Builder) and go to #1.
3. If there are changes, send them to the delegate.
4. When the delegate finishes processing the changes successfully, update the lease store with the latest processed point in time and go to #1.
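The four steps above can be sketched as a loop. This is a minimal Python simulation, not the Cosmos DB SDK: `InMemoryFeed`, `run_processor`, the `lease_store` dictionary, and `max_idle_polls` are hypothetical names invented for illustration, and the idle limit exists only so the example terminates.

```python
import time
from typing import Callable, Dict, List


class InMemoryFeed:
    """Hypothetical stand-in for a change feed: an append-only list of changes."""

    def __init__(self) -> None:
        self.changes: List[str] = []

    def read_from(self, position: int, max_items: int = 2) -> List[str]:
        # Return up to max_items changes starting at the given position.
        return self.changes[position:position + max_items]


def run_processor(
    feed: InMemoryFeed,
    lease_store: Dict[str, int],
    delegate: Callable[[List[str]], None],
    poll_interval: float = 0.01,
    max_idle_polls: int = 3,  # assumption: lets the sketch stop; a real processor keeps running
) -> None:
    idle_polls = 0
    while idle_polls < max_idle_polls:
        position = lease_store.get("checkpoint", 0)
        batch = feed.read_from(position)          # 1. read the change feed
        if not batch:
            idle_polls += 1
            time.sleep(poll_interval)             # 2. no changes: sleep, go to 1
            continue
        idle_polls = 0
        delegate(batch)                           # 3. send the changes to the delegate
        # 4. only after the delegate returns successfully, save the new position
        lease_store["checkpoint"] = position + len(batch)


if __name__ == "__main__":
    feed = InMemoryFeed()
    feed.changes.extend(["c0", "c1", "c2"])
    store: Dict[str, int] = {}
    run_processor(feed, store, lambda batch: print("delegate got", batch))
```

In this sketch the caller only supplies `delegate`; the loop does the reading, sleeping, and checkpointing.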

Looking at this lifecycle, it seems like we are polling the change feed for changes rather than the change feed pushing changes to us.

Is my understanding correct that the next set of changes will be pulled from the change feed only after my current delegate thread has finished? If yes, how is this a push model and not a pull model?

Thanks in advance.

1 Answer


It is a push model from the point of view of the user code (changes are pushed to the delegate; the user is not polling), but the underlying implementation polls, and the flow is as you described.

Remember that the client library just talks to the exposed service endpoints (REST APIs). There is no special push protocol (it is not gRPC, for example), so clients can only make normal network requests (send a request, get a response).

Matias Quaranta
  • The confusion I have with the explanation is that the docs say that in the pull model, the client can control the pace at which documents are received. But I can do the same in the push model by putting a sleep in the delegate worker. Is my understanding correct here? – Arunim Chopra Nov 17 '22 at 16:10
  • The point of the Change Feed Processor is to distribute Change Feed consumption across scalable compute, with automatic distribution across dynamic compute instances. If you are going to put a sleep in a delegate, then why are you considering the Change Feed Processor in the first place? A sleep does not directly control the pace, because you are not directly controlling the other compute threads that are checking different partitions concurrently. – Matias Quaranta Nov 17 '22 at 17:38
  • If you want manual control, use the pull model. If you want dynamic distribution across compute and a push programming model similar to notifications, use the Change Feed Processor. The goal of CFP is to let you consume changes as fast as possible; adding a sleep in the middle would go against that. – Matias Quaranta Nov 17 '22 at 17:39
  • I will not put in a sleep exactly, but I have business logic that might require me to block the processing of a packet based on some field. Basically, I do not want to manage the continuation tokens myself. So I want to leverage the parallelism and the LSN management provided by the push model, but consume at my own pace. If I put a sleep in the delegate, the client library will not request more packets until the previous delegate is done, right? I mean, it will not spawn a new thread for the next delegate or something like that? And it will not build up packets in memory for later consumption? – Arunim Chopra Nov 17 '22 at 17:57
  • The delegate execution is per lease; each lease gets its own concurrent Task. If you hold the Task with a delay, then yes, it will not continue, but it won't checkpoint either. That means that if the instance crashes while you are holding, the lease state would not be updated and you'd probably have to reprocess the last batch for the lease that was holding. A delay in the delegate also has no effect in the "No new changes" case, as that check does not trigger the delegate. – Matias Quaranta Nov 17 '22 at 19:45
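
The per-lease behavior described in the last comment can be illustrated with a small simulation. This is a hedged Python sketch using `asyncio`, not the Cosmos DB SDK: `process_lease`, `run_all`, the `checkpoints` dictionary, and the list-based feeds are hypothetical names invented here. It assumes one task per lease, a checkpoint written only after the delegate returns, and no delegate call when a lease has no new changes.

```python
import asyncio
from typing import Awaitable, Callable, Dict, List

# Hypothetical delegate signature: (lease_id, batch) -> awaitable
Delegate = Callable[[str, List[str]], Awaitable[None]]


async def process_lease(
    lease_id: str,
    changes: List[str],
    checkpoints: Dict[str, int],
    delegate: Delegate,
    batch_size: int = 2,
) -> None:
    """Drain one lease, checkpointing only after the delegate returns."""
    while True:
        start = checkpoints.get(lease_id, 0)
        batch = changes[start:start + batch_size]
        if not batch:
            # "No new changes": the delegate is not invoked.
            # (A real processor would sleep and poll again; the sketch stops.)
            return
        # If this await blocks or raises, the checkpoint below is never written,
        # so the same batch is delivered again on the next run.
        await delegate(lease_id, batch)
        checkpoints[lease_id] = start + len(batch)


async def run_all(
    feeds: Dict[str, List[str]],
    checkpoints: Dict[str, int],
    delegate: Delegate,
) -> list:
    # One concurrent task per lease; a failure in one does not stop the others.
    return await asyncio.gather(
        *(process_lease(lease_id, changes, checkpoints, delegate)
          for lease_id, changes in feeds.items()),
        return_exceptions=True,
    )
```

Running `run_all` once with a delegate that fails on one lease, and then again with a healthy delegate, shows the failed lease receiving its first batch a second time while the other lease is not called again.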