I use EventProcessorHost to receive and process events from Event Hubs. For better performance, I checkpoint every 3 minutes instead of every time events are received:

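public async Task ProcessEventsAsync(PartitionContext context, IEnumerable<EventData> messages)
{
    foreach (var eventData in messages)
    {
        // do something
    }

    // Checkpoint at most once every 3 minutes.
    if (checkpointStopWatch.Elapsed > TimeSpan.FromMinutes(3))
    {
        await context.CheckpointAsync();
        checkpointStopWatch.Restart();
    }
}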

But the problem is that some events might never be checkpointed if no new events are sent to Event Hubs, because ProcessEventsAsync isn't invoked when there are no new messages.

Any suggestions for making sure all processed events get checkpointed, while still checkpointing only every few minutes?

Update: Per Sreeram's suggestion, I updated the code as below:

public async Task ProcessEventsAsync(PartitionContext context, IEnumerable<EventData> messages)
{
    foreach (var eventData in messages)
    {
        // do something
    }

    this.lastProcessedEventsCount += messages.Count();

    if (this.checkpointStopWatch.Elapsed > TimeSpan.FromMinutes(3))
    {
        this.checkpointStopWatch.Restart();

        // Only checkpoint if something was processed since the last checkpoint.
        if (this.lastProcessedEventsCount > 0)
        {
            await context.CheckpointAsync();
            this.lastProcessedEventsCount = 0;
        }
    }
}
Youxu
  • Why not use a timer that ticks every 3 seconds so you do not have to depend on ProcessEventAsync? – Peter Bons Aug 15 '18 at 21:01
  • The checkpoint must be done after all events have been processed (// do something in the example code), so a timer doesn't seem to work. – Youxu Aug 15 '18 at 22:36

1 Answer

Great case you are covering!

You could lose event checkpoints (and, as a result, see events replayed) in the following two cases:

  1. When you have a sparse data flow (for example, a batch of messages every 5 minutes with a 3-minute checkpoint interval) and the EventProcessorHost instance closes for some reason, you could see 2 minutes of EventData reprocessed. To handle that case, keep track of the last processed event after completing IEventProcessor.onEvents (Java) / IEventProcessor.ProcessEventsAsync (C#), and checkpoint when you are notified of the close in IEventProcessor.onClose (Java) / IEventProcessor.CloseAsync (C#).

  2. There might be a case where no more events arrive on a specific Event Hubs partition. In that case, with your checkpointing strategy, the last event would never be checkpointed. This is uncommon, however, when you have a continuous flow of EventData and are not sending to a specific Event Hubs partition (EventHubClient.send(EventData_Without_PartitionKey)). If you think you could run into this situation, set the following flag to wake up ProcessEventsAsync every so often:

    EventProcessorOptions.setInvokeProcessorAfterReceiveTimeout(true); // in Java
    EventProcessorOptions.InvokeProcessorAfterReceiveTimeout = true;   // in C#

Then keep track of LastProcessedEventData and LastCheckpointedEventData, and decide whether to checkpoint when no events are received by comparing the EventData.SequenceNumber property of those two events.
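
The bookkeeping described in both cases could be sketched roughly as follows. This is only an illustration, not SDK code: CheckpointTracker and all of its members are hypothetical names, it works on plain sequence numbers rather than EventData objects, and the clock value is passed in by the caller so the logic stays independent of the Event Hubs library.

```csharp
using System;

// Hypothetical helper (not part of the Event Hubs SDK): decides when a checkpoint is due.
public sealed class CheckpointTracker
{
    private readonly TimeSpan interval;
    private DateTime lastCheckpointTime;
    private long? lastProcessed;     // sequence number of the last processed event
    private long? lastCheckpointed;  // sequence number of the last checkpointed event

    public CheckpointTracker(TimeSpan interval, DateTime startTime)
    {
        this.interval = interval;
        this.lastCheckpointTime = startTime;
    }

    // Call after an event has been fully processed.
    public void RecordProcessed(long sequenceNumber) => lastProcessed = sequenceNumber;

    // True when something has been processed since the last checkpoint.
    public bool HasPending => lastProcessed.HasValue && lastProcessed != lastCheckpointed;

    // True when there is pending work and the checkpoint interval has elapsed.
    public bool ShouldCheckpoint(DateTime now) =>
        HasPending && now - lastCheckpointTime >= interval;

    // Call after the checkpoint call has succeeded.
    public void MarkCheckpointed(DateTime now)
    {
        lastCheckpointed = lastProcessed;
        lastCheckpointTime = now;
    }
}
```

Under these assumptions, ProcessEventsAsync would call RecordProcessed for each event, then checkpoint and call MarkCheckpointed whenever ShouldCheckpoint(DateTime.UtcNow) is true, including on the wake-up calls that carry no events. CloseAsync would checkpoint if HasPending is true, presumably only when the processor is shutting down normally rather than after losing its lease.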

Sreeram Garlapati
  • Thanks Sreeram! Do you know what the default receive timeout value of EventProcessor is? – Youxu Aug 15 '18 at 22:41
  • 60 secs is the default. – Sreeram Garlapati Aug 16 '18 at 16:44
  • Thanks Sreeram, I updated the code in the original question per your suggestion. Could you please take a look and comment? – Youxu Aug 16 '18 at 17:21
  • @SreeramGarlapati According to the Javadoc for the setInitialPositionProvider method of the EventProcessorOptions class, this method determines the position at which to start receiving events if there is no checkpoint. Does that mean that if a checkpoint exists, this method has no effect when configuring the EventProcessorHost? – Milesh Nov 20 '19 at 13:24