
I have an application which is installed on 3 different servers. The application subscribes to a single Event Hub that has 8 partitions, so when I start the application on all 3 machines, the partitions are distributed across the 3 machines.

Say it is like this:

VM1 : Partition 0,1,2
VM2 : Partition 3,4
VM3 : Partition 5,6,7

All of these partitions receive messages continuously, and the messages need to be processed one after the other. My requirement is that, within a machine/server, only one message should be processed at a time (no matter how many partitions are initialized on it). VM1, VM2, and VM3 can still run in parallel.

As a scenario: on one machine, say VM1, I receive a message through Partition 0. That message is now being processed, which typically takes about 15 minutes. Within those 15 minutes, I do not want Partition 1 or 2 to receive any new messages until the earlier one is finished. Once the previous message is processed, any of the three partitions is ready for a new message, and as soon as one partition receives a message, the others should again not receive any.

The code I'm using is something like this:

public class SimpleEventProcessor : IEventProcessor
{
    public Task CloseAsync(PartitionContext context, CloseReason reason)
    {
       Console.WriteLine($"Processor Shutting Down. Partition '{context.PartitionId}', Reason: '{reason}'.");
       return Task.CompletedTask;
    }

    public Task OpenAsync(PartitionContext context)
    {
       Console.WriteLine($"SimpleEventProcessor initialized. Partition: '{context.PartitionId}'");
       return Task.CompletedTask;
     }

    public Task ProcessErrorAsync(PartitionContext context, Exception error)
    {
       Console.WriteLine($"Error on Partition: {context.PartitionId}, Error: {error.Message}");
       return Task.CompletedTask;
    }

    public Task ProcessEventsAsync(PartitionContext context, IEnumerable<EventData> messages)
    {
       foreach (var eventData in messages)
       {
          var data = Encoding.UTF8.GetString(eventData.Body.Array, eventData.Body.Offset, eventData.Body.Count);
          Console.WriteLine($"Message received. Partition: '{context.PartitionId}', Data: '{data}'");
          DoSomethingWithMessage(); // typically takes 15-20 mins to finish this method.
       }
       return context.CheckpointAsync();
    }
} 

Is this possible?

PS: I have to use event hubs and have no other option.

asked by CrazyCoder (edited by halfer)

1 Answer


You can achieve this by mutual exclusion on a static lock object.

    private static readonly object lockObj = new object();

    public Task ProcessEventsAsync(PartitionContext context, IEnumerable<EventData> messages)
    {
        lock (lockObj)
        {
            foreach (var eventData in messages)
            {
                var data = Encoding.UTF8.GetString(eventData.Body.Array, eventData.Body.Offset, eventData.Body.Count);
                Console.WriteLine($"Message received. Partition: '{context.PartitionId}', Data: '{data}'");
                DoSomethingWithMessage(); // typically takes 15-20 mins to finish this method.
            }

            return context.CheckpointAsync();
        }
    }

Don't forget to set EventProcessorOptions.MaxBatchSize to 1, as shown below.

var epo = new EventProcessorOptions
{
    MaxBatchSize = 1
};

await eventProcessorHost.RegisterEventProcessorAsync<SimpleEventProcessor>(epo);
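For context, the options object is passed when the processor is registered on the host. A minimal host setup might look like the sketch below; the Event Hub name, connection strings, and lease container name are placeholders you would replace with your own values:

```csharp
// Sketch only: names and connection strings below are placeholders.
var eventProcessorHost = new EventProcessorHost(
    "my-event-hub",                              // Event Hub name (placeholder)
    PartitionReceiver.DefaultConsumerGroupName,  // default consumer group
    "<event-hub-connection-string>",             // placeholder
    "<storage-connection-string>",               // placeholder; blobs hold leases/checkpoints
    "<lease-container-name>");                   // placeholder

var epo = new EventProcessorOptions
{
    MaxBatchSize = 1  // deliver at most one event per ProcessEventsAsync call
};

await eventProcessorHost.RegisterEventProcessorAsync<SimpleEventProcessor>(epo);
```

Note that MaxBatchSize caps the batch size per call per partition; on its own it does not serialize processing across partitions, which is why the lock (or the semaphore in the downstream agent below) is still needed.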

Here is the full processor code with a downstream agent:

public class SampleEventProcessor : IEventProcessor
{
    public Task OpenAsync(PartitionContext context)
    {
        Console.WriteLine($"Opened partition {context.PartitionId}");
        return Task.CompletedTask;
    }

    public Task CloseAsync(PartitionContext context, CloseReason reason)
    {
        Console.WriteLine($"Closed partition {context.PartitionId}");
        return Task.CompletedTask;
    }

    public async Task ProcessEventsAsync(PartitionContext context, IEnumerable<EventData> messages)
    {
        foreach (var eventData in messages)
        {
            // Process the message in the downstream agent.
            await DownstreamAgent.ProcessEventAsync(eventData);

            // Checkpoint current position.
            await context.CheckpointAsync();
        }
    }

    public Task ProcessErrorAsync(PartitionContext context, Exception error)
    {
        Console.WriteLine($"Partition {context.PartitionId} - {error.Message}");
        return Task.CompletedTask;
    }
}

public class DownstreamAgent
{
    const int DegreeOfParallelism = 1;

    static readonly SemaphoreSlim threadSemaphore = new SemaphoreSlim(DegreeOfParallelism, DegreeOfParallelism);

    public static async Task ProcessEventAsync(EventData message)
    {
        // Wait for a spot so this message can get processed.
        await threadSemaphore.WaitAsync();

        try
        {
            // Process the message here
            var data = Encoding.UTF8.GetString(message.Body.Array, message.Body.Offset, message.Body.Count);
            Console.WriteLine(data);
        }
        finally
        {
            // Release the semaphore here so that next message waiting can be processed.
            threadSemaphore.Release();
        }
    }
}
answered by Serkant Karaca
  • Thanks for the reply. I will try and test the method which you have provided. I just wanted to confirm that it should not stop other VMs to receive messages. This blocking should happen only with in the VM. – CrazyCoder Feb 07 '20 at 05:28
  • Where do I have to set EventProcessorOptions.MaxBatchSize to 1 ? – CrazyCoder Feb 07 '20 at 06:25
  • Thanks again. But what is lockObj ? – CrazyCoder Feb 07 '20 at 06:41
  • Not sure this is gonna work. You cannot guarantee a given vm will always process the same partitions. A lock like this will be bound to a specific process. If another vm starts processing a partition previously processed by another machine, eg when the lease expires, this method fails. – Peter Bons Feb 07 '20 at 07:46
  • Also, given the example there will be three instances of an eventprocessor, so the lock must be shared. But you will need a distributed lock for the processes on the other machine. Might as well put a single queue after it. – Peter Bons Feb 07 '20 at 08:13
  • @PeterBons , I do not want to share the same lock across all the machines. With in a machine, I want them to receive message one by one. – CrazyCoder Feb 07 '20 at 08:53
  • Either make the lock static or create a downstream event handler where it has a critical code section which doesn't allow multi-threading. I believe the latter is a cleaner solution than my code snippet in the answer since it abstracts the singleton behavior from processors. – Serkant Karaca Feb 07 '20 at 16:45
  • @SerkantKaraca Could you please show me any examples – CrazyCoder Feb 09 '20 at 11:27