I want to store checkpoints from Azure Event Hubs in a Google Cloud Storage bucket because of a specific use case, but I cannot find a way to do this.

From my research on Event Hubs checkpointing, I see that a checkpoint_store object is created that depends on Azure Blob Storage. The code is shown below:

import asyncio
from azure.eventhub.aio import EventHubConsumerClient
from azure.eventhub.extensions.checkpointstoreblobaio import BlobCheckpointStore


async def on_event(partition_context, event):
    # Print the event data.
    print("Received the event: \"{}\" from the partition with ID: \"{}\"".format(event.body_as_str(encoding='UTF-8'), partition_context.partition_id))

    # Update the checkpoint so that the program doesn't read the events
    # that it has already read when you run it next time.
    await partition_context.update_checkpoint(event)

async def main():
    # Create an Azure blob checkpoint store to store the checkpoints.
    checkpoint_store = BlobCheckpointStore.from_connection_string("AZURE STORAGE CONNECTION STRING", "BLOB CONTAINER NAME")

    # Create a consumer client for the event hub.
    client = EventHubConsumerClient.from_connection_string("EVENT HUBS NAMESPACE CONNECTION STRING", consumer_group="$Default", eventhub_name="EVENT HUB NAME", checkpoint_store=checkpoint_store)
    async with client:
        # Call the receive method. Read from the beginning of the partition (starting_position: "-1")
        await client.receive(on_event=on_event,  starting_position="-1")

if __name__ == '__main__':
    # Run the main method; asyncio.run creates and closes the event loop.
    asyncio.run(main())

Problem: How do I change this code to store the checkpoint in a Google Cloud Storage bucket, so that my Event Hubs client can resume from this checkpoint after a failure?

Reference Links-

  1. Receive events
  2. Checkpoint store python client
john mich

1 Answer

The consumer client has no affinity to a particular data store or to Azure; it works through the provided checkpoint store abstraction to perform any operation that requires storage. To use the consumer client with a Google Cloud Storage bucket, you'd have to implement a custom checkpoint store and pass that to your consumer.

The SDK provides an abstract CheckpointStore type that defines the interface expected by the consumer. There is also an in_memory_checkpoint_store implementation that may help as a simplified example to get you started.
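
As a rough illustration, here is a minimal sketch of what such a store could look like. It only mirrors the four coroutine methods of the abstract type (list_ownership, claim_ownership, update_checkpoint, list_checkpoints) and the dictionary keys the SDK documents for ownership and checkpoint records. The names BucketCheckpointStore and InMemoryObjectStore, and the object naming scheme, are hypothetical. The in-memory object store is a stand-in so the sketch runs without any cloud credentials; a real version would replace it with reads and writes against bucket objects.

```python
import asyncio
import json
import time
import uuid


class InMemoryObjectStore:
    """Hypothetical stand-in for a bucket: maps object names to text."""

    def __init__(self):
        self._objects = {}

    def write(self, name, text):
        self._objects[name] = text

    def read(self, name):
        return self._objects.get(name)  # None if the object does not exist

    def list(self, prefix):
        return sorted(n for n in self._objects if n.startswith(prefix))


class BucketCheckpointStore:
    """Sketch of a checkpoint store that keeps one JSON object per partition.

    Assumption: in real use this would subclass azure.eventhub.aio.CheckpointStore
    and `objects` would wrap a cloud bucket instead of a dict.
    """

    def __init__(self, objects):
        self._objects = objects

    @staticmethod
    def _prefix(kind, namespace, eventhub, group):
        # Hypothetical naming scheme: <namespace>/<hub>/<group>/<kind>/<partition>
        return f"{namespace}/{eventhub}/{group}/{kind}/"

    def _load(self, prefix):
        return [json.loads(self._objects.read(n)) for n in self._objects.list(prefix)]

    async def list_ownership(self, fully_qualified_namespace, eventhub_name,
                             consumer_group, **kwargs):
        return self._load(self._prefix("ownership", fully_qualified_namespace,
                                       eventhub_name, consumer_group))

    async def claim_ownership(self, ownership_list, **kwargs):
        claimed = []
        for ownership in ownership_list:
            name = self._prefix("ownership", ownership["fully_qualified_namespace"],
                                ownership["eventhub_name"],
                                ownership["consumer_group"]) + ownership["partition_id"]
            current = self._objects.read(name)
            # Optimistic concurrency: skip the claim if someone else changed the
            # record since the caller last saw it. This read-then-write is NOT
            # atomic; a real bucket would need a conditional write instead.
            if current is not None and json.loads(current)["etag"] != ownership.get("etag"):
                continue
            record = dict(ownership, etag=str(uuid.uuid4()),
                          last_modified_time=time.time())
            self._objects.write(name, json.dumps(record))
            claimed.append(record)
        return claimed

    async def update_checkpoint(self, checkpoint, **kwargs):
        name = self._prefix("checkpoint", checkpoint["fully_qualified_namespace"],
                            checkpoint["eventhub_name"],
                            checkpoint["consumer_group"]) + checkpoint["partition_id"]
        self._objects.write(name, json.dumps(checkpoint))

    async def list_checkpoints(self, fully_qualified_namespace, eventhub_name,
                               consumer_group, **kwargs):
        return self._load(self._prefix("checkpoint", fully_qualified_namespace,
                                       eventhub_name, consumer_group))
```

An instance would then be passed as checkpoint_store=... to EventHubConsumerClient.from_connection_string, in place of the BlobCheckpointStore in the question. Two things to keep in mind when swapping in a real bucket: the google-cloud-storage client is synchronous, so its calls would probably need to be moved off the event loop (for example with asyncio.to_thread), and claim_ownership needs an atomic compare-and-write so that two consumers cannot both claim the same partition. Cloud Storage's generation preconditions look like a natural fit for that, but treat that as a suggestion to verify rather than a tested design.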

Jesse Squire