
I am using Event Hubs to ingest a large volume of events. I have multiple consumers running behind a scaling group, reading these events from an event hub that has multiple partitions. I was going through the Azure SDK for Python and got confused about what to use: there is EventHubConsumerClient, EventProcessorHost, ...

I would like to use a library where my multiple consumers connect using a consumer group, partitions are assigned dynamically to each consumer, and checkpoints are stored in a storage account, just like how I used Kafka.

Nipun
  • I used the sample code and this is the error I am getting: "An exception (KeyError('offset')) occurred during balancing and claiming ownership for eventhub" – Nipun Dec 05 '19 at 06:05
  • This is the link I am referring to: https://pypi.org/project/azure-eventhub/5.0.0b6/#consume-events-and-save-checkpoints-using-a-checkpoint-store – Nipun Dec 05 '19 at 06:14
  • Have you considered using "event processor host" in Python? It uses a consumer group and supports checkpointing. – Ivan Glasenberg Dec 05 '19 at 06:37
  • In the later versions of the Python SDK, I can see EventHubConsumerClient being used. See the link in the comment above, and also have a look at the error. I am trying to run the same example program provided in the link. – Nipun Dec 05 '19 at 06:42
  • It's a pre-release version, not a stable one, so it may still have bugs. – Ivan Glasenberg Dec 05 '19 at 06:44
  • So which one should I use? Is there a stable version of it? – Nipun Dec 05 '19 at 06:44
  • We should keep using the stable one, but the stable one uses "event processor host". The pre-release version is not recommended for production usage. – Ivan Glasenberg Dec 05 '19 at 06:46
  • But if you prefer to use the pre-release version, I will give it a try. Can you post the code you tried? – Ivan Glasenberg Dec 05 '19 at 06:49
  • It is in the link: https://pypi.org/project/azure-eventhub/5.0.0b6/#consume-events-and-save-checkpoints-using-a-checkpoint-store – Nipun Dec 05 '19 at 06:53
  • Can you please try it and let me know? I am using the exact same code with only config changes. – Nipun Dec 05 '19 at 06:55
  • Yeah, I'll give it a try and let you know the result. You just want to set checkpoints and use a consumer group, right? – Ivan Glasenberg Dec 05 '19 at 07:00
  • Yes, $default is the consumer group, and for checkpoints I used a storage account. – Nipun Dec 05 '19 at 07:27

2 Answers


Update:

For production usage, I suggest using the stable version of the Event Hubs SDK. You can use EPH (Event Processor Host); sample code is here.


I was able to use the pre-release azure-eventhub 5.0.0b6 with a consumer group and checkpointing.

The strange thing is that in blob storage I can see two folders created for the event hub: a checkpoint folder and an ownership folder. Inside them there are blobs for the partitions, but the blobs are empty. Stranger still, even though the blobs are empty, every time I read from the event hub it only returns new data (it never re-reads data that has already been read in the same consumer group).

You need to install azure-eventhub 5.0.0b6, and run pip install --pre azure-eventhub-checkpointstoreblob to install azure-eventhub-checkpointstoreblob. For blob storage, install the latest version (12.1.0) of azure-storage-blob.
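Since the errors discussed in this thread depend heavily on exact package versions, a quick way to see what is actually installed is the standard library's importlib.metadata (Python 3.8+). This is a small sketch; the helper name installed_versions is just for illustration, and the package names are the PyPI distribution names mentioned above.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Return {package: installed version string, or None if not installed}."""
    result = {}
    for name in packages:
        try:
            result[name] = version(name)
        except PackageNotFoundError:
            result[name] = None
    return result


if __name__ == "__main__":
    # Distribution names as published on PyPI.
    wanted = [
        "azure-eventhub",
        "azure-eventhub-checkpointstoreblob",
        "azure-storage-blob",
    ]
    for pkg, ver in installed_versions(wanted).items():
        print(f"{pkg}: {ver or 'not installed'}")
```

Pasting this output into a question makes it much easier for others to reproduce a setup.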

I followed this sample. It uses an event hub level connection string (NOT an event hub namespace level connection string). To create one, go to the Azure portal -> your Event Hubs namespace -> your event hub instance -> Shared access policies -> click "Add", then specify a policy name and select the permissions. If you only want to receive data, you can select just the Listen permission, as in the screenshot below:

[Screenshot: adding a shared access policy on the event hub with the Listen permission]

After the policy is created, you can copy its connection string as shown in the screenshot below:

[Screenshot: copying the connection string of the new policy]
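The two kinds of connection string differ in the trailing EntityPath=<event hub name> segment: an event hub level string has it, a namespace level string does not. The sketch below is a hypothetical helper (not part of the SDK) that splits a connection string on ";" and "=" and checks for EntityPath; it assumes exact key casing and no quoted values. With a namespace level string, v5's EventHubConsumerClient.from_connection_string needs the event hub name passed explicitly via eventhub_name=.

```python
def parse_connection_string(conn_str):
    """Split an Event Hubs connection string into a {key: value} dict."""
    parts = {}
    for segment in conn_str.strip().split(";"):
        if not segment:
            continue  # tolerate a trailing ';'
        # Split on the first '=' only: SAS keys are base64 and may end in '='.
        key, _, value = segment.partition("=")
        parts[key] = value
    return parts


def is_entity_level(conn_str):
    """True if the string is scoped to a single event hub (has an EntityPath)."""
    return bool(parse_connection_string(conn_str).get("EntityPath"))
```

For example, is_entity_level(CONNECTION_STR) should be True for the string used in the code that follows.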

Then you can use the code below:

import os
from azure.eventhub import EventHubConsumerClient
from azure.eventhub.extensions.checkpointstoreblob import BlobCheckpointStore

CONNECTION_STR = 'Endpoint=sb://ivanehubns.servicebus.windows.net/;SharedAccessKeyName=saspolicy;SharedAccessKey=xxx;EntityPath=myeventhub'
STORAGE_CONNECTION_STR = 'DefaultEndpointsProtocol=https;AccountName=xx;AccountKey=xxx;EndpointSuffix=core.windows.net'


def on_event(partition_context, event):
    # do something with event
    print(event)
    print('on event')
    partition_context.update_checkpoint(event)


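if __name__ == '__main__':

    # "a22" is the blob container name
    checkpoint_store = BlobCheckpointStore.from_connection_string(STORAGE_CONNECTION_STR, "a22")

    # "$default" is the consumer group; with a checkpoint store, partitions are
    # load-balanced across all consumers in the same consumer group
    client = EventHubConsumerClient.from_connection_string(
        CONNECTION_STR, "$default", checkpoint_store=checkpoint_store)

    try:
        print('Start receiving...')
        client.receive(on_event)  # blocks until interrupted
    except KeyboardInterrupt:
        print('Stopped receiving.')
    finally:
        # release the connections on any exit, not only on Ctrl+C
        client.close()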

The test result:

[Screenshot: console output of the test run]
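The code above calls update_checkpoint for every single event, i.e. one blob write per event. A common alternative, sketched here assuming the v5 sync callback shape on_event(partition_context, event), is to checkpoint every N events per partition; the trade-off is that after a crash or rebalance up to N-1 events may be processed again. make_on_event and handle are hypothetical names for illustration.

```python
from collections import defaultdict


def make_on_event(handle, checkpoint_every=50):
    """Return an on_event callback that checkpoints every `checkpoint_every` events per partition."""
    counts = defaultdict(int)  # events processed since the last checkpoint, keyed by partition id

    def on_event(partition_context, event):
        if event is None:
            # v5 passes None when max_wait_time is set and no event arrived in time
            return
        handle(event)  # hypothetical: your own processing function
        pid = partition_context.partition_id
        counts[pid] += 1
        if counts[pid] >= checkpoint_every:
            partition_context.update_checkpoint(event)  # one blob write per N events
            counts[pid] = 0

    return on_event
```

Usage would look like client.receive(on_event=make_on_event(print, checkpoint_every=100)). Keeping the counter per partition matters because each partition checkpoints independently.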

Ivan Glasenberg
  • Thank you, I tried this one as well and I keep getting this error: An exception occurred during list_ownership for namespace 'xxxxxxxx' eventhub 'xxxxxxx' consumer group '$default'. Exception is KeyError('ownerid') – Nipun Dec 05 '19 at 09:31
  • @Nipun, can you follow my steps and try a new blob container? – Ivan Glasenberg Dec 05 '19 at 09:34
  • And if the error still occurs, please post your code and all the packages (including versions) you're using, so I can debug it and find the cause :). – Ivan Glasenberg Dec 05 '19 at 09:36
  • After going through some documentation, I see that the partition manager needs ownership of the partitions, and I guess it is trying to read that from the storage blob, which is empty. – Nipun Dec 05 '19 at 09:48
  • @Nipun, maybe. I'm not sure why the blobs are empty yet it still works for me. Since it's a pre-release, it may be fixed in a few days. – Ivan Glasenberg Dec 05 '19 at 09:50
  • It creates an ownership folder with two files, for partitions 0 and 1, but they are blank. I guess that is why it cannot find 'ownerid'. I deleted everything and retried, and I still get the same issue. – Nipun Dec 05 '19 at 09:52
  • A very nice article: https://learn.microsoft.com/en-us/azure/event-hubs/event-hubs-event-processor-host – Nipun Dec 05 '19 at 10:06
  • @Nipun, would you mind updating your post with detailed steps, code, and some screenshots of the error? – Ivan Glasenberg Dec 06 '19 at 00:33
  • Thank you Ivan for your support, since I would like to have a production ready code, I will not use the beta sdk of python – Nipun Dec 06 '19 at 05:54
  • I will go with eventprocessorhost class – Nipun Dec 06 '19 at 05:54
  • This is what I am using : https://github.com/Azure/azure-sdk-for-python/blob/eventhub_track1/sdk/eventhub/azure-eventhubs/examples/eph.py – Nipun Dec 06 '19 at 05:54
  • @Nipun, ok, it's better to use stable version:) – Ivan Glasenberg Dec 06 '19 at 05:55
  • I am facing one more issue; let me create a separate question on Stack Overflow and add the link. – Nipun Dec 06 '19 at 06:06
  • @Nipun, yeah, please add the link here later. – Ivan Glasenberg Dec 06 '19 at 06:07
  • Can you please help me with this . https://stackoverflow.com/questions/59207767/how-to-provide-the-complete-path-of-the-container-in-azurestoragecheckpointlease – Nipun Dec 06 '19 at 06:09
  • @Nipun, ok, I'm looking into it now. – Ivan Glasenberg Dec 06 '19 at 06:18

azure-eventhub v5 went GA in January 2020, and the latest version at the time of writing is v5.2.0.

It's available on pypi: https://pypi.org/project/azure-eventhub/

Please follow the migration guide from v1 to v5 to migrate your program.

For receiving with checkpoint, please follow the sample code:

import os
import logging
from azure.eventhub import EventHubConsumerClient
from azure.eventhub.extensions.checkpointstoreblob import BlobCheckpointStore

CONNECTION_STR = os.environ["EVENT_HUB_CONN_STR"]
EVENTHUB_NAME = os.environ['EVENT_HUB_NAME']
STORAGE_CONNECTION_STR = os.environ["AZURE_STORAGE_CONN_STR"]
BLOB_CONTAINER_NAME = "your-blob-container-name"  # Please make sure the blob container resource exists.

logging.basicConfig(level=logging.INFO)
log = logging.getLogger(__name__)


def on_event_batch(partition_context, event_batch):
    log.info("Partition {}, Received count: {}".format(partition_context.partition_id, len(event_batch)))
    # put your code here
    partition_context.update_checkpoint()


def receive_batch():
    checkpoint_store = BlobCheckpointStore.from_connection_string(STORAGE_CONNECTION_STR, BLOB_CONTAINER_NAME)
    client = EventHubConsumerClient.from_connection_string(
        CONNECTION_STR,
        consumer_group="$Default",
        eventhub_name=EVENTHUB_NAME,
        checkpoint_store=checkpoint_store,
    )
    with client:
        client.receive_batch(
            on_event_batch=on_event_batch,
            max_batch_size=100,
            starting_position="-1",  # "-1" is from the beginning of the partition.
        )


if __name__ == '__main__':
    receive_batch()

One more thing worth noting: in v5, checkpoint and ownership information is stored in the blob metadata, whereas v1 stored it as the blob content. So it's expected that the blob content is empty when using the v5 SDK.
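To see that the "empty" blobs really do carry checkpoints, you can list them with their metadata using azure-storage-blob v12 (ContainerClient.list_blobs with include=["metadata"]). This is a sketch under assumptions: it assumes v5 checkpoint blob names end in .../checkpoint/<partition_id> and that the metadata keys are offset and sequencenumber (matching the KeyError('offset') and KeyError('ownerid') names reported in the comments on the other answer), which is an implementation detail rather than a documented contract. The helper names are hypothetical.

```python
def checkpoints_from_blobs(blobs):
    """Map partition id -> (offset, sequence number) from (blob_name, metadata) pairs.

    Assumes names like '<namespace>/<eventhub>/<consumer group>/checkpoint/<partition_id>'
    and the metadata keys 'offset' and 'sequencenumber' (assumption).
    """
    result = {}
    for name, metadata in blobs:
        prefix, _, partition_id = name.rpartition("/")
        if prefix.rpartition("/")[2] != "checkpoint":
            continue  # skip ownership blobs and anything else in the container
        result[partition_id] = (metadata.get("offset"), metadata.get("sequencenumber"))
    return result


def read_checkpoints(storage_conn_str, container_name):
    """List checkpoint blobs with their metadata (requires azure-storage-blob v12)."""
    # Imported lazily so the parser above can be used without the package installed.
    from azure.storage.blob import ContainerClient

    container = ContainerClient.from_connection_string(storage_conn_str, container_name)
    blobs = container.list_blobs(include=["metadata"])
    return checkpoints_from_blobs((b.name, b.metadata or {}) for b in blobs)
```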

Adam Ling
  • Do you mean `azure.eventhub.extensions.checkpointstoreblobaio`? `extensions.checkpointstoreblob` does not seem to be a thing. – Casper Lehmann Dec 09 '20 at 18:15
  • `azure.eventhub.extensions.checkpointstoreblob` is the sync version -- https://pypi.org/project/azure-eventhub-checkpointstoreblob/ while `azure.eventhub.extensions.checkpointstoreblobaio` is the async version -- https://pypi.org/project/azure-eventhub-checkpointstoreblob-aio/ – Adam Ling Dec 09 '20 at 21:47
  • Right, thanks. For anyone else asking this question: the sync and async packages are installed separately, with pip install azure-eventhub-checkpointstoreblob and pip install azure-eventhub-checkpointstoreblob-aio respectively. – Casper Lehmann Dec 11 '20 at 11:24
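For completeness, here is a minimal sketch of the same checkpointed receive loop using the async packages mentioned in these comments. It assumes the azure.eventhub.aio client, the azure.eventhub.extensions.checkpointstoreblobaio store, the same environment variables as the sync sample, and a placeholder container name. The main difference from the sync version is that the callback is a coroutine and update_checkpoint must be awaited.

```python
import asyncio
import os


async def on_event(partition_context, event):
    # Put your processing here; in the async client update_checkpoint is a coroutine.
    print(f"partition {partition_context.partition_id}: {event}")
    await partition_context.update_checkpoint(event)


async def main():
    # Imported here so the module can be loaded without the Azure packages installed.
    from azure.eventhub.aio import EventHubConsumerClient
    from azure.eventhub.extensions.checkpointstoreblobaio import BlobCheckpointStore

    checkpoint_store = BlobCheckpointStore.from_connection_string(
        os.environ["AZURE_STORAGE_CONN_STR"], "your-blob-container-name"  # placeholder
    )
    client = EventHubConsumerClient.from_connection_string(
        os.environ["EVENT_HUB_CONN_STR"],
        consumer_group="$Default",
        eventhub_name=os.environ["EVENT_HUB_NAME"],
        checkpoint_store=checkpoint_store,
    )
    async with client:
        # starting_position only applies to partitions without a stored checkpoint
        await client.receive(on_event=on_event, starting_position="-1")
```

Run it with asyncio.run(main()).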