
I'm having an issue with blob storage checkpointing in Azure Event Hubs. My application runs fine if I don't set checkpoint_store when creating the consumer client, but whenever I set the checkpoint_store variable and run my code it throws the following exception:

EventProcessor instance 'xxxxxxxxxxx' of eventhub <name of my eventhub> consumer group <name of my consumer group>. An error occurred while load-balancing and claiming ownership. The exception is KeyError('ownerid'). Retrying after xxxx seconds

The only GitHub issue I could find that even mentions this kind of error is this one; however, the issue was never resolved and the person who reported it ended up using a different library instead.

The relevant libraries I'm using are azure-eventhub and azure-eventhub-checkpointstoreblob-aio.

Here are relevant snippets of the code I'm using (I used this tutorial as a guide):

```python
import asyncio

from azure.eventhub.aio import EventHubConsumerClient, EventHubProducerClient
from azure.eventhub import EventData
from azure.eventhub.extensions.checkpointstoreblobaio import BlobCheckpointStore

# Connection settings (blob_connection_string, container_name, connection_str,
# consumer_group, input_eventhub_name) are defined elsewhere in the application.

async def on_event(partition_context, event):
    await partition_context.update_checkpoint(event)
    # <do stuff with event data>

checkpoint_store = BlobCheckpointStore.from_connection_string(blob_connection_string, container_name)
client = EventHubConsumerClient.from_connection_string(
    connection_str,
    consumer_group,
    eventhub_name=input_eventhub_name,
    checkpoint_store=checkpoint_store,
)

async def main():
    async with client:
        await client.receive(
            on_event=on_event,
        )
        print("Terminated.")

if __name__ == '__main__':
    loop = asyncio.get_event_loop()
    loop.run_until_complete(main())
```

The issue seems to be solely with blob storage checkpointing; if I comment out 'checkpoint_store=checkpoint_store' when creating the consumer client everything runs with no issues.

The connection to the blob storage appears to be fine: after some digging I found that two folders, 'checkpoint' and 'ownership', were created in the container (screenshot: blob storage snapshot), and the files in the latter have an 'ownerid' key in their metadata (screenshot: owner files metadata).

In other words, the key definitely exists. What I think is happening is that the EventProcessor is trying to fetch the ownership metadata of these blobs but is somehow failing to do so. If anyone has any idea how to fix this, I would very much appreciate it!

2 Answers


This looks like a problem retrieving "ownerid" from one of the blobs. Could you do me a favor and test these scenarios?

  1. Remove everything from the blob container and retry.
  2. If the problem still exists, could you check whether every blob has the metadata key "ownerid"? (See the sketch after this list for one way to do that.)
  3. If the problem still exists, could you replace line 144 of the file azure.eventhub.extensions.checkpointstoreblobaio._blobstoragecsaio.py in the library azure-eventhub-checkpointstoreblob-aio version 1.1.0 with the following and retry?
"owner_id": blob.metadata.get("ownerid"),
– Xie Yijun
  • Line 144 in the checkpointstoreblobaio source code was indeed causing the issue; replacing it with your edit solved it. Thanks a lot! – Ramon Samuel Aug 12 '20 at 04:28
  • Thanks Ramon for the test. I don't know if this change has any side effects; let me know if you observe any problems. I created a [github issue](https://github.com/Azure/azure-sdk-for-python/issues/13060) in the repo to track this problem. – Xie Yijun Aug 12 '20 at 05:55
  • I can't reproduce it in my environment. Could you tell me your Python version and OS version? – Xie Yijun Aug 12 '20 at 06:50
  • In addition to editing line 144, I also had to replace line 244 with `"offset": blob.metadata.get("offset"),` and line 255 with `"sequence_number": blob.metadata.get("sequencenumber"),` to fix it completely. I'm currently developing and testing in an Azure Databricks environment (Databricks Runtime 7.0 ML). – Ramon Samuel Aug 12 '20 at 09:15
  • I also had this same issue on my local machine running Python 3.8.5 and Windows 10 build 18362.959. – Ramon Samuel Aug 12 '20 at 09:22
  • This suppresses the errors but may not be the solution. The load balancer needs "ownerid" to tell which process owns which partition, and it uses the partition "offset" as the checkpoint data to resume receiving after the process restarts. The problem is why this important metadata is NOT correctly retrieved when it is present in the blob metadata. I tested with Python 3.8.5 on my Windows 10 machine but didn't reproduce the same problem. I'll continue to look for the root cause. Is it possible for you to try it in a Python 3.7 64-bit environment on your machine? – Xie Yijun Aug 12 '20 at 20:52
  • I might have found what's causing the issue. I tested in a fresh venv with Python 3.7.7 64-bit installed, created two new storage accounts from scratch (Azure general-purpose storage v1 and v2), and made new containers in each. The issue only occurred (and kept occurring) when I used the Azure Storage v2 account for checkpointing; everything ran fine when I instead connected to the Azure Storage v1 account. This might be worth looking into. – Ramon Samuel Aug 13 '20 at 10:25
  • I appreciate this finding, Ramon. This helped me a lot. Will look into it and update you. – Xie Yijun Aug 13 '20 at 21:33
  • @RamonSamuel I tested in both Azure general-purpose storage v1 and v2 and didn't see any problems. Did your v2 storage account use any features other than the default settings from the Azure portal? And what is the region of the storage resource? I tested with west-us-2; I can create a resource in your region to test. – Xie Yijun Aug 25 '20 at 00:31
  • I used default settings on both accounts; the region is Australia East. – Ramon Samuel Aug 26 '20 at 08:58

The root cause is that the list_blobs functionality of the storage SDK, when called against a v2 storage account with Data Lake enabled (hierarchical namespace), returns not only the per-partition checkpoint/ownership blobs but also the parent directory node, which contains no metadata.

To illustrate this better, let's say we have the following blob structures:

- fullqualifiednamespace (directory)
  - eventhubname (directory)
    - $default (directory)
      - ownership (directory)
        - 0 (blob)
        - 1 (blob)
        ...

In v2 storage with Data Lake enabled (hierarchical namespace), when the code used the prefix `<fully_qualified_namespace>/<eventhub_name>/<consumer_group>/ownership` to search for blobs, the `<fully_qualified_namespace>/<eventhub_name>/<consumer_group>/ownership` directory itself was also returned. That directory entry contains no metadata, which leads to the KeyError when we try to extract information from it.
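
A minimal sketch of that behavior, assuming azure-storage-blob is installed and using placeholder values (fully_qualified_namespace, eventhub_name, consumer_group, blob_connection_string, container_name) in place of your own settings:

```python
from azure.storage.blob import ContainerClient

# Placeholder values; substitute your own namespace, event hub, consumer group and container.
prefix = f"{fully_qualified_namespace}/{eventhub_name}/{consumer_group}/ownership"

container = ContainerClient.from_connection_string(blob_connection_string, container_name)

for blob in container.list_blobs(name_starts_with=prefix, include=["metadata"]):
    if not blob.metadata or "ownerid" not in blob.metadata:
        # On a Data Lake (hierarchical namespace) account this entry is the
        # "ownership" directory itself; it carries no checkpoint metadata.
        print("directory placeholder:", blob.name)
    else:
        print("partition ownership blob:", blob.name, blob.metadata["ownerid"])
```

With hierarchical namespace disabled, the listing returns only the per-partition blobs, so the KeyError does not occur.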

There is a bug-fix release of the checkpointstoreblob SDK; please upgrade to the latest version to see if it resolves your problem.
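
One quick way to confirm which version is installed after upgrading (a small check, assuming Python 3.8+ for importlib.metadata):

```python
from importlib.metadata import version

# The async checkpoint store package from the links below; expect 1.1.2 or newer.
print(version("azure-eventhub-checkpointstoreblob-aio"))
```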

Let me know if you have more questions.

Links:

for sync: https://pypi.org/project/azure-eventhub-checkpointstoreblob/1.1.2/

for async: https://pypi.org/project/azure-eventhub-checkpointstoreblob-aio/1.1.2/

GitHub issue: https://github.com/Azure/azure-sdk-for-python/issues/13060

– Adam Ling