
I am trying to implement the pull model to query the change feed using the Azure Cosmos Python SDK. I found that, to parallelise the querying process, the official documentation talks about obtaining FeedRange values and creating a FeedIterator to iterate over each range of partition key values returned by the FeedRange.
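For reference, this is roughly how I picture that pattern translating into Python. I don't actually know whether the Python SDK exposes something like read_feed_ranges() or a feed_range keyword (those names are my guess from the docs), so please treat every name in this sketch as an assumption:

# guesswork based on the FeedRange / FeedIterator description in the docs;
# read_feed_ranges() and the feed_range keyword may not exist in the Python SDK
def read_by_feed_range(container_client):
    for feed_range in container_client.read_feed_ranges():
        response = container_client.query_items_change_feed(
            feed_range=feed_range,
            is_start_from_beginning=True
        )
        for item in response:
            print(item["id"])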

Currently my code snippet to query the change feed looks like this, and it is pretty straightforward:

# function to get items from the change feed, either from the beginning
# or from a previously saved continuation token (checkpoint)
def get_response(container_client, read_from_beginning, last_continuation_token=None):

    # historical data read: everything from the beginning of the change feed
    if read_from_beginning:
        response = container_client.query_items_change_feed(
            is_start_from_beginning=True,
            # partition_key_range_id=0
        )

    # incremental read: resume from the last checkpoint
    else:
        response = container_client.query_items_change_feed(
            is_start_from_beginning=False,
            continuation=last_continuation_token
        )
    return response
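For completeness, this is roughly how I call it and where my checkpoint comes from. I read the continuation token out of last_response_headers['etag'] because that is what I saw in a sample, so treat that part as an assumption rather than something I know is the supported way:

# sketch of the caller: drain the change feed once and return the next checkpoint
def process_changes(container_client, last_continuation_token=None):
    response = get_response(
        container_client,
        read_from_beginning=(last_continuation_token is None),
        last_continuation_token=last_continuation_token
    )
    for item in response:
        print(item["id"])  # placeholder for my real processing

    # the continuation token for the next read appears to be surfaced
    # through the response headers ("etag") once the iterator is consumed
    return container_client.client_connection.last_response_headers.get("etag")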

The problem with this approach is efficiency when reading all items from the beginning (historical data read). I tried it with a fairly small dataset of 500 items and the response took around 60 seconds. With millions or even billions of items, the response might take far too long to return.

  • Would querying the change feed in parallel, one reader per partition key range, save time? (A rough sketch of what I had in mind is below this list.)
  • If yes, how do I get the PartitionKeyRangeId in the Python SDK?
  • Are there any problems I need to consider when implementing this?
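This is the kind of thing I was picturing. I found ReadPartitionKeyRanges on container.client_connection, but it looks like an internal method rather than a public API, and I'm also not sure whether sharing one container client across threads is safe, so this is only a sketch under those assumptions:

from concurrent.futures import ThreadPoolExecutor

# sketch only: ReadPartitionKeyRanges and container_link look internal,
# so I'm not sure this is the intended way to enumerate partition key ranges
def read_range(container_client, range_id):
    response = container_client.query_items_change_feed(
        is_start_from_beginning=True,
        partition_key_range_id=range_id
    )
    return list(response)

def read_all_ranges_in_parallel(container_client):
    pk_ranges = list(
        container_client.client_connection.ReadPartitionKeyRanges(
            container_client.container_link
        )
    )
    range_ids = [r["id"] for r in pk_ranges]

    # one thread per partition key range; I'm unsure whether reusing a single
    # container client across threads like this is actually safe
    with ThreadPoolExecutor(max_workers=len(range_ids) or 1) as pool:
        results = list(pool.map(lambda rid: read_range(container_client, rid), range_ids))

    # flatten the per-range results into one list of items
    return [item for chunk in results for item in chunk]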

I hope this makes sense!

  • Please edit to show what you've tried so far. As written, it's off-topic (you're asking for sample code). Also, please be mindful of tags: the `cosmos` tag is unrelated to Cosmos DB and specifically mentions this in the tag description. I removed it accordingly and added the correct tag. – David Makogon May 22 '22 at 03:21
  • Please provide enough code so others can better understand or reproduce the problem. – Community May 22 '22 at 04:21

0 Answers