I am trying to implement Pull Model to query change feed using Azure Cosmos Python SDK. I found that to parallelise the querying process, the official documentation mentions about FeedRange value and create FeedIterator to iterate through each range of partition key values obtained from the FeedRange.
Currently my code snippet to query change feed looks like this and it is pretty straight-forward:
# function to get items from change feed based on a condition
def get_response(container_client, condition):
# Historical data read
if condition:
response = container.query_items_change_feed(
is_start_from_beginning = True,
# partition_key_range_id = 0
)
# reading from a checkpoint
else:
response = container.query_items_change_feed(
is_start_from_beginning = False,
continuation = last_continuation_token
)
return response
The problem with this approach is the efficiency when getting all the items from beginning (Historical Data Read). I tried this method with pretty small dataset of 500 items and the response took around 60 seconds. When dealing with millions or even billions of items the response might take too long to return.
- Would querying change feed parallelly for each partition key range save time?
- If yes, how to get PartitionKeyRangeId in Python SDK?
- Is there any problems I need to consider when implementing this?
I hope I make sense!