I am using Prefect and defining a flow that inserts documents into Cosmos DB.
The problem is that query_items() returns a lazy iterable, and for a large container there is no way to hold all of the entries in memory at once.
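For context, the real flow queries a container roughly like this (the connection details, container name, and query below are placeholders, not my actual code):

```python
from azure.cosmos import CosmosClient

# Placeholder connection details, just to show the shape of the call.
COSMOS_URL = "https://<account>.documents.azure.com:443/"
COSMOS_KEY = "<key>"

client = CosmosClient(COSMOS_URL, credential=COSMOS_KEY)
container = client.get_database_client("my-db").get_container_client("my-container")

# query_items() returns a lazy iterable (ItemPaged), not a list,
# so results are fetched page by page as you iterate over it.
documents = container.query_items(
    query="SELECT * FROM c",
    enable_cross_partition_query=True,
)
```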
I believe my problem can be reduced to this:
given an iterator, how can I create batches to be processed (mapped) in a Prefect flow?
Example:
```python
from prefect import task, Flow

def big_iterable_function_i_cannot_change():
    yield from range(1000000)  # some large amount of work

@task
def some_prefect_batching_magic(x):
    # magic code here
    pass

with Flow("needs-to-be-batched") as flow:
    some_prefect_batching_magic.map(big_iterable_function_i_cannot_change())

flow.run()
```
The above code, or something like it, gives me the following error:

```
prefect.FlowRunner | Flow run FAILED: some reference tasks failed.
```
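For what it's worth, batching the iterator in plain Python is straightforward, something like the sketch below (chunked and chunk_size are names I made up for illustration). What I can't figure out is how to feed these chunks to .map() lazily, without materializing the whole iterator first.

```python
from itertools import islice

def chunked(iterable, chunk_size=1000):
    """Yield lists of up to chunk_size items drawn from any iterator."""
    iterator = iter(iterable)
    while True:
        chunk = list(islice(iterator, chunk_size))
        if not chunk:
            return
        yield chunk

# e.g. chunked(big_iterable_function_i_cannot_change(), 10000)
# yields 100 lists of 10000 integers each, one at a time.
```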