Imagine the following pattern:
execute(): Called once. Reactor code scans the entire directory, parses the files, and puts the results into a blocking queue.
fetch(): Called multiple times from an external process. Each call takes the next 1000 rows from the queue above.
In standard Java I would implement that with one thread writing into an ArrayBlockingQueue and another thread reading from it. But how can that be done efficiently and safely using Reactor as the producer of the data?
Requirements:
- The reader will be slower than the producer, and I don't want the queue to grow without bound. Hence a bounded blocking queue.
- It must be possible to stop the process at any time, e.g. the producer found 100000 rows, but fetch() read just 100 rows and then decided it had all the data it needed. Calling dispose() should stop all producer threads immediately, even if the Flux is waiting.
- fetch() must have a way to know that all data has been read.
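For reference, the plain-Java version mentioned above, extended to cover these three requirements, could look roughly like the sketch below. QueueBackedReader, the END sentinel and the Iterator-based source are hypothetical names made up for illustration; the Iterator merely stands in for the directory scan. The sketch also assumes that dispose() is called from the same thread that calls fetch().

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical helper, not part of Reactor or the Azure SDK.
final class QueueBackedReader<T> {
    // Sentinel marking the end of the data; only ever compared by identity.
    private static final Object END = new Object();

    private final BlockingQueue<Object> queue;
    private final Thread producer;
    private volatile boolean finished;

    QueueBackedReader(Iterator<T> source, int capacity) {
        this.queue = new ArrayBlockingQueue<>(capacity);
        this.producer = new Thread(() -> {
            try {
                while (source.hasNext()) {
                    queue.put(source.next()); // blocks while the queue is full
                }
                queue.put(END);               // tell the reader that nothing follows
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // dispose() asked us to stop
            }
        }, "row-producer");
        this.producer.setDaemon(true);
    }

    // Called once: starts the producer thread.
    void execute() {
        producer.start();
    }

    // Blocks until maxRows rows are available or the data ends, whichever comes first.
    @SuppressWarnings("unchecked")
    List<T> fetch(int maxRows) {
        List<T> batch = new ArrayList<>();
        try {
            while (!finished && batch.size() < maxRows) {
                Object next = queue.take();
                if (next == END) {
                    finished = true;
                } else {
                    batch.add((T) next);
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return batch;
    }

    // True once the sentinel has been read or dispose() was called.
    boolean isFinished() {
        return finished;
    }

    // Stops the producer even if it is blocked in put(), and drops buffered rows.
    void dispose() {
        finished = true;
        producer.interrupt();
        queue.clear();
    }
}
```

A caller would construct it with a small capacity, call execute() once, then loop on fetch(1000) until isFinished() returns true, or call dispose() early. What I am missing is the equivalent of this when the producer is a Flux rather than a thread I control.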
I understand the producer/subscriber pattern, but the subscriber would be a constantly running thread, not something I can call to get the next record when I am ready for it.
So it is a sort of push/pull: the producer pushes new data, and there should be a method with which I can pull the next row off the queue in my own time.
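To make the push/pull idea concrete: in Reactive Streams a subscriber only receives as many items as it has requested, so the pull side can drive the demand. The sketch below uses the JDK's java.util.concurrent.Flow interfaces instead of Reactor so that it runs without extra dependencies. PullSubscriber and its prefetch parameter are hypothetical names. Reactor's BaseSubscriber exposes request(n) and cancel() as well, so a Reactor variant would presumably have the same shape, but that is an assumption and not tested here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.Flow;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical bridge: the publisher pushes, fetch() pulls, demand limits the buffer.
final class PullSubscriber<T> implements Flow.Subscriber<T> {
    private static final Object END = new Object(); // end-of-data sentinel

    private final int prefetch;
    private final BlockingQueue<Object> buffer = new LinkedBlockingQueue<>();
    private volatile Flow.Subscription subscription;
    private volatile Throwable failure;
    private volatile boolean done;

    PullSubscriber(int prefetch) {
        this.prefetch = prefetch;
    }

    @Override
    public void onSubscribe(Flow.Subscription s) {
        subscription = s;
        s.request(prefetch); // at most `prefetch` rows are ever requested ahead
    }

    @Override
    public void onNext(T item) {
        buffer.add(item);
    }

    @Override
    public void onError(Throwable t) {
        failure = t;      // surfaced by fetch() when the sentinel is reached
        buffer.add(END);
    }

    @Override
    public void onComplete() {
        buffer.add(END);
    }

    // Pull side: blocks until maxRows rows are available or the stream ends.
    @SuppressWarnings("unchecked")
    List<T> fetch(int maxRows) {
        List<T> batch = new ArrayList<>();
        try {
            while (!done && batch.size() < maxRows) {
                Object next = buffer.take();
                if (next == END) {
                    done = true;
                    if (failure != null) {
                        throw new IllegalStateException("producer failed", failure);
                    }
                } else {
                    batch.add((T) next);
                    subscription.request(1); // ask for one row to replace the one taken
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return batch;
    }

    boolean isFinished() {
        return done;
    }

    // Cancels the upstream subscription so the publisher stops emitting to us.
    void dispose() {
        done = true;
        Flow.Subscription s = subscription;
        if (s != null) {
            s.cancel();
        }
        buffer.clear();
    }
}
```

With this shape no reader thread runs permanently: the publisher only emits while there is outstanding demand, and each row taken by fetch() frees one slot.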
Any thoughts?
Producer (simplified):
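import com.azure.storage.file.datalake.DataLakeFileSystemAsyncClient;
import com.azure.storage.file.datalake.models.ListPathsOptions;
import com.azure.storage.file.datalake.models.PathItem;
import reactor.core.publisher.Flux;
// asyncclient is the service-level async client created elsewhere; name is the file system to scan
DataLakeFileSystemAsyncClient asyncfsclient = asyncclient.getFileSystemAsyncClient(name);
ListPathsOptions options = new ListPathsOptions();
options.setPath("/");          // start at the root
options.setRecursive(true);    // descend into all subdirectories
Flux<PathItem> items = asyncfsclient.listPaths(options).take(10000);  // cap at 10000 paths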