I was reading the parallel flows documentation here and it mentioned:
By default, the parallelism level is set to the number of available CPUs (Runtime.getRuntime().availableProcessors()) and the prefetch amount from the sequential source is set to Flowable.bufferSize() (128). Both can be specified via overloads of parallel().
I still don't understand the purpose of this prefetch, or why it is so large. I assume it means the operators below it will hold on to more than one emission at a time (128 by default). I can't see how that is a good idea, though: won't the downstream operators effectively be single-threaded until upstream has produced more than 128 emissions? For example, with 130 emissions, the first 128 would be prefetched by one thread, the last 2 would go to a second thread, and all the other threads would do nothing.
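To make my concern concrete, here is a plain-Java sketch of the behaviour I am imagining. It does not use RxJava at all. `PrefetchModel` and `distribute` are names I made up for illustration, and the batching rule (each thread grabs up to `prefetch` items before the next thread gets any) is purely my assumption, not something I have confirmed about the library.

```java
import java.util.Arrays;

class PrefetchModel {

    // Hypothetical model of my assumption, NOT RxJava's actual implementation:
    // each rail (thread) in turn takes up to `prefetch` items from the source
    // before the next rail receives anything.
    static int[] distribute(int items, int rails, int prefetch) {
        if (rails <= 0 || prefetch <= 0) {
            throw new IllegalArgumentException("rails and prefetch must be positive");
        }
        int[] counts = new int[rails];
        int remaining = items;
        int rail = 0;
        while (remaining > 0) {
            int batch = Math.min(prefetch, remaining);
            counts[rail] += batch;       // this rail buffers the whole batch
            remaining -= batch;
            rail = (rail + 1) % rails;   // move on to the next rail
        }
        return counts;
    }

    public static void main(String[] args) {
        // 130 items, 4 rails, default prefetch of 128
        System.out.println(Arrays.toString(distribute(130, 4, 128))); // [128, 2, 0, 0]
        // same input with a prefetch of 1
        System.out.println(Arrays.toString(distribute(130, 4, 1)));   // [33, 33, 32, 32]
    }
}
```

Under this assumed model, a prefetch of 128 leaves two of four threads idle for 130 items, while a prefetch of 1 spreads the items almost evenly. Whether the real operator behaves like this is exactly what I am asking.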
I guess small objects in fast flowables should have a larger prefetch, since the overhead of passing each item along the Rx chain is relatively high compared with the work done per item. I am not sure which numbers to pick here, though.