I'm trying to understand which operations are serialized and which are not in RxPY, so I printed the thread name and the elapsed time in seconds inside the map and subscribe calls in the example below.
I expected the map calls to run at [1, 2, 3, 4, 5] seconds, but they run at [1, 3, 5, 7, 9] seconds instead. (The additional 2 s delay before each subscribe call is expected, because of the time.sleep(2) in map.) Why is map delayed? It looks like map on the 2nd element won't start until the subscribe call for the 1st element has finished, even though the 1st and 2nd elements each have their own thread for both map and subscribe.
import reactivex as rx
import concurrent.futures
import time
from reactivex import operators as ops
from threading import current_thread

with concurrent.futures.ThreadPoolExecutor(5) as executor:
    start = time.time()
    rx.range(1, 6).pipe(
        # Each element sleeps for x seconds on a pool thread, then emits x.
        ops.flat_map(lambda s: rx.from_future(executor.submit(lambda x: time.sleep(x) or x, s))),
        # Print when map starts, then simulate 2 s of work.
        ops.map(lambda x: print('map', current_thread().name, time.time() - start, x) or time.sleep(2) or x)
    ).subscribe(lambda x: print('sub', current_thread().name, time.time() - start, x))
This gives the output:
map ThreadPoolExecutor-21_0 1.0019810199737549 1
sub ThreadPoolExecutor-21_0 3.0042216777801514 1
map ThreadPoolExecutor-21_1 3.004584789276123 2
sub ThreadPoolExecutor-21_1 5.006811141967773 2
map ThreadPoolExecutor-21_2 5.007160663604736 3
sub ThreadPoolExecutor-21_2 7.008445978164673 3
map ThreadPoolExecutor-21_4 7.008780241012573 5
sub ThreadPoolExecutor-21_4 9.01101279258728 5
map ThreadPoolExecutor-21_3 9.01136064529419 4
sub ThreadPoolExecutor-21_3 11.013587951660156 4
How can I process these 5 elements on 5 threads without the additional wait, while keeping thread affinity (each element is processed by the same thread across all operators in the pipe)? I'm looking for behavior similar to what ParallelStream/pseq from the pyfunctional package gives below, but ideally using a thread pool instead of processes.
import time
from functional import pseq
from multiprocessing import current_process

start = time.time()
pseq(range(1, 6)).map(lambda x: time.sleep(x) or x)\
    .map(lambda x: print('map', current_process().name, time.time() - start, x) or time.sleep(2) or x)\
    .map(lambda x: print('sub', current_process().name, time.time() - start, x) or x)
with output:
map ForkPoolWorker-80 1.0282073020935059 1
map ForkPoolWorker-81 2.031876802444458 2
sub ForkPoolWorker-80 3.0305001735687256 1
map ForkPoolWorker-82 3.035196304321289 3
sub ForkPoolWorker-81 4.034159898757935 2
map ForkPoolWorker-83 4.038431644439697 4
sub ForkPoolWorker-82 5.03748893737793 3
map ForkPoolWorker-84 5.038834571838379 5
sub ForkPoolWorker-83 6.040730953216553 4
sub ForkPoolWorker-84 7.0410990715026855 5
[1, 2, 3, 4, 5]
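
To make the target behavior concrete, here is a minimal sketch of it using only concurrent.futures, without RxPY. It is not an RxPY solution, just an illustration of the timing and thread affinity I'm after: every stage for one element runs inside a single task, so it stays on one pool thread. The names run_pipeline, process_all and unit are made up for this illustration (unit only scales the sleeps so the sketch can be run quickly).

```python
import time
from concurrent.futures import ThreadPoolExecutor
from threading import current_thread


def run_pipeline(x, start, unit=1.0):
    """Run all stages for one element inside a single task (one worker thread)."""
    time.sleep(x * unit)  # stage 1: simulated work, x time units
    map_thread = current_thread().name
    print('map', map_thread, time.time() - start, x)
    time.sleep(2 * unit)  # stage 2: the slow part of the map step
    sub_thread = current_thread().name
    print('sub', sub_thread, time.time() - start, x)
    return x, map_thread, sub_thread


def process_all(items, workers=5, unit=1.0):
    """Submit one task per element; results come back in input order."""
    start = time.time()
    with ThreadPoolExecutor(workers) as executor:
        return list(executor.map(lambda x: run_pipeline(x, start, unit), items))
```

Calling process_all(range(1, 6)) with 5 workers should print the map lines at roughly 1 to 5 seconds and the sub lines at roughly 3 to 7 seconds, with each element reporting the same thread name for both stages. The open question is how to get the same behavior from an RxPY pipeline.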