
I use RxPY for processing files and I want to build a sequence of loading pipes:

    pool_scheduler = ThreadPoolScheduler(multiprocessing.cpu_count())

    rx.from_list(independing_files).pipe(
        self._build_dataflow(),
        ops.subscribe_on(pool_scheduler),
    ).subscribe(
        on_next=lambda file: logger.info(f'file: {file}'),
        on_error=print,
        on_completed=lambda: logger.info("independing frames loaded!"))

    withdraw_file = []
    for file in filtered_files:
        if self._table_name_on_contain(file) == 'mellow':
            withdraw_file += [file]

    rx.from_list(withdraw_file).pipe(
        self._build_apples_dataflow(),
        ops.subscribe_on(pool_scheduler)
    ).subscribe(
        on_next=lambda file: logger.info(f'file: {file}'),
        on_error=print,
        on_completed=lambda: logger.info("apples loaded!"))

    rx.from_list(depending_files).pipe(
        self._build_dataflow(),
        ops.subscribe_on(pool_scheduler)
    ).subscribe(
        on_next=lambda file: logger.info(f'file: {file}'),
        on_error=print,
        on_completed=lambda: self._complete_action())

But I got a result I did not expect: each pipe seems to run asynchronously, because I have not marked any "stop points". I want the second and third pipes to start only after the first pipe is done. How can I fix this?


2 Answers


You can use multiprocessing.Event for synchronizing your pipes:

event = multiprocessing.Event()

rx.from_list(...).pipe(...).subscribe(on_completed=event.set)

event.wait()  # blocks until the first pipe has completed

# only then start the remaining pipes
rx.from_list(...).pipe(...)
rx.from_list(...).pipe(...)
duthils
  • The solution works; the threads are synchronized. One doubt though: the processor cores are loaded as if there were no concurrency at all, like a single-threaded execution. :( – Vodyanikov Andrew Feb 10 '21 at 09:41
  • Due to the Python GIL, threads should not be used for concurrency in CPU-bound computation. See the [RxPy documentation about this](https://rxpy.readthedocs.io/en/latest/get_started.html#cpu-concurrency). – duthils Feb 12 '21 at 01:10

As said above, I used an Event to force the threads to come to a barrier point. I ended up with the following, and it works fine.

import logging
import threading

import rx
from EventMonitoringETL.tools import logging_config
from rx import operators as ops
import multiprocessing
import rx.scheduler as scheduler

if __name__ == '__main__':
    logging_config.InitLogging()
    logger = logging.getLogger('c4t_etl')

    thread_count = multiprocessing.cpu_count()
    thread_pool_scheduler = scheduler.ThreadPoolScheduler(thread_count)

    event = multiprocessing.Event()

    rx.of(1,2,3,4,5,6,7,8,9,10).pipe(
      ops.subscribe_on(thread_pool_scheduler)
      ).subscribe(lambda i: print(f'{i} - {threading.get_ident()}'), on_completed=event.set)

    event.wait()   # block until the pipe above completes
    event.clear()  # reset the event for reuse by the next pipe
    print("AAAAA")

    rx.of(11,12,13,14,15,16,17,18,19,110).pipe(
      ops.subscribe_on(thread_pool_scheduler)
      ).subscribe(lambda i: print(f'{i} - {threading.get_ident()}'), on_completed=event.set)

    event.wait()
    event.clear()
    print("BBBBB")

    rx.of(21,22,23,24,25,26,27,28,29,210).pipe(
      ops.subscribe_on(thread_pool_scheduler)
      ).subscribe(lambda i: print(f'{i} - {threading.get_ident()}'), on_completed=event.set)

    event.wait()
    event.clear()
    print("CCCCC")

    rx.of(31,32,33,34,35,36,37,38,39,310).pipe(
      ops.subscribe_on(thread_pool_scheduler)
      ).subscribe(lambda i: print(f'{i} - {threading.get_ident()}'), on_completed=event.set)

    event.wait()
    event.clear()
    print("DDDDDDD")