
I suspect the answer is "no, this is not possible", but I want to check with people who have actual familiarity with the library.

Background

I have an application where there are many parallel processes doing similar computations. The computations are composed of several distinct steps which have a deterministic output given an input. For a given step, it is likely that a significant number of processes have identical inputs, so I can realize significant speedup by computing the step once per input and sharing the result among all processes which include that step with that input. I'd like to use an RPC service, such as RPyC, to achieve this.

This is necessary because the codebase is over a decade old and prohibitively difficult to properly refactor. This rules out restructuring the code to more intelligently compute things once, or re-writing in a language that has proper multithreading support (looking at you, GIL).
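In essence, what I'm after is cross-process memoization: compute each (step, input) pair once and let every process reuse the result. Within a single process the idea is straightforward (a sketch; the names here are my own, not from the real codebase):

```python
import threading


class Memo:
    """Compute each key's result once; later callers reuse the stored value."""

    def __init__(self):
        self._lock = threading.Lock()
        self._results = {}
        self.computations = 0  # for illustration: counts actual computations

    def get(self, key, compute):
        # Hold the lock for the whole check-and-compute so two threads
        # asking for the same key can't both run the computation.
        with self._lock:
            if key not in self._results:
                self.computations += 1
                self._results[key] = compute()
            return self._results[key]


memo = Memo()
a = memo.get(('square', 7), lambda: 7 * 7)
b = memo.get(('square', 7), lambda: 7 * 7)  # second call reuses the cached value
print(a, b, memo.computations)  # 49 49 1
```

The hard part is that my callers are separate processes, which is why I reached for an RPC server to host this cache.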

What I've Tried

import rpyc
import threading
import time
import typing as ty
from typing_extensions import Self


class SessionCache:
    def __init__(self):
        self.lock = threading.Lock()
        self.results = {}

    def access(self, key: ty.Tuple[str, ty.Any], func: ty.Callable[[], ty.Any]) -> ty.Any:
        '''Execute func() only if no result is cached for key; store and return the result.'''
        with self.lock:
            if key not in self.results:
                print(f'Cache miss: {key}')
                self.results[key] = func()
            else:
                print(f'Cache hit: {key}')
            return self.results[key]


class ComputeService(rpyc.Service):
    def __init__(self):
        print('ComputeService.__init__')
        super().__init__()
        self.lock = threading.Lock()
        self.sessions = {}

    def cached(method: ty.Callable[[Self, ty.Any], ty.Any]):
        '''Decorator to insert a caching layer on the function'''
        def wrapper(self, session_id: str, args):
            # Get (or create) the session
            with self.lock:
                session = self.sessions.get(session_id, None)
                if not session:
                    session = SessionCache()
                    self.sessions[session_id] = session

            # Access the cache, providing a callable to generate the result if the cache misses
            return session.access((method.__name__, args), lambda: method(self, args))

        # Return the wrapper as though it is the original method
        wrapper.__name__ = method.__name__
        return wrapper

    @cached
    def exposed_get_primes(self, num: int) -> ty.List[int]:
        time.sleep(5)
        primes = [2]
        current_number = 1
        while len(primes) < num:
            current_number += 2
            if not any((current_number % prime == 0 for prime in primes)):
                primes.append(current_number)
        return primes


if __name__ == '__main__':
    from rpyc.utils.server import ThreadedServer
    server = ThreadedServer(ComputeService, port=18861)
    server.start()

The above code works as expected with a single client: the first call misses the cache and the second hits. But when Client B (in a different process from Client A) connects to the same server and calls exposed_get_primes with identical arguments, its first call is also a cache miss. I expected it to hit, since the server had already computed that result for Client A.

# Client A
conn = rpyc.connect("localhost", 18861)
primes = conn.root.exposed_get_primes(session_id='test', args=100)
primes = conn.root.exposed_get_primes(session_id='test', args=100)

# Client B (in a different process from Client A)
conn = rpyc.connect("localhost", 18861)
primes = conn.root.exposed_get_primes(session_id='test', args=100)
primes = conn.root.exposed_get_primes(session_id='test', args=100)

# Server (in a different process from Clients A and B)
> ComputeService.__init__
> Cache miss: ('exposed_get_primes', 100)
> Cache hit: ('exposed_get_primes', 100)
> ComputeService.__init__
> Cache miss: ('exposed_get_primes', 100)
> Cache hit: ('exposed_get_primes', 100)

The server log shows that ComputeService is initialized once per connection, but I want it initialized exactly once for the lifetime of the server.

The Question

It looks like RPyC creates a fresh service instance (and therefore fresh state, including my sessions dict) for each client connection, rather than sharing one instance across all clients. That is not what I want. Is there a way to configure RPyC so that Client A and Client B have access to the same internal server state?
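One thing I noticed but have not verified: the RPyC docs suggest that ThreadedServer accepts either a service class or a service instance, and my reading is that passing an instance would make all connections share it. Would swapping the __main__ block for something like this (untested sketch, ComputeService as defined above) give me the shared state I'm after?

```python
import rpyc
from rpyc.utils.server import ThreadedServer

if __name__ == '__main__':
    # Passing an *instance* rather than the class -- my understanding is that
    # RPyC then serves this one object to every connection instead of
    # instantiating the service per connection.
    service = ComputeService()
    server = ThreadedServer(service, port=18861)
    server.start()
```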

If not, what recommendations do you have for similar frameworks that have this functionality?

bcdan