I suspect the answer is "no, this is not possible", but I want to check with people who have actual familiarity with the library.
Background
I have an application where there are many parallel processes doing similar computations. The computations are composed of several distinct steps which have a deterministic output given an input. For a given step, it is likely that a significant number of processes have identical inputs, so I can realize significant speedup by computing the step once per input and sharing the result among all processes which include that step with that input. I'd like to use an RPC service, such as RPyC, to achieve this.
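In single-process terms, the speedup I'm after is ordinary memoization: compute each (step, input) pair once and reuse the result everywhere. A minimal sketch of that idea (the `StepCache` class, `expensive_step`, and the call counter are just illustrations, not my real code):

```python
import threading

class StepCache:
    # Memoize deterministic steps keyed by (step name, input).
    def __init__(self):
        self._lock = threading.Lock()
        self._results = {}

    def compute(self, step, arg):
        key = (step.__name__, arg)
        with self._lock:
            if key not in self._results:
                self._results[key] = step(arg)  # runs at most once per key
            return self._results[key]

calls = 0

def expensive_step(x):
    # Stand-in for one deterministic computation step
    global calls
    calls += 1
    return x * x

cache = StepCache()
print(cache.compute(expensive_step, 12))  # 144
print(cache.compute(expensive_step, 12))  # 144, served from the cache
print(calls)                              # 1 -- the step ran only once
```

The hard part is not the memoization itself but making one such cache visible to many independent processes, which is why I reached for an RPC service.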
This is necessary because the codebase is over a decade old and prohibitively difficult to refactor properly. That rules out restructuring the code to avoid the redundant computation, or rewriting it in a language with proper multithreading support (looking at you, GIL).
What I've Tried
import rpyc
import threading
import time
import typing as ty

from typing_extensions import Self


class SessionCache:
    def __init__(self):
        self.lock = threading.Lock()
        self.results = {}

    def access(self, key: ty.Tuple[str, ty.Any], func: ty.Callable[[], ty.Any]) -> ty.Any:
        # Execute func() only if the result is not already computed and stored
        with self.lock:
            if key in self.results:
                print(f'Cache hit: {key}')
            else:
                print(f'Cache miss: {key}')
                self.results[key] = func()
            return self.results[key]


class ComputeService(rpyc.Service):
    def __init__(self):
        print('ComputeService.__init__')
        super().__init__()
        self.lock = threading.Lock()
        self.sessions = {}

    def cached(method: ty.Callable[[Self, ty.Any], ty.Any]):
        '''Decorator to insert a caching layer on the method'''
        def wrapper(self, session_id: str, args):
            # Get (or create) the session
            with self.lock:
                session = self.sessions.get(session_id, None)
                if not session:
                    session = SessionCache()
                    self.sessions[session_id] = session
            # Access the cache, providing a callable to generate the result if the cache misses
            return session.access((method.__name__, args), lambda: method(self, args))
        # Return the wrapper as though it is the original method
        wrapper.__name__ = method.__name__
        return wrapper

    @cached
    def exposed_get_primes(self, num: int) -> ty.List[int]:
        time.sleep(5)
        primes = [2]
        current_number = 1
        while len(primes) < num:
            current_number += 2
            if not any(current_number % prime == 0 for prime in primes):
                primes.append(current_number)
        return primes


if __name__ == '__main__':
    from rpyc.utils.server import ThreadedServer
    server = ThreadedServer(ComputeService, port=18861)
    server.start()
The above code works fine when I call the method from a single client. But when Client B (in a different process from Client A) connects to the server and calls exposed_get_primes with the same arguments, its first call is a cache miss. I expected that call to hit the cache, since Client A had already computed the result.
# Client A
conn = rpyc.connect("localhost", 18861)
primes = conn.root.exposed_get_primes(session_id='test', args=100)
primes = conn.root.exposed_get_primes(session_id='test', args=100)
# Client B (in a different process from Client A)
conn = rpyc.connect("localhost", 18861)
primes = conn.root.exposed_get_primes(session_id='test', args=100)
primes = conn.root.exposed_get_primes(session_id='test', args=100)
# Server (in a different process from Clients A and B)
> ComputeService.__init__
> Cache miss: ('exposed_get_primes', 100)
> Cache hit: ('exposed_get_primes', 100)
> ComputeService.__init__
> Cache miss: ('exposed_get_primes', 100)
> Cache hit: ('exposed_get_primes', 100)
ComputeService is clearly being initialized twice (once per client connection), but I want it initialized only once.
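My suspicion is that the server constructs a fresh ComputeService for every connection because it is handed the class rather than an object. A plain-Python analogy of the distinction I mean (this is not RPyC's actual API, just an illustration):

```python
class Service:
    instances = 0

    def __init__(self):
        Service.instances += 1
        self.sessions = {}  # per-instance state, like my session cache

def handle_connection(service_or_class):
    # Mimics a server that instantiates the service when handed a class
    if isinstance(service_or_class, type):
        return service_or_class()
    return service_or_class

# Handing over the class: every "connection" builds its own state
a = handle_connection(Service)
b = handle_connection(Service)
print(a is b, Service.instances)  # False 2

# Handing over one instance: all connections would share state
shared = Service()
c = handle_connection(shared)
d = handle_connection(shared)
print(c is d)  # True
```

What I don't know is whether RPyC supports the second behavior, or whether per-connection instantiation is baked in.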
The Question
It looks like RPyC keeps a separate copy of the service's state for each client connection, but this is not what I want. Is there a way to configure RPyC so that Client A and Client B have access to the same internal server state?
If not, what recommendations do you have for similar frameworks that have this functionality?