
I have a gRPC project in which main.py spawns gRPC servers as subprocesses.

The project also has a settings.py that contains some configuration, for example:

some_config = {"foo": "bar"}

In some files (used by different processes) I have:

import settings
...
# the value of settings.some_config is read here

In the main process I have a listener that updates some_config on demand, for example:

settings.some_config = new_value

I noticed that when I change the value of settings.some_config in the main process, it is not changed in the subprocesses; a subprocess I checked still had the old value.

I want every subprocess to always have the most up-to-date value of settings.some_config.

A solution I thought about: pass a queue or a pipe to each subprocess, and when some_config changes in the main process, send the new data through the queue/pipe to each subprocess.

But how do I then get the subprocess to assign the new value to settings.some_config? Should I run a listener in each subprocess so that when a notification arrives it does:

settings.some_config = new_value

Would this work? The end goal is to have the most up-to-date value of settings.some_config across all modules/processes without restarting the server. I'm also not sure it would work, since each module might keep the value of settings.some_config it imported earlier cached in memory.


UPDATE

I took Charchit's solution and adjusted it to my requirements, so we have:

from multiprocessing.managers import BaseManager, NamespaceProxy
from multiprocessing import Process
import settings
import time

def get_settings():
    return settings

def run(proxy_settings):
    settings = proxy_settings # So the module settings becomes the proxy object

if __name__ == '__main__':

    BaseManager.register('get_settings', get_settings, proxytype=NamespaceProxy)
    manager = BaseManager()
    manager.start()

    settings = manager.get_settings()
    p = Process(target=run, args=(settings, ))
    p.start()

A few questions:

Should an entire module (settings) be the target of a proxy object? Is it standard to do so?

There is a lot of magic here. Is the simple explanation of how it works that the module settings is now a shared proxy object, so that when a subprocess reads settings.some_config it actually reads the value from the manager?

Are there any side effects I should be aware of?

Should I be using locks when I change any value in settings in the main process?

nscode

2 Answers


The easiest way to do this is to share the module with a manager:

from multiprocessing.managers import BaseManager, NamespaceProxy
from multiprocessing import Process
import settings
import time

def get_settings():
    return settings

def run(settings):
    for _ in range(2):
        print("Inside subprocess, the value is", settings.some_config)
        time.sleep(3)

if __name__ == '__main__':

    BaseManager.register('get_settings', get_settings, proxytype=NamespaceProxy)
    manager = BaseManager()
    manager.start()

    settings = manager.get_settings()
    p = Process(target=run, args=(settings, ))
    p.start()

    time.sleep(1)
    settings.some_config = {'changed': 'value'}
    p.join()

Doing so means you don't have to handle informing the subprocesses that the value has changed; they simply know, because they receive the value from the manager process, which handles this automatically.

Output

Inside subprocess, the value is {'foo': 'bar'}
Inside subprocess, the value is {'changed': 'value'}

Some things to keep in mind

Firstly, remember that settings.some_config needs to be set explicitly. This means you can do settings.some_config = {} but you cannot do settings.some_config['foo'] = "bar" (the in-place update only modifies a local copy fetched from the manager). If you want to modify a single key, get the latest config, update it, and explicitly set it back, like below:

temp = settings.some_config
temp['foo'] = 'bar'
settings.some_config = temp

Secondly, to keep the changes to your codebase to a minimum, you are reassigning the settings variable (initially bound to the settings.py module object) to the proxy. In the code above, you are doing this inside the __main__ block, so settings is rebound globally in the main process. Therefore, any change made to settings from the main process is automatically reflected in the other processes accessing the proxy. This is only partially replicated inside the child process running the function run: accessing settings from inside run means accessing the proxy, because that is the name of the parameter. However, if you call some other function from run (say run2) that does not take settings as an argument and tries to access settings, it will access the imported module instead of the proxy. Example:

def run2():
    print("Inside subprocess run2, the value is", settings.some_config)

def run(settings):
    for _ in range(2):
        print("Inside subprocess run, the value is", settings.some_config)
        time.sleep(3)
    run2()

Output

Inside subprocess run, the value is {'foo': 'bar'}
Inside subprocess run, the value is {'changed': 'value'}
Inside subprocess run2, the value is {'foo': 'bar'}

If you do not want this, then you simply need to assign the argument as the value of the global variable settings:

def run2():
    print("Inside subprocess run2, the value is", settings.some_config)

def run(shared_settings):
    global settings
    settings = shared_settings
    for _ in range(2):
        print("Inside subprocess run, the value is", settings.some_config)
        time.sleep(3)
    run2()

Any function (inside the subprocess) now accessing settings would access the proxy.

Output

Inside subprocess run, the value is {'foo': 'bar'}
Inside subprocess run, the value is {'changed': 'value'}
Inside subprocess run2, the value is {'changed': 'value'}

Lastly, if you have many subprocesses running, this might become slow (more connections to the manager means less speed). If that bothers you, then I recommend doing it the way you stated in the question, i.e. passing a queue or a pipe to each subprocess. To make sure the child process updates its value as quickly as possible after you put the new value in the queue, you can spawn a thread inside the subprocess that waits for a value on the queue and, whenever one arrives, assigns it to the process's settings value. Just make sure to run the thread as a daemon, or explicitly agree on an exit condition.
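A minimal sketch of that queue-based variant (the names config_queue and config_listener are illustrative, not part of the original code) could look like this:

from multiprocessing import Process, Queue
import threading
import time
import settings

def config_listener(config_queue):
    # Daemon thread inside the subprocess: wait for an update,
    # then rebind the module-level value so every reader sees it.
    while True:
        settings.some_config = config_queue.get()

def run(config_queue):
    threading.Thread(target=config_listener, args=(config_queue,), daemon=True).start()
    for _ in range(2):
        print("Inside subprocess, the value is", settings.some_config)
        time.sleep(3)

if __name__ == '__main__':
    config_queue = Queue()  # one queue per subprocess
    p = Process(target=run, args=(config_queue,))
    p.start()

    time.sleep(1)
    config_queue.put({'changed': 'value'})  # main process pushes the new config
    p.join()

With the same timing as the example above, the subprocess prints the old value on the first iteration and the updated value on the second.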

Update

Should an entire module (settings) be the target of a proxy object? Is it standard to do so?

If your question is whether it is safe to do so, then yes, it is; just keep in mind the things I have outlined in this answer. At the end of the day, a module is just another object, and sharing it makes sense here.

There is a lot of magic here. Is the simple explanation of how it works that the module settings is now a shared proxy object, so that when a subprocess reads settings.some_config it actually reads the value from the manager?

You need to add a couple of lines in the run function for that to be the case; check the second point in the previous section.

Are there any side effects I should be aware of?

Check previous section.

Should I be using locks when I change any value in settings in the main process?

Not necessary here.

Charchit Agarwal
  • I will upvote this, but I think your approach is more complicated than it needs to be (see my answer). – Booboo Sep 06 '22 at 17:20
  • Thanks, I actually did assign the parameter passed to run (which is the proxy object) to settings, like settings = proxy_object, so wherever settings.some_config was accessed in the subprocesses it always reads from the proxy_object. Is this considered a good approach? Should I consider any side effects of this approach? – nscode Sep 06 '22 at 19:54
  • @nscode there should be no side effects other than those already mentioned. Remember that `settings.x` needs to be set explicitly, that means you can do `settings.update({})` and `settings.x = {}` but not just `settings.x['foo'] = "bar"`. Lastly, if `settings.py` includes functions that need to be accessed, you would need to subclass `NamespaceProxy` and expose those functions in particular (a minimal sketch of this follows after these comments). – Charchit Agarwal Sep 06 '22 at 21:04
  • @Booboo the idea was that this can achieve what the question wants without any changes to existing code. Using a shared dictionary would not only mean that you cannot access a config like you previously would (`settings.some_config` vs `settings['some_config']`), but also the fact that you would need to put all configurations in the dictionary in the first place. This all would make using a shared dictionary more verbose than this alternative. – Charchit Agarwal Sep 06 '22 at 21:10
  • You say you don't want to require changes to existing code yet you are passing to a child process's `run` function variable `settings` whereas existing code might instead be doing `def run(): import settings; print(settings.some_config)`. In this case the child process will not see changes made to `some_config` by the main process. So if you do have to modify existing code to pass it a shareable object, isn't it easier just to pass a shareable configuration, i.e. a managed `dict`, instead of a shareable module of which only its configuration attribute needs to be shareable? – Booboo Sep 07 '22 at 11:18
  • @Booboo Not really. You are equating a couple of lines that would need to be changed (inside `run`; check point 2 in second section of my updated answer) to changing every line that ever tries to access the configurations (again, doing `settings.some_config` would fail if settings is a shared dict). Bottom line is I don't know what code is inside `run`, and I don't know the number of configurations (could be >1) that need to be shared, nor do I know how many times they are accessed. Given these constraints, sharing a module makes more sense to me. – Charchit Agarwal Sep 07 '22 at 13:03
  • @nscode, check the update – Charchit Agarwal Sep 07 '22 at 13:05
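Following up on the comment above about functions in settings.py: a minimal, hypothetical sketch of exposing a function through a NamespaceProxy subclass could look like this (get_config is an assumed example function in settings.py, not something from the original project):

from multiprocessing.managers import BaseManager, NamespaceProxy
import settings

class SettingsProxy(NamespaceProxy):
    # Keep the attribute-access methods NamespaceProxy relies on and
    # additionally expose the module-level function we want to call.
    _exposed_ = ('__getattribute__', '__setattr__', '__delattr__', 'get_config')

    def get_config(self, *args, **kwargs):
        # Forward the call to the real settings module in the manager process.
        return self._callmethod('get_config', args, kwargs)

def get_settings():
    return settings

if __name__ == '__main__':
    BaseManager.register('get_settings', get_settings, proxytype=SettingsProxy)
    manager = BaseManager()
    manager.start()

    shared_settings = manager.get_settings()
    print(shared_settings.some_config)   # attribute access works as before
    print(shared_settings.get_config())  # function call is forwarded to the manager

Attribute reads and writes behave exactly as in the answer above; only the explicitly listed function is forwarded to the manager process.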

Charchit's solution of creating a specialized managed object is more complicated than it needs to be. If the assumption is that the configuration is being stored as a dictionary, then just use a multiprocessing.managers.DictProxy instance returned by the multiprocessing.Manager().dict() method. This also allows you to update individual keys rather than having to do your update by setting a completely new dictionary value:

from multiprocessing import Process, Manager
import time

def get_settings(manager):
    return manager.dict({'foo': 'bar', 'x': 17})

def run(settings):
    for _ in range(2):
        print("Inside subprocess, the value is", settings)
        time.sleep(3)

if __name__ == '__main__':

    manager = Manager()

    settings = get_settings(manager)
    p = Process(target=run, args=(settings, ))
    p.start()

    time.sleep(1)
    settings['foo'] = 'changed bar'
    p.join()

Prints:

Inside subprocess, the value is {'foo': 'bar', 'x': 17}
Inside subprocess, the value is {'foo': 'changed bar', 'x': 17}
Booboo
  • settings.some_config is one example of a dict that I need to update, but I have more. Would it work if, for each one, I define a get_settings_x() that returns manager.dict(settings.x)? Then, when I pass it to run, could I assign settings.x = proxy_dict so that everywhere in the subprocess where settings.x is accessed it takes the *current* value in the manager.dict? – nscode Sep 06 '22 at 19:59
  • Because it worked with Charchit's answer, but there I assigned an entire module to manager.get_settings(), like settings = manager.get_settings() and then in the subprocess I assigned the module settings = proxy_object. I would actually appreciate a lighter solution where I only deal with some variables in settings, but I can't pass them around since settings is imported in multiple places. – nscode Sep 06 '22 at 20:09
  • By `proxy_dict` do you mean the proxy object that is returned by `multiprocessing.Manager().dict()` so that `settings.x` now becomes a reference to the managed dictionary instead of the original Python built-in `dict`? If a separate process has imported `settings` and then sets the value of one of its attributes, e.g. `x`, this change is not going to be reflected in the instance of `settings` that was imported in another process. – Booboo Sep 06 '22 at 20:10
  • Exactly, proxy_dict is the arg passed to run. So this element (settings.x) is not shared across all processes, and changes to it in the main process won't be reflected in a subprocess that imported settings and accesses settings.x, since the value is not shared? Just the instance we pass to run? My goal is to have settings.x updated globally since I can't pass it around. BTW - I edited my main question with some follow-up questions. – nscode Sep 06 '22 at 20:32
  • That's right -- my understanding is that the value will not be shared. That is, if you do `settings.x = my_managed_dict` in one process, then when an import of `settings` is done *in another process*, `settings.x` will not be the managed dictionary; you have to pass the managed dict to the child and rebind the attribute there (see the sketch below). – Booboo Sep 06 '22 at 20:36
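To illustrate the pattern discussed in these comments, here is a minimal sketch (assuming the shared attribute is settings.some_config; substitute settings.x as needed): the main process creates the managed dict, rebinds the module attribute to it, and passes it to the child, which does the same rebinding so that every access to settings.some_config inside the subprocess reads the current shared value:

from multiprocessing import Process, Manager
import time
import settings

def run2():
    # Reads the module attribute; after run() rebinds it to the managed
    # dict, this sees live updates from the main process.
    print("Inside subprocess run2, the value is", settings.some_config)

def run(shared_config):
    # Rebind the module attribute inside the child, so every place that
    # reads settings.some_config goes through the managed dict.
    settings.some_config = shared_config
    for _ in range(2):
        print("Inside subprocess run, the value is", settings.some_config)
        time.sleep(3)
    run2()

if __name__ == '__main__':
    manager = Manager()
    shared_config = manager.dict(settings.some_config)  # seed from settings.py
    settings.some_config = shared_config                # main process uses it too

    p = Process(target=run, args=(shared_config,))
    p.start()

    time.sleep(1)
    settings.some_config['foo'] = 'changed bar'  # per-key update, visible in the child
    p.join()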