I'm using Python's multiprocessing pool, and I created a dictionary with a manager for the pool's worker processes to read values from. However, reading values from this dictionary is very slow, even slower than reading from disk, and I don't understand why. After some research, it seems that locks are involved in the read path, but even with locks it seems too slow: reading from the shared dictionary takes about 10 seconds, while reading from disk takes less than a second. I would like to know why this happens and how it works internally. I expected that reading directly from memory would make my program much more efficient, but that is not the case, which seems very strange to me.
This question was translated using translation tools, so the content may be slightly inaccurate (including this sentence).
I don't necessarily need a better solution for now (although one would be welcome); I just want to understand why it is so slow and what the underlying reasons are.
Here's an example of the code. Because the real code contains a lot of other logic and some sensitive company information, the following is only a demonstration, but it is basically the same as my actual logic.
import multiprocessing

pool = multiprocessing.Pool(processes=self.process_num)
manager = multiprocessing.Manager()
models_and_configs_dict = manager.dict()
# Here, contents will be read from the disk, stored in a dictionary where the key is a number, and the value is a somewhat complex dictionary with two keys. One key is 'models', which is a list containing sklearn models (which can be quite large, several hundred MB), and the other key is 'configs', which is a list containing dictionaries (with just some simple text inside).
# Here, I'm starting child processes which will read the models and perform data detection.
for i in range(self.process_num):
    pool.apply_async(detect_model, args=(models_and_configs_dict,))
# This is the logic used to read the models within the child process. These two lines of code take ten seconds or more to run.
models = models_and_configs[models_and_configs_id]['models']
config = models_and_configs[models_and_configs_id]['configs']
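For anyone who wants to try this without my real data, here is a minimal sketch that should reproduce the comparison. It rests on several assumptions: a large bytes object stands in for the sklearn models, the helper names time_lookups and main are made up for this demo, and the lookups run in the parent process rather than in a pool worker (they still go through the manager's proxy object). The absolute timings will differ from my real case.

```python
import multiprocessing
import time


def time_lookups(mapping, key):
    """Return the seconds spent on the same two lookups as in the question."""
    start = time.perf_counter()
    models = mapping[key]['models']
    config = mapping[key]['configs']
    return time.perf_counter() - start


def main():
    # Hypothetical stand-in for the real data: one large bytes object instead
    # of a list of sklearn models, plus a small list of config dictionaries.
    payload = {
        'models': [bytes(50 * 1024 * 1024)],
        'configs': [{'name': 'demo'}],
    }

    # Baseline: an ordinary dictionary living in this process.
    plain_dict = {0: payload}

    with multiprocessing.Manager() as manager:
        # Same content, but stored behind a manager proxy.
        shared_dict = manager.dict()
        shared_dict[0] = payload

        print(f"plain dict:   {time_lookups(plain_dict, 0):.4f} s")
        print(f"manager dict: {time_lookups(shared_dict, 0):.4f} s")


if __name__ == '__main__':
    main()
```

Increasing the payload size toward the several hundred MB of my real models should make the gap between the two timings larger.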