
I have a multiprocessing.managers.ListProxy object. Each ListProxy contains NumPy ndarrays. After concatenating and saving, the file is almost 700 MB.

Until now I did the concatenation and the save to file in a child process, but the parent's join() has to wait for the child process to finish. The child process that concatenates and saves the file takes 5 times longer than computing those lists.

I think subprocess.Popen() is the solution to the long execution time.

How do I pass a large multiprocessing.managers.ListProxy (700 MB) to subprocess.Popen?

Is json.load a fast way to do it? Can I pass a ListProxy to json.load?

luki
  • You say, "Each ListProxy ...", which suggests you have multiple *managed lists* but is that really the case? You also say, "How to pass large multiprocessing.managers.ListProxy (700Mb) to subprocess.Popen?" But a proxy reference for a managed list is very small regardless of how large the actual list is. I am also not clear on why creating a new process with `Popen` solves a problem that a `multiprocessing.Process` or multiprocessing pool cannot solve. You need to post some code that gives me some idea of what your processing actually is. – Booboo Mar 17 '23 at 11:44
  • @Booboo Hello. I want to find the fastest way to save my file without waiting for it. I also wrote another similar question https://stackoverflow.com/questions/75706049/python-how-to-pass-multiprocessing-managers-dictproxy-to-child-daemon-proces-an but there I tried to use multiprocessing.Process. Now I know that is a bad idea (I have to wait for the child) and a solution is to push saving the file to a subprocess. I wrote "Each ListProxy" because of memory. In a loop I fill a ListProxy and save it to a file, then I delete all objects and run the next loop. Actually I do not write the ListProxy itself but a converted copy. – luki Mar 19 '23 at 17:40

1 Answer


It's usually a bad idea to use json to serialize and deserialize large NumPy arrays; pickle is better suited here.

So to pass your multiprocessing.managers.ListProxy data to a child process, first convert it to a plain list (a proxy is only valid in processes that share its manager), then serialize it with pickle.dumps(); the child deserializes the bytes back into a list with pickle.loads().

like this:

import subprocess
import pickle

my_list_proxy = ...

# A proxy is only usable by processes that share its manager,
# so convert it to a plain list before serializing.
my_list_serialized = pickle.dumps(list(my_list_proxy))

# Pickled data is binary and usually contains null bytes, so it cannot
# be passed as a command-line argument; send it over stdin instead.
child_process = subprocess.Popen(["python", "my_child_process.py"],
                                 stdin=subprocess.PIPE)
child_process.stdin.write(my_list_serialized)
child_process.stdin.close()

child_process.wait()

then in your child process you can deserialize like this:

import pickle
import sys

# Read the pickled bytes from stdin (sys.argv cannot carry binary data).
my_list = pickle.loads(sys.stdin.buffer.read())
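A self-contained sketch of the two pieces together. The child script is written to a temporary file here only so the example runs on its own; in practice it would be your my_child_process.py, and the [1, 2, 3] list stands in for list(my_list_proxy):

```python
import pickle
import subprocess
import sys
import tempfile

# Throwaway child script: reads pickled bytes from stdin and
# prints the length of the resulting list.
child_code = (
    "import pickle, sys\n"
    "data = pickle.loads(sys.stdin.buffer.read())\n"
    "print(len(data))\n"
)
with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
    f.write(child_code)
    child_path = f.name

# Serialize a stand-in list and hand it to the child over stdin.
payload = pickle.dumps([1, 2, 3])
out = subprocess.run([sys.executable, child_path], input=payload,
                     capture_output=True).stdout
print(out.decode().strip())  # → 3
```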
Saxtheowl
  • Thank you for the hint. I will try it tomorrow. I will report how much time I save with your solution. Right now, for example, the computation takes 59.12 s and generating the file takes another 277.56 s, so I waste a lot of time. If I push it to a subprocess I can save that time. We will see how long pickle.dumps(my_list_proxy) takes. – luki Mar 17 '23 at 20:38
  • pickle.dumps takes 0.00011134147644042969 seconds. The output is a . It is fast. – luki Mar 19 '23 at 17:30
  • nice, don't forget to accept the answer if you think it fixed your problem – Saxtheowl Mar 19 '23 at 18:29
  • Is it possible to change subprocess.Popen(["python", "my_child_process.py", my_list_proxy_serialized]) to subprocess.call? I would like to open the saving process in a new gnome-terminal. Thanks for the answer. – luki Mar 19 '23 at 20:31
  • yes, you can do that; it is a simpler way to execute a child process and wait for it to complete – Saxtheowl Mar 19 '23 at 20:47
  • I am asking because I got: can only concatenate str (not "bytes") to str. I tried the line subprocess.call(["gnome-terminal", "--", "bash", "-c", call_string]) where call_string is "python3 path_to_file", and it works when I call it without pickle, but I can't concatenate the pickle bytes. – luki Mar 19 '23 at 21:02
  • Try to convert it to a string before concatenating – Saxtheowl Mar 19 '23 at 21:15
  • @Saxtheowl I tried your version child_process = subprocess.Popen(["python3", path_to_file_as_string, my_list_proxy_serialized]) and I got: embedded null byte. What does this mean? – luki Mar 19 '23 at 21:29
  • It happens when you pass a string containing a null byte ('\0') to a function that doesn't support it. I'll leave you on your own if you have more errors after :) – Saxtheowl Mar 19 '23 at 21:52
  • Dear @Saxtheowl I am still trying to fix my problem. I can't find a similar working solution via Google, but I changed the concept: I am using pickle.load instead of pickle.loads, and afterwards I will process pickle files. But with pickle.load and a proxy there is a problem like this one: https://stackoverflow.com/a/63721327/9403794 . Can you provide a working example of your idea? – luki Mar 24 '23 at 22:21
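Since the comment thread ends at the "embedded null byte" error and a request for a file-based pickle.load example, here is a hedged sketch of that variant: write the pickle to a file and pass only the file path on the command line. The [1, 2, 3] list stands in for list(my_list_proxy), and the child script is written to a temporary file only to keep the sketch self-contained:

```python
import pickle
import subprocess
import sys
import tempfile

data = [1, 2, 3]  # stand-in for list(my_list_proxy)

# Pickled bytes routinely contain null bytes, which is why passing them
# directly as an argv string fails with "embedded null byte".
print(b"\x00" in pickle.dumps(data))  # → True

# Write the pickle to a file; a file *path* is a plain string and is
# safe to pass on the command line.
with tempfile.NamedTemporaryFile(suffix=".pkl", delete=False) as f:
    pickle.dump(data, f)
    data_path = f.name

# Throwaway child script (stand-in for the real saver): loads the
# pickle file named in sys.argv[1] with pickle.load.
child_code = (
    "import pickle, sys\n"
    "with open(sys.argv[1], 'rb') as f:\n"
    "    print(len(pickle.load(f)))\n"
)
with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
    f.write(child_code)
    script_path = f.name

out = subprocess.run([sys.executable, script_path, data_path],
                     capture_output=True).stdout
print(out.decode().strip())  # → 3
```

Because only the path crosses the process boundary, this also works unchanged with subprocess.call inside a gnome-terminal invocation, since no binary data has to survive argv.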