
I am parallelising a CPU-bound task that takes a large nested list as read-only input. To avoid the nested list being repeatedly copied into each worker process, I would like to make the object accessible via shared memory.

I am working on a 64-bit Windows machine, using Python 3.8.5 (default, Sep 3 2020, 21:29:08) [MSC v.1916 64 bit (AMD64)] and Spyder 4.2.1.

I already tested implementing shared memory via ray.put(foo), where foo represents a large nested list. However, from the Task Manager I deduce that the nested list is still copied into each process.

I was wondering whether Ray offers an alternative solution. In what follows, I present a minimal example to showcase my use case.

import numpy as np
import ray
import time

@ray.remote
def toy_function(x, nested_list):
    # Busy work to keep each CPU occupied for a noticeable amount of time.
    for _ in range(5 * 10**4):
        m1 = np.random.random((50, 50))
        matrix = m1 @ np.transpose(m1)
        sol = np.linalg.inv(matrix)
    # Read-only access to the shared nested list.
    result = x + nested_list[0][0]
    return result


if __name__ == "__main__":
    ray.init(num_cpus=5)
    
    # Create a nested list
    N = 10**8
    nested_list = [list(range(2*N)), list(range(3*N))]
    nested_list_id = ray.put(nested_list)
 
    t0 = time.time()
    result = ray.get([toy_function.remote(i, nested_list_id) for i in range(5)])
    t1 = time.time() - t0

    print(f"Execution time: {t1} sec.")
    ray.shutdown() 
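
One workaround I can think of, assuming the inner lists can be converted to NumPy arrays: Ray's object store deserialises NumPy arrays zero-copy, so each worker receives a read-only view backed by shared memory, whereas a plain nested list is pickled and fully reconstructed in every worker (which would explain the copies in the Task Manager). A minimal sketch (the smaller N and the worker body are illustrative only):

import numpy as np
import ray

@ray.remote
def toy_function(x, first_array):
    # first_array arrives as a read-only NumPy view backed by Ray's
    # shared object store; no per-worker copy of the data is made.
    return x + int(first_array[0])

if __name__ == "__main__":
    ray.init(num_cpus=5)
    N = 10**6  # smaller than the original example, for illustration
    nested_list = [list(range(2 * N)), list(range(3 * N))]
    # Convert each inner list to a NumPy array before putting it in the
    # object store; arrays are shared zero-copy, plain lists are not.
    array_ids = [ray.put(np.asarray(inner, dtype=np.int64))
                 for inner in nested_list]
    result = ray.get([toy_function.remote(i, array_ids[0]) for i in range(5)])
    print(result)
    ray.shutdown()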
  • Large, complex, read-only objects you want to share? Get Linux, and use "fork" as the start method. This is actually quite easy on Windows these days with WSL2: it takes less than 30 minutes to download, install, and set up Ubuntu 20.04 from the Windows Store. – Aaron Mar 09 '21 at 00:24
  • Many thanks for your help! WSL2 with forking did the trick. – Rose Mar 15 '21 at 15:21
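
For reference, a minimal sketch of the fork-based approach suggested in the comment above, using the standard multiprocessing module and assuming a Linux/WSL2 environment (the toy sizes are illustrative). With the "fork" start method, worker processes inherit the parent's memory copy-on-write, so the read-only nested list is never pickled into each worker; note that CPython's reference counting can still dirty some pages over time, so the sharing is not perfectly copy-free.

import multiprocessing as mp

# Module-level data that forked workers inherit copy-on-write.
nested_list = [list(range(2 * 10**6)), list(range(3 * 10**6))]

def toy_function(x):
    # Reads the inherited nested_list; no explicit transfer or pickling
    # of the list into the worker is needed.
    return x + nested_list[0][0]

if __name__ == "__main__":
    # "fork" is available on Linux/WSL2 but not on native Windows.
    ctx = mp.get_context("fork")
    with ctx.Pool(processes=5) as pool:
        results = pool.map(toy_function, range(5))
    print(results)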

0 Answers