I have a Python program like this:

if __name__ == "__main__":
  ..
  for t in th:
    ..

I'm trying to parallelize it with the Ray library, which seems to be faster than multiprocessing, so I wrote:

import ray
ray.init()
@ray.remote
def func(t):
  ..

if __name__ == "__main__":
  ..
  for t in th:
    func.remote(t)

But I get the following error:

: cannot connect to X server
*** Aborted at 1590213890 (unix time) try "date -d @1590213890" if you are using GNU date ***
PC: @                0x0 (unknown)
*** SIGABRT (@0xbcb00003d43) received by PID 15683 (TID 0x7fb1394f3740) from PID 15683; stack trace: ***
    @     0x7fb138f47f20 (unknown)
    @     0x7fb138f47e97 gsignal
    @     0x7fb138f49801 abort
    @     0x7fb13760cf11 google::LogMessage::Flush()
    @     0x7fb13760cfe1 google::LogMessage::~LogMessage()
    @     0x7fb137394b49 ray::RayLog::~RayLog()
    @     0x7fb137144555 ray::CoreWorkerProcess::~CoreWorkerProcess()
    @     0x7fb1371445aa std::unique_ptr<>::~unique_ptr()
    @     0x7fb138f4c041 (unknown)
    @     0x7fb138f4c13a exit
    @     0x7fb123e4cb37 (unknown)
    @     0x7fb123ddfa98 QApplicationPrivate::construct()
    @     0x7fb123ddfd0f QApplication::QApplication()
    @     0x7fb127c5d428 (unknown)
    @     0x7fb127c682fd (unknown)
    @     0x7fb127c54898 (unknown)
    @     0x7fb126f0a527 (unknown)
    @           0x50a635 (unknown)
    @           0x50bfb4 _PyEval_EvalFrameDefault
    @           0x507d64 (unknown)
    @           0x50ae13 PyEval_EvalCode
    @           0x634c82 (unknown)
    @           0x634d37 PyRun_FileExFlags
    @           0x6384ef PyRun_SimpleFileExFlags
    @           0x639091 Py_Main
    @           0x4b0d00 main
    @     0x7fb138f2ab97 __libc_start_main
    @           0x5b250a _start
Aborted (core dumped)

How can I solve this? Thanks.

EDIT: I noticed this warning before the reported error. I don't know whether it is relevant.

WARNING worker.py:1090 -- Warning: The remote function __main__.func has size 288002587 when pickled. It will be stored in Redis, which could cause memory issues. This may mean that its definition uses a large array or other object.
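
The warning itself hints that the definition of `func` may be pulling in a large array from the enclosing scope, so that the array is pickled together with the function. A minimal sketch of one way to avoid that, assuming the large object can be passed as an argument instead: store it once with `ray.put` and hand the resulting reference to each task. The names `count_above` and `run_with_ray` and the tiny matrix are hypothetical stand-ins, not the code from the question.

```python
def count_above(matrix, t):
    """Count the entries of a nested-list matrix that are strictly greater than t."""
    return sum(1 for row in matrix for value in row if value > t)


def run_with_ray(matrix, thresholds):
    """Run count_above once per threshold as Ray tasks and return the results."""
    import ray  # imported lazily so count_above stays usable without Ray

    ray.init()
    # Wrap the plain function; it receives the data as an argument,
    # so no large global is captured in its definition.
    remote_count = ray.remote(count_above)
    # Put the (potentially large) matrix into the object store once.
    matrix_ref = ray.put(matrix)
    futures = [remote_count.remote(matrix_ref, t) for t in thresholds]
    # Block until every task has finished and collect the results in order.
    return ray.get(futures)
```

`run_with_ray(...)` would be called from inside the `if __name__ == "__main__":` block, so that `ray.init()` only runs in the driver script.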

EDIT 2:

The code in the function contains basic operations on matrices and some thresholding. I tried the following minimal code:

import ray
ray.init()

@ray.remote
def f(x):
    print(x)

if __name__ == "__main__":
    for x in (1,2,3):
        f.remote(x)

and I got the following output:

INFO resource_spec.py:212
-- Starting Ray with 73.1 GiB memory available for workers and up to 35.34 GiB for objects.
You can adjust these settings with ray.init( memory              = <bytes>,
                                             object_store_memory = <bytes>
                                             ).
INFO services.py:1170
-- View the Ray dashboard at localhost:8265.
(pid=26359) 1.
(pid=26350) 3.
(pid=26356) 2.
user3666197
Lota18-
  • Do you have a problem running any minimal code with Ray? What do you run in the function? Maybe there is code which can't run remotely? – furas May 23 '20 at 07:40
  • The code in the function contains basic operations on matrices and some thresholding. I tried the following minimal code: `import ray; ray.init(); @ray.remote; def f(x): print(x); if __name__=="__main__": for x in (1,2,3): f.remote(x)` and I get the following output: `INFO resource_spec.py:212 -- Starting Ray with 73.1 GiB memory available for workers and up to 35.34 GiB for objects. You can adjust these settings with ray.init(memory=, object_store_memory=). INFO services.py:1170 -- View the Ray dashboard at localhost:8265. (pid=26359) 1. (pid=26350) 3. (pid=26356) 2.` – Lota18- May 23 '20 at 10:11
  • Add this information to the question: it will be more readable and more people will see it. The error shows `X server` and `QApplication::QApplication()`. Do you use `Linux` and `PyQt` or another GUI framework? Usually GUI frameworks can run only in the main thread/process. The error also shows a problem with `date -d ...`. Do you use it in your code? The whole problem is inside `func()`, so you may have to show the code you use in `func()`. You can also add `print()` in many places to see which ones are displayed, and this way find which part causes the problem. – furas May 23 '20 at 10:18
  • @furas thanks a lot. I don't use `date -d` in my code. As for Linux, I'm running the code on a Linux server whose characteristics I don't know, so I can't answer that part. I will try your suggestions. – Lota18- May 23 '20 at 12:24
  • I have the same problem. For me, even the simple code `import ray; ray.init()` fails with "Aborted", but it worked on another Linux machine. I posted an issue on their GitHub page: https://github.com/ray-project/ray/issues/14426 – ibra Mar 01 '21 at 20:46
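
The stack trace in the question aborts while a `QApplication` is being constructed without an X server, which matches furas's remark about GUI frameworks. A minimal sketch of a check that could run before any GUI-backed import, using only the standard library. The helper names are hypothetical, and the `QT_QPA_PLATFORM=offscreen` setting is an assumption that only applies to Qt 5 or later; older Qt versions do not read it.

```python
import os


def has_display(env=None):
    """Return True if the environment advertises an X11 or Wayland display."""
    env = os.environ if env is None else env
    return bool(env.get("DISPLAY") or env.get("WAYLAND_DISPLAY"))


def configure_headless(env=None):
    """Request Qt's offscreen platform plugin when no display is available.

    Assumption: the toolkit is Qt 5 or later, which reads QT_QPA_PLATFORM.
    Call this before importing the library that creates the QApplication.
    Returns True if the headless setting was applied.
    """
    env = os.environ if env is None else env
    if has_display(env):
        return False
    # setdefault keeps any value the user has already chosen.
    env.setdefault("QT_QPA_PLATFORM", "offscreen")
    return True
```

Calling `has_display()` at the top of the script at least shows whether the server session has a display at all, which narrows down where the `cannot connect to X server` message comes from.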

1 Answer

If you are using a cluster managed by Slurm, you must submit a job to it for Ray to function properly.

In fact, this was my issue, and I posted it on their GitHub page before finding the solution: https://github.com/ray-project/ray/issues/14426

You will find in it a simple batch script to submit a job to Slurm.
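
A minimal sketch of a guard that could run before `ray.init()`, assuming the cluster is Slurm: Slurm sets the `SLURM_JOB_ID` environment variable inside a submitted job, so its absence suggests the script was started directly on a login node. The helper name `inside_slurm_job` is hypothetical.

```python
import os


def inside_slurm_job(env=None):
    """Return True if the process appears to run inside a Slurm job."""
    env = os.environ if env is None else env
    # Slurm exports SLURM_JOB_ID to every process of a submitted job.
    return "SLURM_JOB_ID" in env
```

If this returns `False` on a Slurm cluster, submitting the script as a job, as described above, is the thing to try first.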

ibra