
I have a Dagster job that trains a CNN (using Keras). The op that runs fit() fails with the following error:

Multiprocess executor: child process for step train unexpectedly exited with code -9
dagster.core.executor.child_process_executor.ChildProcessCrashException

Stack Trace:
  File "/usr/local/lib/python3.7/site-packages/dagster/core/executor/multiprocess.py", line 163, in execute
    event_or_none = next(step_iter)
  File "/usr/local/lib/python3.7/site-packages/dagster/core/executor/multiprocess.py", line 268, in execute_step_out_of_process
    for ret in execute_child_process_command(command):
  File "/usr/local/lib/python3.7/site-packages/dagster/core/executor/child_process_executor.py", line 157, in execute_child_process_command
    raise ChildProcessCrashException(exit_code=process.exitcode)

No additional output is given. I am using a multi-container local Docker deployment.

Things tried:

  • Running the code locally (non-Docker) with execute_in_process() works without error.
  • Because the stack trace mentions the executor and multiprocessing, I tried switching execution to in_process (see the sketch below), but the job merely hangs.
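
For reference, this is roughly how I switched to the in-process executor (a minimal sketch; train and train_job are placeholder names for my actual op and job):

    from dagster import in_process_executor, job, op

    @op
    def train():
        ...  # build the CNN and call model.fit() here

    @job(executor_def=in_process_executor)  # run all steps in a single process
    def train_job():
        train()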

Any advice would be greatly appreciated.

Atticus
  • Caused by memory issues and solved by increasing the memory settings in Docker. – Atticus Dec 06 '21 at 10:30
  • What memory did you increase? RAM, virtual memory, swap? Asking for those of us who do not run Dagster in Docker containers (virtual machines etc.) and experience this issue. – Kay Dec 29 '21 at 02:55
  • Docker's settings include a "memory" setting, which I increased to 16 GB. I'm not sure what this refers to under the hood, but it is separate from the swap memory setting. – Atticus Jan 02 '22 at 20:55

1 Answer


It's RAM related. As the asker noted, reconfiguring Docker to use more RAM (in his case, about 16 GB) solved it. I had the same issue on OpenShift, running in a pod. An exit code of -9 means the process received SIGKILL, which on Linux is usually the out-of-memory (OOM) killer terminating the process. So if anyone else hits this, try increasing the RAM available to the process.
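
If you want to confirm the memory theory before resizing the container, here is a minimal sketch (assuming psutil is installed; the op is a placeholder for the real training op) that logs available memory just before the fit() call:

    import psutil
    from dagster import op

    @op
    def train(context):
        # Log how much memory is free right before the memory-hungry fit();
        # a low number here supports the OOM-killer theory behind exit code -9.
        mem = psutil.virtual_memory()
        context.log.info(
            f"Available memory: {mem.available / 1024 ** 3:.2f} GiB "
            f"of {mem.total / 1024 ** 3:.2f} GiB total"
        )
        # ... build the model and call model.fit(...) here ...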