I have a Dagster job that is training a CNN (using Keras). The Op that runs fit()
is causing the following error:
Multiprocess executor: child process for step train unexpectedly exited with code -9
dagster.core.executor.child_process_executor.ChildProcessCrashException
Stack Trace:
  File "/usr/local/lib/python3.7/site-packages/dagster/core/executor/multiprocess.py", line 163, in execute
    event_or_none = next(step_iter)
  File "/usr/local/lib/python3.7/site-packages/dagster/core/executor/multiprocess.py", line 268, in execute_step_out_of_process
    for ret in execute_child_process_command(command):
  File "/usr/local/lib/python3.7/site-packages/dagster/core/executor/child_process_executor.py", line 157, in execute_child_process_command
    raise ChildProcessCrashException(exit_code=process.exitcode)
No additional output is given. I am using a multi-container local Docker deployment.
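For reference, my understanding of the exit code: Python reports a child process that was terminated by signal N with exit code -N, so -9 would correspond to signal 9 (SIGKILL) on Linux. The sketch below reproduces that convention outside Dagster, using only the standard library. It uses subprocess rather than multiprocessing for simplicity (both document the same negative-exit-code convention on POSIX), and the names exit_code_of and KILL_SELF are made up for illustration. I have not confirmed what sent the signal in my case.

```python
import signal
import subprocess
import sys


def exit_code_of(snippet: str) -> int:
    """Run a Python snippet in a child process and return its exit code."""
    return subprocess.run([sys.executable, "-c", snippet]).returncode


# Hypothetical stand-in for the training step: the child sends SIGKILL to
# itself, imitating a process that is killed from outside (POSIX only).
KILL_SELF = "import os, signal; os.kill(os.getpid(), signal.SIGKILL)"

code = exit_code_of(KILL_SELF)
print("child exit code:", code)

# A negative exit code -N means "terminated by signal N".
if code < 0:
    print("terminated by:", signal.Signals(-code).name)
```

On Linux this should report the same kind of negative exit code that appears in the Dagster error above.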
Things tried:
- I ran the code locally (non-Docker) using execute_in_process(), and this works without error.
- Because the stack trace mentions the executor and multiprocess, I tried setting execution to in_process, but the run merely hangs.
Any advice would be greatly appreciated.