I want to profile Tensorflow model on CloudML. When I use tf.RunOptions(trace_level=tf.RunOptions.FULL_TRACE), my process dies with non zero exit code without details of what happened.
I tried adding and removing the code which turns on this option, and there's 100% correlation between this option and the death of the process.
The error message is 'The replica master 0 exited with a non-zero status of 250. Termination reason: Error. To find out more about why your job exited please check the logs'
How can I diagnose and fix this problem?