
I am running an OpenCPU-based image on OpenShift. Every time the pod starts, it crashes after just a few seconds with the error:

command terminated with non-zero exit code: Error executing in Docker Container: 137

The Events tab shows only the three events below, and the terminal logs do not show anything either.

Back-off restarting the failed container
Pod sandbox changed, it will be killed and re-created.
Killing container with id docker://opencpu-test-temp:Need to kill Pod

I have no clue why the container restarts every few seconds. The image runs just fine locally.

Can anyone give me a clue on how to debug this issue?

Andreas Lorenzen
Hound
    Look at the ``oc debug`` command. It lets you start a debug pod with the same deployment config and image, but without starting the application. From the shell it provides, you can then run the startup command manually and see what happens. As suggested below, memory is a good candidate for the cause. – Graham Dumpleton Mar 10 '19 at 19:35

1 Answer


Exit code 137 is often memory-related in a Docker context.

The exit code comes from the process that is isolated in the Docker container. By convention, a code above 128 means the process was terminated by a signal: 137 = 128 + 9, so the process was killed with SIGKILL (signal 9). Source
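
As a minimal sketch of that arithmetic, the snippet below decodes an exit code using the common 128 + N convention for signal terminations. The function name `describe_exit_code` is made up for illustration, and the signal names assume a Unix-like system.

```python
import signal


def describe_exit_code(code: int) -> str:
    """Explain an exit code using the 128 + N convention for signals."""
    if code > 128:
        signum = code - 128
        try:
            # Map the signal number to its symbolic name, e.g. 9 -> SIGKILL.
            name = signal.Signals(signum).name
        except ValueError:
            name = f"unknown signal {signum}"
        return f"terminated by {name} (signal {signum})"
    if code == 0:
        return "exited successfully"
    return f"exited with status {code}"


print(describe_exit_code(137))
```

A SIGKILL on its own does not prove an out-of-memory kill, since anything with sufficient privileges can send it, but in a container with a memory limit it is the most common cause.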

From bobcares.com:

Error 137 in Docker denotes that the container was ‘KILL’ed by ‘oom-killer’ (Out of Memory). This happens when there isn’t enough memory in the container for running the process.

‘OOM killer’ is a proactive process that jumps in to save the system when its memory level goes too low, by killing the resource-abusive processes to free up memory for the system.

Try checking the memory configuration of the container and the available memory on the host that launches the pod. Is there nothing in the opencpu container log?
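
One way to see the limit the container actually runs under is to read the cgroup memory files from inside it, for example from a debug shell. The sketch below assumes Python is available in the image (which may not hold for an R-based image; in that case you can simply `cat` the same files) and that the cgroup filesystem is mounted at the usual location. The function names are made up for illustration.

```python
from pathlib import Path
from typing import Optional

# Usual cgroup locations for the memory limit (v2 first, then v1).
CGROUP_LIMIT_FILES = (
    Path("/sys/fs/cgroup/memory.max"),
    Path("/sys/fs/cgroup/memory/memory.limit_in_bytes"),
)


def parse_limit(raw: str) -> Optional[int]:
    """Return the limit in bytes, or None when cgroup v2 reports 'max'."""
    raw = raw.strip()
    return None if raw == "max" else int(raw)


def container_memory_limit() -> Optional[int]:
    """Return the first readable cgroup memory limit in bytes, else None."""
    for path in CGROUP_LIMIT_FILES:
        try:
            return parse_limit(path.read_text())
        except (OSError, ValueError):
            # File missing or unreadable on this system; try the next one.
            continue
    return None


if __name__ == "__main__":
    limit = container_memory_limit()
    if limit is None:
        print("no cgroup memory limit found")
    else:
        print(f"memory limit: {limit / 2**20:.0f} MiB")
```

If the reported value is small compared with what the application needs at startup, the limit is a likely culprit. Note that cgroup v1 reports a very large number rather than `max` when no limit is set.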

Check the setting rlimit.as in the config file /etc/opencpu/server.conf inside the image. This is the per-request memory limit for your opencpu instance (I realize that your problem occurs at startup, so this is perhaps not too likely to be the cause).

Andreas Lorenzen
    Indeed, it was an OpenShift memory issue. Once I added resource requests asking for more memory and CPU, everything worked fine. Thanks – Hound Mar 12 '19 at 04:02
  • I am so glad that I was able to help you on this. Does that mean that you can accept the answer? – Andreas Lorenzen Mar 12 '19 at 06:51