
I am running a Jenkins controller in Kubernetes, and I have noticed that it has been restarting a lot.

kgp jkmaster-0
NAME                  READY   STATUS    RESTARTS   AGE
jkmaster-0            1/1     Running   8          30m

The resource allocation for the pod is as follows:

    Limits:
      memory:  2500M
    Requests:
      cpu:      300m
      memory:   1G
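
For reference, this is roughly how those values would be declared in the pod spec (a sketch reconstructed from the describe output above; the rest of the spec is omitted):

    resources:
      requests:
        cpu: 300m
        memory: 1G
      limits:
        memory: 2500M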

As long as the controller is idle, I don't see any memory spikes. But as soon as I start spawning jobs, I notice spikes, and each spike results in an OOM error and a restart:


kgp jkmaster-0
NAME                  READY   STATUS      RESTARTS   AGE
jkmaster-0            0/1     OOMKilled   3          3h8m
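
The OOM kill itself can be confirmed with kubectl describe; the container's last state shows something like the following (output trimmed, layout may vary slightly between kubectl versions):

    kubectl describe pod jkmaster-0
    ...
    Last State:     Terminated
      Reason:       OOMKilled
      Exit Code:    137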

In order to look into this further, I would like to generate a heap dump, so what I have done is add the following

-XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/srv/jenkins/

to JAVA_OPTS. I was expecting that the next time the Jenkins controller hits an OOM, it would generate a heap dump under /srv/jenkins/, but there is none. Any idea what I have missed?

There is no file of the form java_pid<pid>.hprof under /srv/jenkins/ after a restart.
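
To rule out the flags simply not reaching the JVM, they can be checked from inside the running container. This assumes the image ships the JDK's jcmd and that the JVM runs as PID 1; adjust otherwise:

    kubectl exec jkmaster-0 -- jcmd 1 VM.flags
    # look for HeapDumpOnOutOfMemoryError and HeapDumpPath in the printed flags
    kubectl exec jkmaster-0 -- ls -la /srv/jenkins/
    # shows the directory's permissions and any .hprof files already written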

All JAVA_OPTS:

JAVA_OPTS: -Djava.awt.headless=true -XX:InitialRAMPercentage=10.0 -XX:MaxRAMPercentage=60.0 -server -XX:NativeMemoryTracking=summary -XX:+UseG1GC -XX:+ExplicitGCInvokesConcurrent -XX:+ParallelRefProcEnabled -XX:+UseStringDeduplication \
-XX:+UnlockDiagnosticVMOptions -XX:G1SummarizeRSetStatsPeriod=1 -XX:+PrintFlagsFinal -Djenkins.install.runSetupWizard=false -Dhudson.DNSMultiCast.disabled=true \
-Dhudson.slaves.NodeProvisioner.initialDelay=5000 -Dsecurerandom.source=file:/dev/urandom \
-Xlog:gc:file=/srv/jenkins/gc-%t.log -Xlog:gc*=debug -XX:+AlwaysPreTouch -XX:+DisableExplicitGC \
-XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/srv/jenkins/ -Dhudson.model.ParametersAction.keepUndefinedParameters=true -Dhudson.model.DownloadService.noSignatureCheck=true
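
For reference, with the JVM's container support active (the default on recent JDKs), MaxRAMPercentage is applied against the container memory limit, so the heap ceiling implied by these flags works out to roughly:

    2500M (container limit) x 60% (MaxRAMPercentage) ≈ 1500M max heap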
  • How much ram do your nodes have? – paltaa Feb 02 '21 at 18:08
  • @paltaa 25GB memory allocatable – jeunii Feb 02 '21 at 18:11
  • Well, unless you have volume mounted `/srv/jenkins` to a `hostPath:` or a PVC, it is very likely the Pod bounce is resetting the root FS in your container – mdaniel Feb 03 '21 at 03:50
  • @mdaniel `/srv/jenkins` is mounted on a PVC. – jeunii Feb 03 '21 at 15:45
  • I just now realized the disconnect: OOMKilled is something `kubelet` does _to_ your container, and not something that the JVM does to itself. That process was `kill -9`-ed (in fact, I don't know of any "warning shot" k8s offers the Pod); if you are interested in having the JVM participate in the OOM triage, you'll want to lower the Xmx below the Pod's resource boundary, so the JVM exhausts itself before k8s steps in with a more violent outcome – mdaniel Feb 03 '21 at 16:58
  • @mdaniel great idea. I'll try that – jeunii Feb 03 '21 at 16:59
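
Following mdaniel's suggestion above, one way to let the JVM hit OutOfMemoryError (and write the .hprof) before the kubelet kills the container is to cap the heap explicitly, well below the 2500M limit, e.g. something like the following (the value is only an illustration, not a recommendation):

    -Xmx1536m -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/srv/jenkins/

An explicit -Xmx takes precedence over MaxRAMPercentage when both are set, and it leaves headroom for metaspace, threads and other native memory before the container limit is reached.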

0 Answers