
Is it possible to tune the cleanup routine in docker-workflow-plugin / docker-pipeline-plugin? For example, could it leave the container in place and let pipeline code handle its removal in a retry block?

I've got a job that runs serial groups of 30 of these clauses across 64 nodes on an ECS cluster of 8 EC2 instances, and one fails during cleanup:

                docker.image(selectedNodeLabel).inside {
                    build_kernel_module(version, distro, test, type)
                }
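For reference, this is roughly the "handle removal in pipeline code" idea I'm asking about. It is only a sketch, not something I've tested: it assumes the cleanup failure surfaces as a catchable exception at the `inside` step, and the `docker ps`/`docker rm` sweep via `sh` is illustrative (it assumes the agent can reach the host Docker socket). `selectedNodeLabel` and `build_kernel_module` are from the job above.

    // Sketch: if the implicit cleanup done by .inside {} can't be tuned,
    // wrap the step so a failed "docker rm" doesn't fail the stage,
    // then retry the removal ourselves with plain docker CLI calls.
    try {
        docker.image(selectedNodeLabel).inside {
            build_kernel_module(version, distro, test, type)
        }
    } catch (java.io.IOException e) {
        echo "Container cleanup failed, retrying removal: ${e.message}"
        retry(3) {
            // Best-effort sweep of exited containers started from this image.
            sh "docker ps -aq --filter status=exited --filter ancestor=${selectedNodeLabel} | xargs -r docker rm -f"
        }
    }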

Error

java.io.IOException: Failed to rm container 'dd589a813fec46b7dc97fed273c4ddd09183a561ff8c9d584ea7b299d606d1fb'.
at org.jenkinsci.plugins.docker.workflow.client.DockerClient.rm(DockerClient.java:191)
at org.jenkinsci.plugins.docker.workflow.client.DockerClient.stop(DockerClient.java:178)
at org.jenkinsci.plugins.docker.workflow.WithContainerStep.destroy(WithContainerStep.java:109)
at org.jenkinsci.plugins.docker.workflow.WithContainerStep.access$400(WithContainerStep.java:76)
at org.jenkinsci.plugins.docker.workflow.WithContainerStep$Callback.finished(WithContainerStep.java:390)
at org.jenkinsci.plugins.workflow.steps.BodyExecutionCallback$TailCall.onSuccess(BodyExecutionCallback.java:118)
at org.jenkinsci.plugins.workflow.cps.CpsBodyExecution$SuccessAdapter.receive(CpsBodyExecution.java:377)
at com.cloudbees.groovy.cps.Outcome.resumeFrom(Outcome.java:73)
at com.cloudbees.groovy.cps.Continuable$1.call(Continuable.java:166)
at com.cloudbees.groovy.cps.Continuable$1.call(Continuable.java:163)
at com.cloudbees.groovy.cps.Continuable.run0(Continuable.java:163)
at org.jenkinsci.plugins.workflow.cps.SandboxContinuable.access$001(SandboxContinuable.java:18)
at org.jenkinsci.plugins.workflow.cps.SandboxContinuable.run0(SandboxContinuable.java:51)
at org.jenkinsci.plugins.workflow.cps.CpsThread.runNextChunk(CpsThread.java:185)
at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup.run(CpsThreadGroup.java:400)
at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup.access$400(CpsThreadGroup.java:96)
at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup$2.call(CpsThreadGroup.java:312)
at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup$2.call(CpsThreadGroup.java:276)
at org.jenkinsci.plugins.workflow.cps.CpsVmExecutorService$2.call(CpsVmExecutorService.java:67)
at hudson.remoting.SingleLaneExecutorService$1.run(SingleLaneExecutorService.java:131)
at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28)
at jenkins.security.ImpersonatingExecutorService$1.run(ImpersonatingExecutorService.java:59)

1 Answer


Our case seems to have been driven by a Jenkins ECS cleaner job that periodically cleared defunct containers, volumes, etc. We run it because several of our jobs talk to the Docker socket on the ECS host to launch their own test environments, and if such a job is killed abruptly its own cleanup never runs. The cleaner does a docker system prune, removes stopped containers, and so on, and is only supposed to purge things older than 1 day.

When we stopped that job on this cluster, the problems went away.
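For reference, the cleaner boiled down to a periodic job along these lines. This is a sketch rather than the exact script: the `ecs-host` node label is hypothetical, and the `until=24h` filters stand in for the "older than 1 day" guard.

    // Rough sketch of the periodic ECS cleaner job.
    // It talks to the host Docker daemon from a Jenkins job and prunes
    // anything older than a day; when it runs too aggressively it can
    // race the docker-workflow cleanup shown in the question.
    node('ecs-host') {
        // Remove stopped containers older than 24h.
        sh 'docker container prune -f --filter "until=24h"'
        // Remove dangling images, unused networks and build cache older than 24h.
        sh 'docker system prune -f --filter "until=24h"'
        // Volume prune does not support the "until" filter, so it runs unfiltered.
        sh 'docker volume prune -f'
    }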
