
For some troubleshooting, I need to manually change the status of a running Kubernetes Job from active to succeeded so that it counts as completed. The Job itself runs an infinite loop and never finishes on its own. Deleting the Job is not an option, because that leaves it in a failed state.

Update: The Job does not actually fail; it gets stuck, so I delete it, and that is what puts it in the failed state. Also, it is not possible to change the code the Job runs (it is not a bash script).

Thanks

imriss
  • What is the purpose of this? Are you able to change the code executed by the job? If yes, what language do you use there? – acid_fuji Jan 21 '21 at 10:39
  • Thanks. I do not know how to change the code in a running pod. The job runs a Java application. – imriss Jan 22 '21 at 14:42
  • Why do you need to make it successful? Is this part of some sort of pipeline? – acid_fuji Jan 25 '21 at 08:41
  • Other jobs depend on it. Marking it successful allows the rest to continue; if I simply kill this job, they will stop. This is for quick troubleshooting where I do not want to stop the rest just to add a bypass for this job's status. – imriss Jan 26 '21 at 14:21
  • I would temporarily append the shell command `|| exit 0` to the entry point of the Job container to force the Job to report success even if its main command fails (a rough sketch follows these comments). – VAS Feb 02 '21 at 21:31
  • Thanks @VAS. I agree. However, please note that the job does not actually fail; instead it gets stuck, and therefore I delete it. That is why the job goes to the failed state. Sorry for the incomplete description; I will update it. – imriss Feb 07 '21 at 20:45
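
For reference, the `|| exit 0` idea from the comments would look roughly like this as a shell-wrapped container command. The java invocation is a hypothetical placeholder, and as the asker clarifies, it would not apply here anyway: the Job's command cannot be changed and it never exits in the first place.

# Hypothetical wrapped entrypoint: report success even if the main command fails.
/bin/sh -c "java -jar /app/app.jar || exit 0"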

1 Answer


It looks to me like you are more interested in treating the symptoms of your problem than in its actual cause.

This is for quick troubleshooting where I do not want to stop the rest just to add a bypass for this job's status.

I think the quicker way would be to make your other jobs less dependent on this one, instead of trying to force Kubernetes to mark this Job/Pod as successful.
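
I don't know how that dependency is wired, but as a rough sketch: if the downstream jobs gate on this one with something like kubectl wait, relaxing that gate is a one-line change (job3 and the timeout are placeholders here):

# Hypothetical gate used by a downstream job:
kubectl wait --for=condition=complete job/job3 --timeout=600s

# Temporarily tolerating a failure or timeout lets the rest of the pipeline continue:
kubectl wait --for=condition=complete job/job3 --timeout=600s || true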

The closest I could get to your goal was to curl the API server directly through kubectl proxy. But that only works once the Job has already failed; unfortunately it does not work while the Pod is still running.

For this example I used a Job named job3 (the name is used in the API call below) that simply exits with status 1:

apiVersion: batch/v1
kind: Job
metadata:
  name: job3
spec:
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: job
          image: busybox
          args:
            - /bin/sh
            - -c
            - date; echo sleeping....; sleep 5s; exit 1;

Then run kubectl proxy:

➜  ~ kubectl proxy --port=8080 &
[1] 18372
➜  ~ Starting to serve on 127.0.0.1:8080

And patch the status through the API server:

curl localhost:8080/apis/batch/v1/namespaces/default/jobs/job3/status \
  -XPATCH \
  -H "Accept: application/json" \
  -H "Content-Type: application/strategic-merge-patch+json" \
  -d '{"status": {"succeeded": 1}}'

The tail of the (truncated) response shows the updated status:

  ...
    "startTime": "2021-01-28T14:02:31Z",
    "succeeded": 1,
    "failed": 1
  }
}

If I then check the Job status, I can see that it was marked as completed.

➜  ~ k get jobs
NAME   COMPLETIONS   DURATION   AGE
job3   1/1           45s        45s
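
As a side note, on newer clusters the same status patch can likely be applied without kubectl proxy at all, provided your kubectl supports the --subresource flag (added in kubectl releases newer than the ones current when this answer was written); I have not verified this here:

# Patch the Job's status subresource directly with kubectl:
kubectl patch job job3 --subresource=status --type=merge -p '{"status":{"succeeded":1}}'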

P.S. I also tried to set the status to successful/completed this way for a Job or Pod that was still running, but that was not possible: the status changed for a moment and then the controller-manager reverted it back to running. Perhaps that small window with the changed status is enough for your other jobs to move on. I'm merely assuming this, since I don't know the details.
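
If that brief window is what you need, one way to see roughly how long it lasts (assuming the proxy from above is still running and the Job is still active) is:

# Same status patch as above, but against the still-running Job,
# then watch the job controller revert the status:
curl localhost:8080/apis/batch/v1/namespaces/default/jobs/job3/status \
  -XPATCH \
  -H "Accept: application/json" \
  -H "Content-Type: application/strategic-merge-patch+json" \
  -d '{"status": {"succeeded": 1}}'
kubectl get job job3 --watch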

For more on accessing the API this way, please have a look at the kubectl proxy section of the Kubernetes docs.

acid_fuji