
Kubernetes handles situations where there's a typo in the job spec (so the container image can't be found) by leaving the job in a running state forever. Because of that, I've got a process that monitors job events to detect cases like this and deletes the job when one occurs.

I'd prefer to just stop the job so there's a record of it. Is there a way to stop a job?

Brent212

5 Answers


Not really; no such mechanism exists in Kubernetes yet, as far as I know.

As a workaround, you can SSH into the node and run the following (if you're using Docker):

# Save the logs (stdout and stderr)
$ docker logs <container-id-that-is-running-your-job> > save.log 2>&1
$ docker stop <main-container-id-for-your-job>

It's better to stream logs with something like Fluentd, logspout, or Filebeat and forward them to an ELK or EFK stack.

In any case, I've opened this

Rico

1) According to the K8s documentation here:

Finished Jobs are usually no longer needed in the system. Keeping them around in the system will put pressure on the API server. If the Jobs are managed directly by a higher level controller, such as CronJobs, the Jobs can be cleaned up by CronJobs based on the specified capacity-based cleanup policy.

Here are the details for the failedJobsHistoryLimit property in the CronJobSpec.

This is another way of retaining the details of failed jobs for a specific duration. The failedJobsHistoryLimit property can be set based on the approximate number of jobs run per day and the number of days the logs have to be retained. Agreed, the Jobs will still be there and put pressure on the API server.
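As a minimal sketch, a CronJob spec using this property might look like the following (the name, schedule, image, and limits here are placeholders, and on older clusters the apiVersion is batch/v1beta1 rather than batch/v1):

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: example-cron            # hypothetical name
spec:
  schedule: "*/5 * * * *"
  successfulJobsHistoryLimit: 3  # keep the last 3 successful Jobs
  failedJobsHistoryLimit: 5      # keep the last 5 failed Jobs around for inspection
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: Never
          containers:
            - name: example
              image: busybox
              command: ["sh", "-c", "echo hello"]
```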

This is interesting. Once the Job completes with a failure, as in the case of a typo in the image name, the pod gets deleted and the resources are not blocked or consumed anymore. I'm not sure exactly what a kubectl job stop would achieve in this case. But when a Job with a proper image runs to success, I can still see the pod in kubectl get pods.

2) Another approach, without using a CronJob, is to specify ttlSecondsAfterFinished as mentioned here.

Another way to clean up finished Jobs (either Complete or Failed) automatically is to use a TTL mechanism provided by a TTL controller for finished resources, by specifying the .spec.ttlSecondsAfterFinished field of the Job.
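As a sketch (name, image, and TTL value are placeholders; this field was initially alpha behind the TTLAfterFinished feature gate, so it may not be available on every cluster):

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: example-job              # hypothetical name
spec:
  ttlSecondsAfterFinished: 3600  # delete the Job (and its pods) 1 hour after it finishes
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: example
          image: busybox
          command: ["sh", "-c", "echo done"]
```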

Praveen Sripati
    I'm asking about jobs that are running, not complete/finished. And actually, because of the way K8s works, they will never complete until deleted. I want to be able to stop them but be able to look at the associated resources (job, events, pods, containers) before deleting them. – Brent212 Oct 03 '18 at 18:42

You can suspend CronJobs by using the suspend attribute. From the Kubernetes documentation:

https://kubernetes.io/docs/tasks/job/automated-tasks-with-cron-jobs/#suspend
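As a sketch, the relevant field sits at the top level of the CronJob spec (name, schedule, and image here are placeholders):

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: example-cron   # hypothetical name
spec:
  suspend: true        # subsequent executions are suspended; already-started ones are unaffected
  schedule: "0 * * * *"
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: Never
          containers:
            - name: example
              image: busybox
              command: ["sh", "-c", "echo hello"]
```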

Dan

The documentation says:

The .spec.suspend field is also optional. If it is set to true, all subsequent executions are suspended. This setting does not apply to already started executions. Defaults to false.

So, to pause a cron you could:

  1. Run kubectl edit cronjob CRON_NAME (if it's not in the default namespace, add "-n NAMESPACE_NAME" at the end) and change "suspend" from false to true.

  • You could potentially loop over several CronJobs using "for" or whatever you like, and have them all changed at once.

  2. Alternatively, you could save the YAML file locally and then just run:

kubectl create -f cron_YAML

and this would recreate the cron.

mozo

The other answers hint at the .spec.suspend solution for the CronJob API, which works, but since the OP asked specifically about Jobs, it is worth noting the solution that does not require a CronJob.

As of Kubernetes 1.21, there is alpha support for the .spec.suspend field in the Job API as well (see docs here). The feature is behind the SuspendJob feature gate.
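Assuming a 1.21+ cluster with the SuspendJob feature gate enabled, a sketch of what this looks like (name and image are placeholders): setting suspend to true on a running Job terminates its active pods while keeping the Job object and its events around, which is close to the "stop but keep a record" behavior the OP asked for.

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: example-job   # hypothetical name
spec:
  suspend: true       # flipping this to true stops the Job's pods but keeps the Job object
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: example
          image: busybox
          command: ["sh", "-c", "sleep 3600"]
```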

austin_ce