
I'm using Kubernetes v1.16.7

    Client Version: version.Info{Major:"1", Minor:"16", GitVersion:"v1.16.3", GitCommit:"b3cbbae08ec52a7fc73d334838e18d17e8512749", GitTreeState:"clean", BuildDate:"2019-11-13T11:23:11Z", GoVersion:"go1.12.12", Compiler:"gc", Platform:"windows/amd64"}
    Server Version: version.Info{Major:"1", Minor:"16", GitVersion:"v1.16.7", GitCommit:"be3d344ed06bff7a4fc60656200a93c74f31f9a4", GitTreeState:"clean", BuildDate:"2020-02-11T19:24:46Z", GoVersion:"go1.13.6", Compiler:"gc", Platform:"linux/amd64"}

I found that a lot of cronjobs in my cluster do not execute correctly. The controller-manager log shows:

    E0826 11:26:45.441592 1 cronjob_controller.go:146] Failed to extract cronJobs list: The provided continue parameter is too old to display a consistent list result. You can start a new list without the continue parameter, or use the continue token in this response to retrieve the remainder of the results. Continuing with the provided token results in an inconsistent list - objects that were created, modified, or deleted between the time the first chunk was returned and now may show up in the list.

I read the CronJob controller code in Kubernetes; the error comes from this part:

    klog.V(4).Infof("Found %d groups", len(jobsByCj))
    err = pager.New(pager.SimplePageFunc(cronJobListFunc)).
        EachListItem(context.Background(), metav1.ListOptions{}, func(object runtime.Object) error {
            cj, ok := object.(*batchv1beta1.CronJob)
            if !ok {
                return fmt.Errorf("expected type *batchv1beta1.CronJob, got type %T", cj)
            }
            syncOne(cj, jobsByCj[cj.UID], time.Now(), jm.jobControl, jm.cjControl, jm.recorder)
            cleanupFinishedJobs(cj, jobsByCj[cj.UID], jm.jobControl, jm.cjControl, jm.recorder)
            return nil
        })

    if err != nil {
        utilruntime.HandleError(fmt.Errorf("Failed to extract cronJobs list: %v", err))
        return
    }

It may be related to this page https://kubernetes.io/docs/reference/using-api/api-concepts/#retrieving-large-results-sets-in-chunks.

It looks like when the number of CronJobs and Jobs is very large (20k+), listing them takes so long that the continue token expires before the controller finishes paging through the results.
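
For reference, here is a minimal sketch of that chunked LIST mechanism written directly against client-go: it pages through CronJobs 500 at a time, carries the continue token between chunks, and restarts from scratch when the token has expired (the same "Expired" condition the controller hits). The function name listAllCronJobs and the use of rest.InClusterConfig are my own illustration, not the controller's code, and the List signature shown matches client-go releases from the v1.16 era; newer releases also take a context.Context as the first argument.

    // Illustrative only: page through all CronJobs 500 at a time with client-go,
    // passing the continue token between chunks. If the token expires before
    // paging finishes (HTTP 410, reason "Expired" -- the error in the
    // controller-manager log above), restart the listing without a token.
    package main

    import (
        "fmt"

        apierrors "k8s.io/apimachinery/pkg/api/errors"
        metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
        "k8s.io/client-go/kubernetes"
        "k8s.io/client-go/rest"
    )

    func listAllCronJobs(clientset *kubernetes.Clientset) (int, error) {
        total := 0
        continueToken := ""
        for {
            list, err := clientset.BatchV1beta1().CronJobs(metav1.NamespaceAll).List(metav1.ListOptions{
                Limit:    500,           // chunk size
                Continue: continueToken, // empty for the first chunk
            })
            if apierrors.IsResourceExpired(err) {
                // The token outlived the window the apiserver keeps for it;
                // start over from the beginning without a token.
                total = 0
                continueToken = ""
                continue
            }
            if err != nil {
                return 0, fmt.Errorf("listing cronjobs: %v", err)
            }
            total += len(list.Items)
            if list.Continue == "" {
                return total, nil // last chunk reached
            }
            continueToken = list.Continue
        }
    }

    func main() {
        // Assumes the program runs inside the cluster; use clientcmd for a kubeconfig otherwise.
        config, err := rest.InClusterConfig()
        if err != nil {
            panic(err)
        }
        n, err := listAllCronJobs(kubernetes.NewForConfigOrDie(config))
        if err != nil {
            panic(err)
        }
        fmt.Printf("found %d cronjobs\n", n)
    }

If walking all the chunks takes longer than the window for which the apiserver can serve a consistent snapshot (bounded by etcd compaction, roughly five minutes by default), the token goes stale and the apiserver returns exactly the "continue parameter is too old" error shown above, which would fit the 20k+ object theory.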

I want to know whether the number of CronJobs causes this error, and how I can solve it.

yr12Dong
  • Just out of curiosity, how many jobs do you have? I thought we had many in our cluster, but we are not even close to 20k. – ewramner Aug 26 '20 at 12:14
  • Hi ewramner, we have 20668 cronjobs and the same number of jobs. While reading the code, I deleted all the jobs and updated cronjob.spec.successfulJobsHistoryLimit :) – yr12Dong Aug 26 '20 at 12:21
  • Is the error returned when you perform some action, or does it just appear in the controller-manager? I will try to reproduce this by creating 20k cronjobs and will report back soon. – Mr.KoopaKiller Aug 26 '20 at 13:25
  • Hi KoopaKiller, creating the cronjobs works fine. I think the controller-manager checks the cronjobs' schedules to create jobs, and the error may happen there. – yr12Dong Aug 26 '20 at 13:53
  • So, I have created 20400 jobs, but I can't see any errors in my cluster. Are you using a bare-metal or cloud-provided cluster? – Mr.KoopaKiller Aug 31 '20 at 11:27
  • We use AWS EC2 nodes with 4 vCPUs. – yr12Dong Sep 01 '20 at 07:20
  • Thanks KoopaKiller. In the end, we decided to reduce the number of cronjobs. We think that having so many cronjobs is not a good choice for Kubernetes etcd. – yr12Dong Sep 01 '20 at 10:15
  • @yr12Dong After reducing the cronjobs, did the error disappear? – Mr.KoopaKiller Sep 03 '20 at 08:10
  • @KoopaKiller Hi, when I reduced the number of cronjobs to 2653, everything looked OK and the error in the kube-apiserver disappeared. – yr12Dong Sep 21 '20 at 11:09

0 Answers