
I'm using Kubernetes v1.16.7

    Client Version: version.Info{Major:"1", Minor:"16", GitVersion:"v1.16.3", GitCommit:"b3cbbae08ec52a7fc73d334838e18d17e8512749", GitTreeState:"clean", BuildDate:"2019-11-13T11:23:11Z", GoVersion:"go1.12.12", Compiler:"gc", Platform:"windows/amd64"}
    Server Version: version.Info{Major:"1", Minor:"16", GitVersion:"v1.16.7", GitCommit:"be3d344ed06bff7a4fc60656200a93c74f31f9a4", GitTreeState:"clean", BuildDate:"2020-02-11T19:24:46Z", GoVersion:"go1.13.6", Compiler:"gc", Platform:"linux/amd64"}

I found that a lot of cronjobs in my cluster do not execute correctly. The controller-manager log shows:

    E0826 11:26:45.441592 1 cronjob_controller.go:146] Failed to extract cronJobs list: The provided continue parameter is too old to display a consistent list result. You can start a new list without the continue parameter, or use the continue token in this response to retrieve the remainder of the results. Continuing with the provided token results in an inconsistent list - objects that were created, modified, or deleted between the time the first chunk was returned and now may show up in the list.

I read the CronJob controller code in Kubernetes; the error comes from this part:

    klog.V(4).Infof("Found %d groups", len(jobsByCj))
    err = pager.New(pager.SimplePageFunc(cronJobListFunc)).
        EachListItem(context.Background(), metav1.ListOptions{}, func(object runtime.Object) error {
            cj, ok := object.(*batchv1beta1.CronJob)
            if !ok {
                return fmt.Errorf("expected type *batchv1beta1.CronJob, got type %T", cj)
            }
            syncOne(cj, jobsByCj[cj.UID], time.Now(), jm.jobControl, jm.cjControl, jm.recorder)
            cleanupFinishedJobs(cj, jobsByCj[cj.UID], jm.jobControl, jm.cjControl, jm.recorder)
            return nil
        })

    if err != nil {
        utilruntime.HandleError(fmt.Errorf("Failed to extract cronJobs list: %v", err))
        return
    }

It may be related to this page https://kubernetes.io/docs/reference/using-api/api-concepts/#retrieving-large-results-sets-in-chunks.

It looks like when the number of CronJobs and Jobs is very large (20k+), listing them takes so long that the continue token expires before the controller finishes paging through the results.
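
For reference, here is a minimal sketch of that chunked LIST mechanism written directly against client-go: it pages through CronJobs 500 at a time, carries the continue token between chunks, and restarts from scratch when the token has expired (the same "Expired" condition the controller hits). The function name listAllCronJobs and the use of rest.InClusterConfig are my own illustration, not the controller's code, and the List signature shown matches client-go releases from the v1.16 era; newer releases also take a context.Context as the first argument.

    // Illustrative only: page through all CronJobs 500 at a time with client-go,
    // passing the continue token between chunks. If the token expires before
    // paging finishes (HTTP 410, reason "Expired" -- the error in the
    // controller-manager log above), restart the listing without a token.
    package main

    import (
        "fmt"

        apierrors "k8s.io/apimachinery/pkg/api/errors"
        metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
        "k8s.io/client-go/kubernetes"
        "k8s.io/client-go/rest"
    )

    func listAllCronJobs(clientset *kubernetes.Clientset) (int, error) {
        total := 0
        continueToken := ""
        for {
            list, err := clientset.BatchV1beta1().CronJobs(metav1.NamespaceAll).List(metav1.ListOptions{
                Limit:    500,           // chunk size
                Continue: continueToken, // empty for the first chunk
            })
            if apierrors.IsResourceExpired(err) {
                // The token outlived the window the apiserver keeps for it;
                // start over from the beginning without a token.
                total = 0
                continueToken = ""
                continue
            }
            if err != nil {
                return 0, fmt.Errorf("listing cronjobs: %v", err)
            }
            total += len(list.Items)
            if list.Continue == "" {
                return total, nil // last chunk reached
            }
            continueToken = list.Continue
        }
    }

    func main() {
        // Assumes the program runs inside the cluster; use clientcmd for a kubeconfig otherwise.
        config, err := rest.InClusterConfig()
        if err != nil {
            panic(err)
        }
        n, err := listAllCronJobs(kubernetes.NewForConfigOrDie(config))
        if err != nil {
            panic(err)
        }
        fmt.Printf("found %d cronjobs\n", n)
    }

If walking all the chunks takes longer than the window for which the apiserver can serve a consistent snapshot (bounded by etcd compaction, roughly five minutes by default), the token goes stale and the apiserver returns exactly the "continue parameter is too old" error shown above, which would fit the 20k+ object theory.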

I want to know whether the number of CronJobs causes this error, and how I can solve it.

yr12Dong
  • Just out of curiosity, how many jobs do you have? I thought we had many in our cluster, but we are not even close to 20k. – ewramner Aug 26 '20 at 12:14
  • Hi ewramner, we have 20668 cronjobs and the same number of jobs. While reading the code, I deleted all the jobs and updated cronjob.spec.successfulJobsHistoryLimit :) – yr12Dong Aug 26 '20 at 12:21
  • Is the error returned when you perform some action, or does it just appear in the controller-manager? I will try to reproduce this by creating 20k cronjobs and will report back soon. – Mr.KoopaKiller Aug 26 '20 at 13:25
  • Hi KoopaKiller, creating the cronjobs works fine. I think the controller-manager checks the cronjobs' schedules to create jobs, and the error may happen there. – yr12Dong Aug 26 '20 at 13:53
  • So, I have created 20400 jobs, but I can't see any errors in my cluster. Are you using a bare-metal or cloud-provided cluster? – Mr.KoopaKiller Aug 31 '20 at 11:27
  • We use AWS EC2 nodes with 4 vCPUs. – yr12Dong Sep 01 '20 at 07:20
  • Thanks KoopaKiller. In the end, we decided to reduce the number of cronjobs. We think that having so many cronjobs is not a good choice for Kubernetes etcd. – yr12Dong Sep 01 '20 at 10:15
  • @yr12Dong After reducing the cronjobs, did the error disappear? – Mr.KoopaKiller Sep 03 '20 at 08:10
  • @KoopaKiller Hi, when I reduced the number of cronjobs to 2653, everything looked OK and the error in the kube-apiserver disappeared. – yr12Dong Sep 21 '20 at 11:09

0 Answers