
I need scalable queue handling based on a Docker/Python worker. My thoughts went towards Kubernetes. However, I am unsure about the best controller/service to use.

Via Azure Functions, incoming HTTP traffic adds simple messages to a storage queue. Those messages need to be worked on and the results fed back into a result queue.

To process those queue messages, I developed Python code that loops over the queue and works on the jobs. After each successful iteration, the message is removed from the source queue and the result written into the result queue. Once the queue is empty, the code exits.

So I created a Docker image that runs the Python code. If more than one container is started, the queue obviously gets worked through faster. I also set up the new Azure Kubernetes Service (AKS) to scale that. Being new to Kubernetes, I read about the Job paradigm of working a queue until the Job is complete. My simple YAML template looks like this:

apiVersion: batch/v1
kind: Job
metadata:
  name: myjob
spec:
  parallelism: 4
  template:
    metadata:
      name: myjob
    spec:
      containers:
      - name: c
        image: repo/image:tag
      restartPolicy: OnFailure  # Job pods must set Never or OnFailure

My problem now is that the Job cannot be restarted.

Usually, the queue gets filled with some entries and then for a while nothing happens. Then bigger batches can arrive that need to be worked on as fast as possible. Of course, I want to run the Job again then, but that does not seem possible. I also want to reduce the footprint to a minimum when nothing is in the queue.

So my question is: what architecture/constructs should I use for this scenario, and are there simple YAML examples for that?

Tom Seidel

3 Answers


This may be a "goofy/hacky" answer, but it's simple, robust, and I've been using it in a production system for months now.

I have a similar system where I have a queue that sometimes is emptied out and sometimes gets slammed. I wrote my queue processor similarly: it handles one message in the queue at a time and terminates if the queue is empty. It is set up to run in a Kubernetes Job.

The trick is this: I created a CronJob to regularly start one single new instance of the job, and the job allows infinite parallelism. If the queue is empty, it immediately terminates ("scales down"). If the queue is slammed and the last job hasn't finished yet, another instance starts ("scales up").

No need to futz with querying the queue and scaling a StatefulSet or anything, and no resources are consumed if the queue is sitting empty. You may have to adjust the CronJob interval to fine-tune how fast it reacts to the queue filling up, but it should react pretty well.
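
A rough sketch of such a CronJob (the name and image are illustrative; the 15-minute schedule and `concurrencyPolicy: "Allow"` come from the comments below):

apiVersion: batch/v1
kind: CronJob
metadata:
  name: queue-worker              # illustrative name
spec:
  schedule: "*/15 * * * *"        # start a new worker Job every 15 minutes
  concurrencyPolicy: Allow        # let new Jobs overlap with still-running ones
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: worker
            image: repo/image:tag # the queue-draining image from the question
          restartPolicy: OnFailure

On clusters older than 1.21 the CronJob API is batch/v1beta1 rather than batch/v1.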

Michael Pratt
  • Any chance you could share example configs? How does the CronJob know if the last Job hasn't finished? How do you specify higher Job parallelism if the queue depth is too high on this CronJob check? This conceptually seems easier than the other information I've been finding but I'm interested in how it is configured. – jdforsythe Apr 03 '19 at 15:48
  • It's actually pretty simple. I don't limit parallelism at all (just don't set it in the jobTemplate spec), but instead set `concurrencyPolicy: "Allow"` in the CronJob spec. Then set whatever schedule you want to spin up a new worker on; I have it set for every 15 minutes. There's no polling or anything like that, and it doesn't care if the last job was finished. It just spins up a new job every 15 minutes, and they all terminate when there are no items left in the queue. – Michael Pratt Apr 04 '19 at 23:17
  • In your case, a single Job takes care of the whole workload? In my situation, I would like to have multiple Jobs (and preferably scale their number) that share the workload by picking tasks from the queue. I found your answer very interesting and was wondering how you would achieve that. What I have now is a Deployment that scales up with the length of the queue, but I would like to use Jobs and CronJobs instead (since they are ephemeral, whereas Deployments keep restarting my job, even with an exit code of 0). – Anas Tiour Jun 26 '19 at 14:39
  • Usually one Job takes care of the queue within an hour and terminates, but if it gets backed up and doesn't finish by the next CronJob schedule, the CronJob still creates another Job which works in parallel. That way I don't have to mess around with scaling deployments up and down. – Michael Pratt Aug 10 '19 at 18:37

This is a common pattern, and there are several ways to architect a solution.

A common solution is to have an app with a set of workers always polling your queue (this could be your Python script, but you need to make it a service), and generally you'll want to use a Kubernetes Deployment, possibly with a Horizontal Pod Autoscaler (HPA) based on some metric for your queue or on CPU.

In your case, you'll want to make your script a daemon that polls the queue for new items (I assume you are already handling race conditions with parallelism). Then deploy this daemon using a Kubernetes Deployment, and you can scale up and down based on metrics or a schedule.
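
As a rough sketch (the names, image, and thresholds are placeholders, and the HPA below scales on CPU because scaling on queue length requires a custom/external metrics adapter):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: queue-worker
spec:
  replicas: 1
  selector:
    matchLabels:
      app: queue-worker
  template:
    metadata:
      labels:
        app: queue-worker
    spec:
      containers:
      - name: worker
        image: repo/image:tag      # worker rewritten as a long-running poller
        resources:
          requests:
            cpu: 250m              # CPU request is needed for CPU-based autoscaling
---
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: queue-worker
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: queue-worker
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70

Note that a plain HPA will not scale below one replica, so with this approach at least one worker pod keeps running even while the queue is empty.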

There are already job schedulers out there for many different languages, too. One that is very popular is Airflow, which already has the concept of 'workers', but this may be overkill for a single Python script.

Rico
  • Let's say we're using the Deployment and HPA solution, with the HPA metric being the queue length. How do you prevent the scaling down from killing active workers? E.g. we scaled up to 10 workers, 5 finished and the HPA is scaling down the deployment. How do you make sure it kills the 5 workers that finished and not the ones that are still working? – Michal Tenenberg Sep 25 '19 at 08:30
  • Typically you can manage that with a `preStop` hook defined in your containers and with a termination grace period (see the sketch after this thread). More info here: https://kubernetes.io/docs/concepts/workloads/pods/pod/#termination-of-pods. But yes, the HPA doesn't have a "terminate oldest" policy similar to https://docs.aws.amazon.com/autoscaling/ec2/userguide/as-instance-termination.html (ASG termination policies in AWS). This could be a feature request :) – Rico Sep 26 '19 at 04:09
  • There is a feature request, but it's been open since 2017... https://github.com/kubernetes/kubernetes/issues/45509 – Michal Tenenberg Sep 27 '19 at 08:54
  • Thanks for sharing, I've added a comment to the ticket. – Rico Sep 27 '19 at 14:56
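
To make the `preStop`/grace-period idea from the comments above concrete, here is a hedged sketch of the relevant pod-template fields (the drain command is purely illustrative; the worker itself would have to honor such a stop signal and finish its in-flight message):

# inside the Deployment's pod template
spec:
  terminationGracePeriodSeconds: 300   # time the worker gets to finish its current message
  containers:
  - name: worker
    image: repo/image:tag
    lifecycle:
      preStop:
        exec:
          # hypothetical drain hook: tell the worker to stop pulling new messages
          command: ["/bin/sh", "-c", "touch /tmp/stop-polling; sleep 30"]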

You can use KEDA in a couple of ways for this:

Scaled deployments

These allow you to define the Kubernetes Deployment or StatefulSet that you want KEDA to scale based on a scale trigger. KEDA will monitor that service and, based on the events that occur, automatically scale your resource out/in accordingly.

https://keda.sh/docs/2.9/concepts/scaling-deployments/
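
For example, a ScaledObject that scales a worker Deployment on Azure Storage Queue length might look roughly like this (names and thresholds are assumptions, and `AzureWebJobsStorage` stands in for whatever environment variable holds your storage connection string):

apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: queue-worker-scaler
spec:
  scaleTargetRef:
    name: queue-worker          # the Deployment running the worker
  minReplicaCount: 0            # scale to zero when the queue is empty
  maxReplicaCount: 10
  triggers:
  - type: azure-queue
    metadata:
      queueName: myqueue                      # illustrative queue name
      queueLength: "5"                        # target messages per replica
      connectionFromEnv: AzureWebJobsStorage  # env var with the connection string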

ScaledJob

You can also run and scale your code as Kubernetes Jobs. The primary reason to consider this option is to handle long-running executions. Rather than processing multiple events within a Deployment, for each detected event a single Kubernetes Job is scheduled. That Job will initialize, pull a single event from the message source, process it to completion, and terminate.

https://keda.sh/docs/2.9/concepts/scaling-jobs/
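
A minimal ScaledJob sketch along the same lines (again, names and values are illustrative, not taken from the question):

apiVersion: keda.sh/v1alpha1
kind: ScaledJob
metadata:
  name: queue-worker-job
spec:
  jobTargetRef:
    template:
      spec:
        containers:
        - name: worker
          image: repo/image:tag
        restartPolicy: Never
  pollingInterval: 30           # seconds between queue checks
  maxReplicaCount: 10           # upper bound on parallel Jobs
  triggers:
  - type: azure-queue
    metadata:
      queueName: myqueue
      queueLength: "5"
      connectionFromEnv: AzureWebJobsStorage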

Ross Motley