
I have two docker containers as follows:

  1. A web server written in Python
  2. A Python script which performs a computation and then terminates

I want to be able to have the web server start the computation script contained in container two. The cluster is being controlled by Kubernetes.

My initial thought is to have the web server signal to Kubernetes to start the computation container. However, there may be alternative and better ways to achieve this.

It may also be possible to line jobs up in a queue that Kubernetes polls, starting pods when required.

Is there a way to do this?

hojkoff
  • Does the web server need to be able to know that the jobs have finished, and look at their output the same way it would if it were running them locally, or are they just "fire and forget?" There are *a lot* of job running projects that integrate with k8s, but the amount of pain in setting them up and the features they offer differ wildly – mdaniel Nov 04 '20 at 16:55
  • Hello, you could spawn a `Pod` with your web server that has the tools/permissions to communicate with the Kubernetes API to schedule/watch Jobs. You will need the correct RBAC permissions and a library of your choosing to communicate with the API. Here are some links for additional reference: [Setting RBAC permissions](https://kubernetes.io/docs/reference/access-authn-authz/rbac/), [Kubernetes Python client](https://github.com/kubernetes-client/python). – Dawid Kruk Nov 05 '20 at 12:38

1 Answer

My initial thought is to have the web server signal to Kubernetes to start the computation container. However, there may be alternative and better ways to achieve this.

One of the possible ways for this to work would be to:

  • Create a ServiceAccount.
  • Create a Role with the specific permissions your workload needs (listing jobs, creating jobs, deleting jobs, etc.).
  • Bind your Role to the ServiceAccount.
  • Run your web server with the ServiceAccount created earlier.
  • Query the Kubernetes API.

Some of the things to take into consideration:

  • You will need either the kubectl binary or a Kubernetes API Python library inside the web server's container.
  • You will need to consider the storage options for the results of your computational Python script.
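
For small results, one option is to have the computation script print to stdout and let the web server read the Pod logs; larger outputs would more likely go to a volume, a database, or an object store. The following is a minimal sketch of the log-reading route, assuming the `kubernetes` package is installed and the web server's ServiceAccount may read `pods` and `pods/log`. The helper names `job_pod_selector` and `read_job_logs` are hypothetical; `job-name` is the label the Job controller sets on the Pods it creates.

```python
def job_pod_selector(job_name):
    """Label selector matching the Pods created for a given Job."""
    return f"job-name={job_name}"


def read_job_logs(job_name, namespace="default"):
    """Return {pod_name: log_text} for every Pod that belongs to the Job."""
    # Imported here so the module can be loaded without the library installed.
    from kubernetes import client, config

    config.load_incluster_config()  # assumption: this code runs inside a Pod
    core_v1 = client.CoreV1Api()
    pods = core_v1.list_namespaced_pod(
        namespace=namespace, label_selector=job_pod_selector(job_name))
    return {
        pod.metadata.name: core_v1.read_namespaced_pod_log(
            name=pod.metadata.name, namespace=namespace)
        for pod in pods.items
    }
```

Reading logs from a Pod whose container has not started yet raises an API error, so in practice this would be called once the Job has finished.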

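I've pasted the links I provided in the comment for better visibility:

  • [Setting RBAC permissions](https://kubernetes.io/docs/reference/access-authn-authz/rbac/)
  • [Kubernetes Python client](https://github.com/kubernetes-client/python)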

An example of such a setup:

Create a ServiceAccount

apiVersion: v1
kind: ServiceAccount
metadata:
  name: python-job-sa

This ServiceAccount will be used with the Deployment/Pod that will host your web-server.

Assign specific permissions to your ServiceAccount

apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  namespace: default
  name: python-job-role
rules:
# This will give you access to jobs
- apiGroups: ["batch", "extensions"]
  resources: ["jobs"]
  verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]
# This will give you access to pods
- apiGroups: [""]
  resources: ["pods"]
  verbs: ["get", "list", "watch"]
# This will give you access to pods logs
- apiGroups: [""]
  resources: ["pods/log"]
  verbs: ["get", "list", "watch"]

This Role allows its subjects to query the Kubernetes API for Jobs, Pods and Pod logs in the default namespace.
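
To see which requests those rules cover, here is a simplified model of RBAC rule matching in Python. It is only an illustration, not how the API server is implemented: it ignores features such as `resourceNames`, and the names `ROLE_RULES` and `is_allowed` are hypothetical. A request is permitted when at least one rule matches its API group, resource and verb.

```python
# The rules of the Role above, written as Python data.
ROLE_RULES = [
    {"apiGroups": ["batch", "extensions"], "resources": ["jobs"],
     "verbs": ["get", "list", "watch", "create", "update", "patch", "delete"]},
    {"apiGroups": [""], "resources": ["pods"],
     "verbs": ["get", "list", "watch"]},
    {"apiGroups": [""], "resources": ["pods/log"],
     "verbs": ["get", "list", "watch"]},
]


def _matches(allowed, value):
    """True if `value` is listed explicitly or covered by the '*' wildcard."""
    return "*" in allowed or value in allowed


def is_allowed(rules, api_group, resource, verb):
    """True if any single rule matches group, resource and verb together."""
    return any(
        _matches(rule["apiGroups"], api_group)
        and _matches(rule["resources"], resource)
        and _matches(rule["verbs"], verb)
        for rule in rules
    )
```

In this model `is_allowed(ROLE_RULES, "", "pods", "delete")` is false, and `pods/log` is treated as its own resource, which is why the Role lists it separately from `pods`.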

Bind your Role to a ServiceAccount

apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: python-job-rolebinding
  namespace: default
subjects:
- kind: ServiceAccount
  name: python-job-sa 
  namespace: default
roleRef:
  kind: Role 
  name: python-job-role
  apiGroup: rbac.authorization.k8s.io

This RoleBinding binds your ServiceAccount to your Role, which allows Pods running under that ServiceAccount to communicate with the Kubernetes API.

A tip!

To isolate the workload, you can use a namespace other than default (the ServiceAccount, Role and RoleBinding will all need to be in that same namespace)!
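
If you do move to another namespace, the web server should not hard-code "default" when it calls the API. A minimal sketch, assuming the ServiceAccount credentials are mounted at the standard path inside the Pod; the helper name `current_namespace` is hypothetical:

```python
from pathlib import Path

# File mounted into a Pod alongside its ServiceAccount token.
NAMESPACE_FILE = Path("/var/run/secrets/kubernetes.io/serviceaccount/namespace")


def current_namespace(path=NAMESPACE_FILE, fallback="default"):
    """Return the namespace the Pod runs in, or `fallback` outside a cluster."""
    try:
        value = Path(path).read_text().strip()
    except OSError:
        # File is missing or unreadable, e.g. when running on a workstation.
        return fallback
    return value or fallback
```

The returned value can then be passed as the `namespace` argument of calls such as `create_namespaced_job`.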

Run your web server with the ServiceAccount created earlier

For example/testing purposes, the google/cloud-sdk:latest image was used. It already has kubectl and Python installed.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: cloud-sdk-job
spec:
  selector:
    matchLabels:
      app: cloud-sdk-job
  replicas: 1
  template:
    metadata:
      labels:
        app: cloud-sdk-job
    spec:
      serviceAccountName: python-job-sa # <-- IMPORTANT    
      containers:
      - name: cloud-sdk
        image: google/cloud-sdk:latest
        command: 
        - sleep
        - "infinity"
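
Inside a Pod, the Python client's in-cluster configuration relies on the `KUBERNETES_SERVICE_HOST` environment variable and the mounted ServiceAccount token. The sketch below picks a configuration loader based on those two signals, so the same code also works from a workstation with a kubeconfig file. It assumes the `kubernetes` package is installed; `running_in_cluster` and `load_config` are hypothetical helper names.

```python
import os

TOKEN_FILE = "/var/run/secrets/kubernetes.io/serviceaccount/token"


def running_in_cluster(environ=None, token_file=TOKEN_FILE):
    """True if the process appears to run inside a Pod with mounted credentials."""
    environ = os.environ if environ is None else environ
    return "KUBERNETES_SERVICE_HOST" in environ and os.path.isfile(token_file)


def load_config():
    """Load in-cluster credentials when available, else the local kubeconfig."""
    # Imported here so the module can be loaded without the library installed.
    from kubernetes import config

    if running_in_cluster():
        config.load_incluster_config()
    else:
        config.load_kube_config()
```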

Query the Kubernetes API

Querying should be done by your web server, but for the purpose of this example you can exec into the running Pod and check that the Python script that spawns a Job runs correctly:

  • $ kubectl exec -it WEB-SERVER-POD-NAME -- /bin/bash

Disclaimer!

The google/cloud-sdk:latest image does not include the Kubernetes Python API library; it has to be installed with pip:

  • $ pip3 install kubernetes

The example code to spawn a Job:

from kubernetes import client, config

JOB_NAME = "example-job"


def create_job_object():
    # Configure the Pod template container
    container = client.V1Container(
        name="pi",
        image="perl",
        command=["perl", "-Mbignum=bpi", "-wle", "print bpi(2000)"])
    # Create and configure the Pod template section
    template = client.V1PodTemplateSpec(
        metadata=client.V1ObjectMeta(labels={"app": "pi"}),
        spec=client.V1PodSpec(restart_policy="Never", containers=[container]))
    # Create the specification of the Job
    spec = client.V1JobSpec(
        template=template,
        backoff_limit=4)
    # Instantiate the job object
    job = client.V1Job(
        api_version="batch/v1",
        kind="Job",
        metadata=client.V1ObjectMeta(name=JOB_NAME),
        spec=spec)

    return job

def create_job(api_instance, job):
    # Create job
    api_response = api_instance.create_namespaced_job(
        body=job,
        namespace="default")
    print("Job created. status='%s'" % str(api_response.status))

def main():
    # Configs can be set in Configuration class directly or using helper
    # utility. If no argument provided, the config will be loaded from
    # default location.

    # config.load_kube_config()
    config.load_incluster_config() # <-- IMPORTANT
    batch_v1 = client.BatchV1Api()

    # Create a Job object with the Python client API.
    job = create_job_object()

    create_job(batch_v1, job)

if __name__ == '__main__':
    main()

The above example is taken from the Kubernetes Python API library:

Github.com: Kubernetes client: Python: Examples
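
The example uses a fixed JOB_NAME, so creating the Job a second time fails with a 409 Conflict for as long as the first Job object still exists. A web server that starts one Job per request would need a distinct name each time. A minimal sketch, where `unique_job_name` is a hypothetical helper and the 63-character cap is an assumption chosen so the name also fits in a label value:

```python
import uuid

MAX_NAME_LENGTH = 63  # assumption: keep the name usable as a label value too


def unique_job_name(prefix="example-job"):
    """Return `prefix` plus a short random suffix, capped at MAX_NAME_LENGTH."""
    suffix = uuid.uuid4().hex[:8]
    # Leave room for the hyphen and the suffix, and avoid a doubled hyphen.
    base = prefix[: MAX_NAME_LENGTH - len(suffix) - 1].rstrip("-")
    return f"{base}-{suffix}"
```

An alternative is to leave `name` unset and pass `generate_name="example-job-"` to `client.V1ObjectMeta`, letting the API server append the random suffix.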

Checking if the Job was scheduled and completed (output of `kubectl get jobs`):

NAME          COMPLETIONS   DURATION   AGE
example-job   1/1           9s         10m
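
The web server can make the same check through the API instead of kubectl. The sketch below polls the Job's status conditions until one of type `Complete` or `Failed` is reported as `"True"`. It assumes a `client.BatchV1Api()` instance like `batch_v1` in the script above; the helper names, poll interval and timeout are hypothetical.

```python
import time


def terminal_state(conditions):
    """Return 'complete' or 'failed' once the Job has finished, else None.

    `conditions` is an iterable of (type, status) pairs taken from the
    Job's status conditions.
    """
    for cond_type, cond_status in conditions:
        if cond_status == "True" and cond_type in ("Complete", "Failed"):
            return cond_type.lower()
    return None


def read_conditions_from_api(batch_v1, name, namespace="default"):
    """Fetch the Job and flatten its conditions into (type, status) pairs."""
    job = batch_v1.read_namespaced_job_status(name=name, namespace=namespace)
    return [(c.type, c.status) for c in (job.status.conditions or [])]


def wait_for_job(read_conditions, poll_seconds=2.0, timeout_seconds=300.0):
    """Poll `read_conditions()` until the Job finishes or the timeout expires."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        state = terminal_state(read_conditions())
        if state is not None:
            return state
        if time.monotonic() >= deadline:
            raise TimeoutError("Job did not finish before the timeout")
        time.sleep(poll_seconds)
```

A call could look like `wait_for_job(lambda: read_conditions_from_api(batch_v1, "example-job"))`. Since the Role also grants `watch` on Jobs, a watch-based approach would avoid polling, at the cost of slightly more code.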

Dawid Kruk