
I currently have an Airflow deployment hosted on an EKS cluster, and want it to run a report that will check the logging for another deployment and alert me if any errors have occurred.

Locally I'm able to run this without issue, as I can just point the k8s Python API to my kubeconfig. However, this doesn't work once deployed, as there isn't a $HOME/.kube directory with the kubeconfig on the pod.

    with client.ApiClient(config.load_kube_config(config_file=k8s_config_file)) as api_client:
        api_instance = client.CoreV1Api(api_client)

I've tried removing the load_kube_config command, however this just throws a connection refused error, presumably because it now doesn't know about any cluster, although it resides in one...

I assume putting the kubeconfig on the deployment wouldn't be a good practice.

How can I get airflow to use the kubeconfig of the cluster it's hosted on? Or is there an alternative I'm missing...

davo777
  • You could use a kubeconfig which matched a k8s user with limited permissions, rather than the one for the k8s cluster owner – Rachel Feb 01 '21 at 16:53

1 Answer


Answering some concerns from the question:

I've tried removing the load_kube_config command, however this just throws a connection refused error, presumably because it now doesn't know about any cluster, although it resides in one...

To run your code inside of the cluster (from a Pod) you will need to switch:

  • from: config.load_kube_config()
  • to: config.load_incluster_config()
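
If the same code has to run both locally and inside the cluster, one option is to pick the loader at runtime. The sketch below is only an illustration: the helper names `running_in_cluster` and `load_config` are made up for this example. It relies on the environment variables and the ServiceAccount token file that Kubernetes normally provides to a Pod, which is also what `load_incluster_config()` reads.

```python
import os

# Token file mounted into a Pod for its ServiceAccount.
TOKEN_PATH = "/var/run/secrets/kubernetes.io/serviceaccount/token"


def running_in_cluster(env=None, token_path=TOKEN_PATH):
    """Return True when the Pod-provided API server address and token exist."""
    env = os.environ if env is None else env
    has_host = bool(env.get("KUBERNETES_SERVICE_HOST"))
    has_port = bool(env.get("KUBERNETES_SERVICE_PORT"))
    return has_host and has_port and os.path.isfile(token_path)


def load_config(config_file=None):
    """Use in-cluster config inside a Pod, otherwise fall back to a kubeconfig."""
    # Imported lazily so the detection helper works without the kubernetes package.
    from kubernetes import config

    if running_in_cluster():
        config.load_incluster_config()
        return "incluster"
    config.load_kube_config(config_file=config_file)
    return "kubeconfig"
```

Calling `load_config()` once at the start of the task, before creating `client.CoreV1Api()`, should then work in both environments.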

Please read on, as the rest of this answer covers the setup needed to run Kubernetes Python API library code inside the cluster.


How can I get airflow to use the kubeconfig of the cluster it's hosted on? Or is there an alternative I'm missing...

In fact there is a solution that you are missing:

You will need to use a ServiceAccount with proper Roles and RoleBindings.

Let me explain it a bit more and add an example to follow:


Explanation:

To run the setup described above, you will need to refer to the following Kubernetes docs:

As stated in the official documentation:

When you (a human) access the cluster (for example, using kubectl), you are authenticated by the apiserver as a particular User Account. Processes in containers inside pods can also contact the apiserver. When they do, they are authenticated as a particular Service Account (for example, default).

You will need to add permissions to your ServiceAccount with Roles and RoleBindings to allow it to query the Kubernetes API server. For example, you will need to add permissions to list Pods.
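
To confirm from inside the Pod that the ServiceAccount actually has a given permission, you can ask the API server with a SelfSubjectAccessReview. This is a rough sketch rather than part of the original setup: the helper names `build_access_review` and `can_i` are hypothetical, and the request body is passed as a plain dict on the assumption that the client serializes it like any other manifest.

```python
def build_access_review(verb, resource, namespace, subresource=""):
    """Build a SelfSubjectAccessReview body as a plain dict."""
    return {
        "apiVersion": "authorization.k8s.io/v1",
        "kind": "SelfSubjectAccessReview",
        "spec": {
            "resourceAttributes": {
                "namespace": namespace,
                "verb": verb,
                "resource": resource,
                "subresource": subresource,
            }
        },
    }


def can_i(auth_api, verb, resource, namespace, subresource=""):
    """Ask the API server whether the current identity may perform the action."""
    body = build_access_review(verb, resource, namespace, subresource)
    response = auth_api.create_self_subject_access_review(body)
    return bool(response.status.allowed)
```

After `config.load_incluster_config()`, something like `can_i(client.AuthorizationV1Api(), "get", "pods", "default", subresource="log")` should tell you whether reading Pod logs in the `default` namespace is allowed.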


Example:

I've already answered a similar case at length on Server Fault. I encourage you to check it out:

I've taken the liberty of copying and adapting parts of that answer:

Create a ServiceAccount

apiVersion: v1
kind: ServiceAccount
metadata:
  name: python-job-sa

This ServiceAccount will be used with the Deployment/Pod that will host your Python code.

Assign specific permissions to your ServiceAccount

apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  namespace: default
  name: python-job-role
rules:
# This will give you access to pods
- apiGroups: [""]
  resources: ["pods"]
  verbs: ["get", "list", "watch"]
# This will give you access to pods logs
- apiGroups: [""]
  resources: ["pods/log"]
  verbs: ["get", "list", "watch"]

This is a Role that allows querying the Kubernetes API for resources such as Pods and their logs.

Bind your Role to a ServiceAccount

apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: python-job-rolebinding
  namespace: default
subjects:
- kind: ServiceAccount
  name: python-job-sa 
  namespace: default
roleRef:
  kind: Role 
  name: python-job-role
  apiGroup: rbac.authorization.k8s.io

After applying those rules, you can set serviceAccountName: python-job-sa in your Deployment manifest (in .spec.template.spec) and query the Kubernetes API as shown below:

from kubernetes import client, config

config.load_incluster_config() # <-- IMPORTANT
v1 = client.CoreV1Api()

print("Listing pods with their IPs:")

ret = v1.list_namespaced_pod("default")
for i in ret.items:
    print("%s\t%s\t%s" % (i.status.pod_ip, i.metadata.namespace, i.metadata.name))

Output:

Listing pods with their IPs:
10.88.0.12  default nginx-deployment-d6bcfb88d-q8s8s
10.88.0.13  default nginx-deployment-d6bcfb88d-zbdm6
10.88.0.11  default cloud-sdk
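
Since the question is about checking another deployment's logs for errors, the same ServiceAccount can also read Pod logs through the `pods/log` permission from the Role above. The following is a minimal sketch under a few assumptions: `find_error_lines` is a made-up helper, the target Pods are selected by a label you would have to supply (for example `app=my-app`, which is hypothetical), and "error" simply means a log line containing a given substring.

```python
def find_error_lines(core_api, namespace, label_selector, needle="ERROR", tail_lines=500):
    """Return {pod_name: [matching log lines]} for Pods selected by label."""
    matches = {}
    pods = core_api.list_namespaced_pod(namespace, label_selector=label_selector)
    for pod in pods.items:
        name = pod.metadata.name
        # Only the last `tail_lines` lines of each Pod's log are fetched.
        log_text = core_api.read_namespaced_pod_log(
            name=name, namespace=namespace, tail_lines=tail_lines
        )
        hits = [line for line in log_text.splitlines() if needle in line]
        if hits:
            matches[name] = hits
    return matches
```

Inside the cluster you would call it as `find_error_lines(client.CoreV1Api(), "default", "app=my-app")` after `config.load_incluster_config()`, and alert if the result is non-empty. Pods with more than one container would additionally need the `container` argument of `read_namespaced_pod_log`.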


Dawid Kruk