
I am new to Kubernetes and am trying to get Apache Airflow working using Helm charts. After almost a week of struggling, I have gotten nowhere; I cannot even get the chart from the Apache Airflow documentation working. I use Pop!_OS 20.04 and MicroK8s.

When I run these commands:

kubectl create namespace airflow
helm repo add apache-airflow https://airflow.apache.org
helm install airflow apache-airflow/airflow --namespace airflow

The helm installation times out after five minutes.

kubectl get pods -n airflow

shows this list:

NAME                                   READY   STATUS     RESTARTS   AGE
airflow-postgresql-0                   0/1     Pending    0          4m8s
airflow-redis-0                        0/1     Pending    0          4m8s
airflow-worker-0                       0/2     Pending    0          4m8s
airflow-scheduler-565d8587fd-vm8h7     0/2     Init:0/1   0          4m8s
airflow-triggerer-7f4477dcb6-nlhg8     0/1     Init:0/1   0          4m8s
airflow-webserver-684c5d94d9-qhhv2     0/1     Init:0/1   0          4m8s
airflow-run-airflow-migrations-rzm59   1/1     Running    0          4m8s
airflow-statsd-84f4f9898-sltw9         1/1     Running    0          4m8s
airflow-flower-7c87f95f46-qqqqx        0/1     Running    4          4m8s
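As a small aid for reading a listing like the one above, here is a minimal sketch that pulls out only the pods whose STATUS column reads Pending. The helper name `pending_pods` is made up for this illustration, and the sample text is copied from the listing so the filter runs without a cluster.

```shell
# Sample output copied from the listing above, so this runs without a cluster.
pods_output='NAME                                   READY   STATUS     RESTARTS   AGE
airflow-postgresql-0                   0/1     Pending    0          4m8s
airflow-redis-0                        0/1     Pending    0          4m8s
airflow-worker-0                       0/2     Pending    0          4m8s
airflow-scheduler-565d8587fd-vm8h7     0/2     Init:0/1   0          4m8s
airflow-triggerer-7f4477dcb6-nlhg8     0/1     Init:0/1   0          4m8s
airflow-webserver-684c5d94d9-qhhv2     0/1     Init:0/1   0          4m8s
airflow-run-airflow-migrations-rzm59   1/1     Running    0          4m8s
airflow-statsd-84f4f9898-sltw9         1/1     Running    0          4m8s
airflow-flower-7c87f95f46-qqqqx        0/1     Running    4          4m8s'

# Hypothetical helper: skip the header row and print the name (1st field)
# of every pod whose STATUS column (3rd field) is exactly "Pending".
pending_pods() {
  awk 'NR > 1 && $3 == "Pending" { print $1 }'
}

printf '%s\n' "$pods_output" | pending_pods
```

Against a live cluster the same filter would be fed from `kubectl get pods -n airflow | pending_pods`. The pods it names are the ones worth passing to `kubectl describe pod` first, since the Init:0/1 pods are only waiting on something else.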

Then when I run the below command:

kubectl describe pod airflow-postgresql-0 -n airflow

I get the below (trimmed up to the events):

Events:
  Type     Reason            Age                From               Message
  ----     ------            ----               ----               -------
  Warning  FailedScheduling  58s (x2 over 58s)  default-scheduler  0/1 nodes are available: 1 pod has unbound immediate PersistentVolumeClaims.

Then I deleted the namespace using the following command:

kubectl delete ns airflow

At this point, the termination of the pods got stuck, so I brought up the proxy in another terminal:

kubectl proxy

Then I issued the following command to force-delete the namespace and all its pods and resources:

kubectl get ns airflow -o json | jq '.spec.finalizers=[]' | curl -X PUT http://localhost:8001/api/v1/namespaces/airflow/finalize -H "Content-Type: application/json" --data @-

Then I deleted the PVCs using the following command:

kubectl delete pvc --force --grace-period=0 --all -n airflow

This got stuck again, so I had to issue another command to force the deletion:

kubectl patch pvc data-airflow-postgresql-0 -p '{"metadata":{"finalizers":null}}' -n airflow

The PVCs get terminated at this point, and these two commands return nothing:

kubectl get pvc -n airflow
kubectl get all -n airflow

Then I restarted the machine and ran the helm install again (using the first and last commands in the first section of this question), but got the same result.

I then executed the following command (using the suggestions I found here):

kubectl describe pvc -n airflow

I got the following output (I am posting the event portion of PostgreSQL):

  Type    Reason         Age                   From                         Message
  ----    ------         ----                  ----                         -------
  Normal  FailedBinding  2m58s (x42 over 13m)  persistentvolume-controller  no persistent volumes available for this claim and no storage class is set

So my assumption is that I need to provide a storage class as part of values.yaml.

Is my understanding right? How do I provide the required settings (and which values) in values.yaml?

Rhonald
  • Choose a CNS solution and implement it. Create a StorageClass and set it as the default; then you don't need to change anything in your values. If for some reason you don't want to set a default SC, you can check their defaults on GitHub: https://github.com/airflow-helm/charts/blob/main/charts/airflow/values.yaml – SYN Dec 03 '21 at 21:08
  • Any luck getting your solution working? I'm experiencing the same problem. – capa_matrix Apr 07 '22 at 20:15
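The suggestion in the first comment (make sure a default StorageClass exists) might look roughly like the sketch below on MicroK8s. Several things here are assumptions rather than facts from the thread: that the MicroK8s storage add-on is an acceptable storage backend for a test setup, that the add-on is called `hostpath-storage` (older MicroK8s releases call it `storage`), and that the StorageClass it creates is named `microk8s-hostpath`. The `run` and `enable_default_storage` helpers are made up for this example. The script only prints the commands unless `APPLY=1` is set.

```shell
# Dry run by default: commands are printed, not executed.
# Set APPLY=1 in the environment to actually run them.
run() {
  if [ "${APPLY:-0}" = "1" ]; then
    "$@"
  else
    printf '+ %s\n' "$*"
  fi
}

# Assumption: name of the StorageClass created by the MicroK8s storage add-on.
SC_NAME="${SC_NAME:-microk8s-hostpath}"

# Annotation that marks a StorageClass as the cluster default.
DEFAULT_PATCH='{"metadata":{"annotations":{"storageclass.kubernetes.io/is-default-class":"true"}}}'

enable_default_storage() {
  # Assumption: add-on name on recent MicroK8s; older releases use "storage".
  run microk8s enable hostpath-storage
  # A default class is shown with "(default)" next to its name.
  run kubectl get storageclass
  # Only needed if no class is marked as default yet.
  run kubectl patch storageclass "$SC_NAME" -p "$DEFAULT_PATCH"
  # After reinstalling the chart, the claims should move from Pending to Bound.
  run kubectl get pvc -n airflow
}

enable_default_storage
```

Claims that were created before a default class existed may stay Pending on older Kubernetes versions, so it is probably safer to remove the release and its PVCs and install the chart again once the StorageClass is in place.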

1 Answer

If you installed with helm, you can uninstall with helm delete airflow -n airflow.

Here's a way to install airflow for testing purposes using default values:

Generate the manifest:

helm template airflow apache-airflow/airflow -n airflow > airflow.yaml

Open "airflow.yaml" with your favorite editor and replace every "volumeClaimTemplates" entry with an emptyDir volume. Example:

[Screenshot in the original answer: example of the edited manifest]
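Since the screenshot is not reproduced here, the following sketch shows the shape of the edit on two illustrative fragments and adds a quick check that no PVC references are left before applying. The fragments are simplified stand-ins, not the chart's actual output: the container name, volume name `data`, mount path, and storage size are assumptions. `check_no_pvc` is a hypothetical helper, not part of Helm or kubectl.

```shell
# Hypothetical helper: fails (and prints the matching lines) if the manifest
# still contains PVC-related entries.
check_no_pvc() {
  if grep -n -E 'volumeClaimTemplates|persistentVolumeClaim|kind: *PersistentVolumeClaim' "$1"; then
    echo "PVC references remain in $1" >&2
    return 1
  fi
  echo "no PVC references in $1"
}

tmpdir="$(mktemp -d)"

# Illustrative fragment BEFORE the edit: the StatefulSet requests a PVC.
cat > "$tmpdir/before.yaml" <<'EOF'
kind: StatefulSet
spec:
  template:
    spec:
      containers:
        - name: postgresql
          volumeMounts:
            - name: data
              mountPath: /data
  volumeClaimTemplates:
    - metadata:
        name: data
      spec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 8Gi
EOF

# Illustrative fragment AFTER the edit: the volumeClaimTemplates block is gone
# and a pod-level emptyDir volume with the same name takes its place.
cat > "$tmpdir/after.yaml" <<'EOF'
kind: StatefulSet
spec:
  template:
    spec:
      containers:
        - name: postgresql
          volumeMounts:
            - name: data
              mountPath: /data
      volumes:
        - name: data
          emptyDir: {}
EOF

check_no_pvc "$tmpdir/before.yaml" || echo "edit needed"
check_no_pvc "$tmpdir/after.yaml"
```

On the real file the check would be `check_no_pvc airflow.yaml`. Note that the emptyDir volume keeps the name used in `volumeMounts`, and that data in an emptyDir is lost when the pod is removed, which is why this is only suitable for testing.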

Create the namespace and install:

kubectl create namespace airflow
kubectl apply -f airflow.yaml --namespace airflow

[Screenshot in the original answer]

You can copy files out from the pods if needed.

To delete:

kubectl delete -f airflow.yaml --namespace airflow

gohm'c
  • Thanks, but unfortunately I ran into issues again (even after following the exact commands). The database works, and I could log in, but the Redis and worker pods keep showing Pending, the scheduler, triggerer, and webserver are stuck at Init, and flower keeps crashing (and restarting) at the liveness test with the error: Container flower failed liveness probe, will be restarted. The scheduler, triggerer, and webserver are waiting for the migrations to complete. – Rhonald Dec 05 '21 at 00:14
  • Can you do `kubectl logs --all-containers ` and copy the relevant error messages to your question? Another thing is to watch your computer's available memory, as you will be running quite a number of applications. – gohm'c Dec 05 '21 at 02:05
  • Thanks, I copied all of them to this dropbox folder: https://www.dropbox.com/sh/usbbb45y1s0nldd/AACTPBSqviIxVFEo09wz1doia?dl=0 . In short, it seems the flower is failing consistently due to not being able to reach port 5555, and the user migrations did not happen. Start with the INDEX.txt for the summary and the other text files named after the respective pods have full output of the describe. My desktop is high on resources (64GB RAM, 1TB NVME 2 HDD, etc.), and so I don't think lack of resources is the problem. Appreciate you trying to help me out. – Rhonald Dec 05 '21 at 17:34
  • That's a good piece of hardware you've got there. The first obvious thing in the errors is the PersistentVolumeClaim. If you have followed the steps correctly, the `airflow.yaml` that you applied should **not** use any PVC. Try `helm repo update`, generate the spec, update it accordingly, and make sure there is no PVC in the spec before you apply again. – gohm'c Dec 06 '21 at 01:42
  • Thanks! Well, all the services are void of PVC-related issues now. However, the liveness and readiness probe keeps crashing for the flower, which is stuck. curl: (7) Failed to connect to localhost port 5555: Connection refused. It seems the flower is not able to reach the [redis](https://github.com/mher/flower/issues/639) – Rhonald Dec 06 '21 at 19:51
  • One issue I see is that the migrations never happen and the database Postgres is empty with no tables. The pod for user migration does not throw any error. The result is that the webserver, scheduler, triggerer and worker are stuck at Init – Rhonald Dec 06 '21 at 20:33
  • It appears the migrations never happen and the jobs are in an indefinite wait state. Do you have any ideas? – Rhonald Dec 07 '21 at 00:22
  • Can you share the `airflow.yaml` that you applied on dropbox and latest errors text? – gohm'c Dec 07 '21 at 00:59
  • Sure, I made some changes to it. I added PV, PVC to the local disk and got rid of all the errors related to PVC. I don't get any error related to PVC, but it is indefinitely stuck (never worked) at initializing the database. The latest [airflow.yaml is here](https://www.dropbox.com/sh/671bwgfho044mis/AABstUmLOnlNuxJP5kf3wXB3a?dl=0). – Rhonald Dec 07 '21 at 01:30
  • Compared to the standard chart, you have an additional secret `airflow-postgresql-db`, removed the liveness/readiness probes, changed the script... Anyway, I just retried the procedure in the answer on EKS; it works without any issue, and I updated the screenshot. Do you have a place where the `airflow.yaml` I used can be uploaded for you? – gohm'c Dec 07 '21 at 02:12
  • Appreciate it :-) If it works, you have saved me from lots of trouble. I wanted to publish a photography platform on the dev server for select public access by end of this week and this is killing me. I am new to Kubernetes and man, it's hard. I have created a file request [here](https://www.dropbox.com/request/48bZXmfepxAjwX3ol4wR) Let me know if this works – Rhonald Dec 07 '21 at 04:29
  • Uploaded "stackoverflow-airflow.yaml". This is the file used for the successful run. – gohm'c Dec 07 '21 at 05:02
  • Thanks, appreciate it. Unfortunately, it did not work (same issue). I think there is some other issue with my desktop that is screwing up the airflow. I will have to try on another laptop tomorrow and update you. My desktop has got both Docker and microk8s installed. Also, I am not sure whether something else is screwing with it. – Rhonald Dec 07 '21 at 05:13
  • So it worked without any issue in a VirtualBox on my mac. However, it failed to work on my Linux desktop. My observations are that the migrations are not able to establish database connectivity and hence they are stuck at that task, and everything else fails because of that. The liveness and readiness error from the flower does not seem to make any difference (once the migrations are done, those errors stop). I have both docker and Kubernetes on the same desktop (and not sure what else I have in there). So, I am going to reinstall the desktop and give a shot. – Rhonald Dec 08 '21 at 00:16
  • The spec is not tied to a specific k8s setup (except for Windows-based ones); as long as the cluster is free of errors and has enough capacity, it should just run. As for your original question, the answer is correct; please mark it to help the next person. – gohm'c Dec 08 '21 at 00:52
  • Yep, I will mark it as solved, and continue exploring. – Rhonald Dec 08 '21 at 04:02