
I am trying to run a simple KubernetesPodOperator in my Composer environment as per the documentation here.

The Airflow task fails because the Kubernetes service account "default" lacks permission to create pods.

Given that, how do I properly create the environment, or set up the default service account's permissions, so that this code works?

DAG:

    from airflow.contrib.operators.kubernetes_pod_operator import KubernetesPodOperator

    price_analysis = KubernetesPodOperator(
        task_id='price-analysis',
        name='price-analysis',
        namespace='default',
        image='bash',
        image_pull_policy='Always',
        cmds=['echo'],
        arguments=['something'],
        env_vars={
            'EXPOSURE_THRESHOLD': '5',
            'ESTIMATE_WINDOW': '3,7',
        },
        in_cluster=True,
    )

Logs:

-------------------------------------------------------------------------------
Starting attempt 1 of 
-------------------------------------------------------------------------------

[2019-04-03 14:54:15,611] {models.py:1595} INFO - Executing <Task(KubernetesPodOperator): price-analysis> on 2019-04-03T14:53:59.658367+00:00
[2019-04-03 14:54:15,612] {base_task_runner.py:118} INFO - Running: ['bash', '-c', u'airflow run vat-analysis price-analysis 2019-04-03T14:53:59.658367+00:00 --job_id 54 --raw -sd DAGS_FOLDER/vat_analysis_dag.py --cfg_path /tmp/tmp3RdZOV']
[2019-04-03 14:54:18,375] {base_task_runner.py:101} INFO - Job 54: Subtask price-analysis [2019-04-03 14:54:18,374] {settings.py:176} INFO - setting.configure_orm(): Using pool settings. pool_size=5, pool_recycle=1800
[2019-04-03 14:54:19,652] {base_task_runner.py:101} INFO - Job 54: Subtask price-analysis [2019-04-03 14:54:19,651] {default_celery.py:80} WARNING - You have configured a result_backend of redis://airflow-redis-service.default.svc.cluster.local:6379/0, it is highly recommended to use an alternative result_backend (i.e. a database).
[2019-04-03 14:54:19,659] {base_task_runner.py:101} INFO - Job 54: Subtask price-analysis [2019-04-03 14:54:19,659] {__init__.py:51} INFO - Using executor CeleryExecutor
[2019-04-03 14:54:19,826] {base_task_runner.py:101} INFO - Job 54: Subtask price-analysis [2019-04-03 14:54:19,825] {app.py:51} WARNING - Using default Composer Environment Variables. Overrides have not been applied.
[2019-04-03 14:54:19,842] {base_task_runner.py:101} INFO - Job 54: Subtask price-analysis [2019-04-03 14:54:19,842] {configuration.py:516} INFO - Reading the config from /etc/airflow/airflow.cfg
[2019-04-03 14:54:19,868] {base_task_runner.py:101} INFO - Job 54: Subtask price-analysis [2019-04-03 14:54:19,867] {configuration.py:516} INFO - Reading the config from /etc/airflow/airflow.cfg
[2019-04-03 14:54:20,380] {base_task_runner.py:101} INFO - Job 54: Subtask price-analysis [2019-04-03 14:54:20,378] {models.py:271} INFO - Filling up the DagBag from /home/airflow/gcs/dags/vat_analysis_dag.py
[2019-04-03 14:54:21,490] {base_task_runner.py:101} INFO - Job 54: Subtask price-analysis [2019-04-03 14:54:21,490] {cli.py:484} INFO - Running <TaskInstance: vat-analysis.price-analysis 2019-04-03T14:53:59.658367+00:00 [running]> on host airflow-worker-5b6d7c75c9-w6995
[2019-04-03 14:54:22,093] {base_task_runner.py:101} INFO - Job 54: Subtask price-analysis [2019-04-03 14:54:21,822] {pod_launcher.py:58} ERROR - Exception when attempting to create Namespaced Pod.
[2019-04-03 14:54:22,103] {base_task_runner.py:101} INFO - Job 54: Subtask price-analysis Traceback (most recent call last):
[2019-04-03 14:54:22,107] {base_task_runner.py:101} INFO - Job 54: Subtask price-analysis   File "/usr/local/lib/airflow/airflow/contrib/kubernetes/pod_launcher.py", line 55, in run_pod_async
[2019-04-03 14:54:22,113] {base_task_runner.py:101} INFO - Job 54: Subtask price-analysis     resp = self._client.create_namespaced_pod(body=req, namespace=pod.namespace)
[2019-04-03 14:54:22,116] {base_task_runner.py:101} INFO - Job 54: Subtask price-analysis   File "/usr/local/lib/python2.7/dist-packages/kubernetes/client/apis/core_v1_api.py", line 6115, in create_namespaced_pod
[2019-04-03 14:54:22,122] {base_task_runner.py:101} INFO - Job 54: Subtask price-analysis     (data) = self.create_namespaced_pod_with_http_info(namespace, body, **kwargs)
[2019-04-03 14:54:22,126] {base_task_runner.py:101} INFO - Job 54: Subtask price-analysis   File "/usr/local/lib/python2.7/dist-packages/kubernetes/client/apis/core_v1_api.py", line 6206, in create_namespaced_pod_with_http_info
[2019-04-03 14:54:22,129] {base_task_runner.py:101} INFO - Job 54: Subtask price-analysis     collection_formats=collection_formats)
[2019-04-03 14:54:22,134] {base_task_runner.py:101} INFO - Job 54: Subtask price-analysis   File "/usr/local/lib/python2.7/dist-packages/kubernetes/client/api_client.py", line 321, in call_api
[2019-04-03 14:54:22,150] {base_task_runner.py:101} INFO - Job 54: Subtask price-analysis     _return_http_data_only, collection_formats, _preload_content, _request_timeout)
[2019-04-03 14:54:22,155] {base_task_runner.py:101} INFO - Job 54: Subtask price-analysis   File "/usr/local/lib/python2.7/dist-packages/kubernetes/client/api_client.py", line 155, in __call_api
[2019-04-03 14:54:22,159] {base_task_runner.py:101} INFO - Job 54: Subtask price-analysis     _request_timeout=_request_timeout)
[2019-04-03 14:54:22,138] {models.py:1760} ERROR - (403)
Reason: Forbidden
HTTP response headers: HTTPHeaderDict({'Date': 'Wed, 03 Apr 2019 14:54:21 GMT', 'Audit-Id': 'c027d4cb-5186-498a-a9b5-0e6c4420b816', 'Content-Length': '284', 'Content-Type': 'application/json', 'X-Content-Type-Options': 'nosniff'})
HTTP response body: {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"pods is forbidden: User \"system:serviceaccount:composer-1-6-0-airflow-1-10-1-ea0745b4:default\" cannot create pods in the namespace \"default\"","reason":"Forbidden","details":{"kind":"pods"},"code":403}

Alan Borsato

2 Answers


I got a reply from the Google Composer discussion group on Google Groups. One detail: the service account default:default in the command below must be replaced with the service account shown in your error message (in my case it was composer-1-6-0-airflow-1-10-1-ea0745b4:default).

    CLUSTER_NAME=.....
    NAMESPACE=k8s-tasks
    kubectl create ns ${NAMESPACE}

    kubectl create clusterrolebinding default-admin \
        --clusterrole cluster-admin \
        --serviceaccount=default:default \
        --namespace ${NAMESPACE}
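
With the binding in place, the task from the question can target the new namespace. A minimal sketch, assuming the `k8s-tasks` namespace created above and the Airflow 1.10 contrib import path:

    from airflow.contrib.operators.kubernetes_pod_operator import KubernetesPodOperator

    # Sketch only: the namespace matches the one created above, and in_cluster=True
    # makes the worker use the in-cluster service account that was granted the role.
    price_analysis = KubernetesPodOperator(
        task_id='price-analysis',
        name='price-analysis',
        namespace='k8s-tasks',
        image='bash',
        cmds=['echo'],
        arguments=['something'],
        in_cluster=True,
    )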
Alan Borsato
  • Note that if you don't use `in_cluster=True`, then you won't run into this problem. – hexacyanide Apr 03 '19 at 23:54
  • @hexacyanide, to be able to run on the same cluster as Composer, I needed to set that parameter to `True`, unless there is another (undocumented) way to achieve the same. I hope future versions of GCC make it clearer how to set this up. – Alan Borsato Apr 04 '19 at 13:57
  • I raised a ticket for this: https://issuetracker.google.com/issues/145900982 – Philip P. Dec 09 '19 at 15:51
  • So `KubernetesPodOperator` should have `service_account_name="composer-1-6-0-airflow-1-10-1-ea0745b4:default"` based on the error? This did not work for me. Let me know if you can expand a little bit on the answer. – alltej Feb 23 '20 at 02:28
  • Hi @alltej, unfortunately I don't have the command anymore. The above command worked for me (replacing default:default with composer-1-6-0-airflow-1-10-1-ea0745b4:default). I changed the implementation to use another cluster for workloads so I can use GKEPodOperator (see the sketch below). I recommend using GKEPodOperator because we don't want to mess with Composer's GKE cluster, especially since it could impact (or be impacted by) version upgrades. – Alan Borsato Mar 03 '20 at 19:26
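
A minimal sketch of the GKEPodOperator approach mentioned in the comment above; the project, location and cluster names are placeholders for a separate GKE cluster, and the Airflow 1.10 contrib import path is assumed:

    from airflow.contrib.operators.gcp_container_operator import GKEPodOperator

    # Sketch only: the pod runs in a dedicated workload cluster instead of
    # Composer's own GKE cluster.
    price_analysis = GKEPodOperator(
        task_id='price-analysis',
        project_id='my-gcp-project',         # placeholder
        location='us-central1-a',            # placeholder
        cluster_name='my-workload-cluster',  # placeholder
        name='price-analysis',
        namespace='default',
        image='bash',
        cmds=['echo'],
        arguments=['something'],
    )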

For people having the same issue with more recent versions of Composer: you don't need to grant additional permissions to the Kubernetes service account.

There was a breaking change in one of the libraries used by Airflow that impacted the Kubernetes default connection on the KubernetesPodOperator.

You will need to verify whether the Composer + Airflow version you use includes version 5.0.0 of the apache-airflow-providers-cncf-kubernetes package; you can check the version list in the documentation to confirm.

If your Composer environment uses that version, make sure you add the config_file parameter to the KubernetesPodOperator.

    KubernetesPodOperator(
      # The config file used in Composer to execute jobs within the same cluster
      config_file="/home/airflow/composer_kube_config",
      ...
    )

Additionally, for Composer 2, make sure you run your pods in the composer-user-workloads namespace so that they have access to Google Cloud resources.

    KubernetesPodOperator(
      namespace="composer-user-workloads",
      ...
    )
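
Putting the two settings together, a minimal sketch of a task for Composer 2, assuming the provider's kubernetes_pod module path (the task name, image and command are just the ones from the question):

    from airflow.providers.cncf.kubernetes.operators.kubernetes_pod import KubernetesPodOperator

    # Sketch only: config_file points Airflow at Composer's own GKE cluster, and the
    # composer-user-workloads namespace lets the pod reach Google Cloud resources.
    price_analysis = KubernetesPodOperator(
        task_id='price-analysis',
        name='price-analysis',
        namespace='composer-user-workloads',
        image='bash',
        cmds=['echo'],
        arguments=['something'],
        config_file='/home/airflow/composer_kube_config',
    )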

For more information, check the GCP documentation on using the KubernetesPodOperator.

Bruno