
I am using KubernetesExecutor as the Executor in Airflow. My DAG code:

from datetime import datetime, timedelta
from airflow import DAG
from airflow.providers.cncf.kubernetes.operators.spark_kubernetes import SparkKubernetesOperator
from airflow.providers.cncf.kubernetes.sensors.spark_kubernetes import SparkKubernetesSensor


dag = DAG(
   'spark_pi_using_spark_operator',
   default_args={'max_active_runs': 1},
   description='submit spark-pi as sparkApplication on kubernetes',
   schedule_interval=timedelta(days=1),
   start_date=datetime(2021, 1, 1),
   catchup=False,
)

t1 = SparkKubernetesOperator(
   task_id='spark_pi_submit',
   namespace="default",
   application_file="example_spark_kubernetes_spark_pi.yaml",
   do_xcom_push=True,
   dag=dag,
)

t2 = SparkKubernetesSensor(
   task_id='spark_pi_monitor',
   namespace="default",
   application_name="{{ task_instance.xcom_pull(task_ids='spark_pi_submit')['metadata']['name'] }}",
   dag=dag,
)
t1 >> t2 

The DAG executes successfully. I am able to see the output in the spark-driver logs by running kubectl logs spark-pi-driver.

But I am not able to see the same logs in the Airflow UI.

  • Why should you see them? The process is executed on a remote machine; the Airflow logs will show only what the process reported back. If you want to collect the logs and dump them to the task log, you will need to write this functionality. – Elad Kalif Jan 31 '22 at 22:16
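For illustration, a minimal sketch of that approach, assuming the kubernetes Python client is installed in the worker image; the pod name and namespace come from the question, while the task id, function name, and t3 variable are made up for the example:

from airflow.operators.python import PythonOperator
from kubernetes import client, config


def dump_driver_logs(**context):
    """Read the Spark driver pod's log and print it, so it is captured
    in this task's Airflow log."""
    try:
        # In-cluster config applies when the task runs inside the cluster
        # (e.g. with KubernetesExecutor); fall back to a local kubeconfig.
        config.load_incluster_config()
    except config.ConfigException:
        config.load_kube_config()

    v1 = client.CoreV1Api()
    # Pod name and namespace match the question (kubectl logs spark-pi-driver).
    print(v1.read_namespaced_pod_log(name="spark-pi-driver", namespace="default"))


t3 = PythonOperator(
    task_id="dump_spark_driver_logs",
    python_callable=dump_driver_logs,
    dag=dag,
)

t2 >> t3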

1 Answer


Update the SparkKubernetesSensor configuration as below:

t2 = SparkKubernetesSensor(
   task_id='spark_pi_monitor',
   namespace="default",
   application_name="{{ task_instance.xcom_pull(task_ids='spark_pi_submit')['metadata']['name'] }}",
   dag=dag,
   attach_log=True,
)
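With attach_log=True the sensor appends the Spark driver pod's log to its own task log when the application finishes, so the output you currently get from kubectl logs spark-pi-driver also shows up in the Airflow UI under the spark_pi_monitor task.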