2

I'm trying to log from inside an function in a dag, but it hasn't been working. I tried using print("something") as well but that didn't print any out either in the airflow log. How should I do logging here? Many thanks for your help.

import logging
def create_op (dag) -> SparkSubmitOperator:
     
    conf = Variable.get("spark_conf", deserialize_json = True)
    conf_sp = conf.update({"spark.jars.packages":"org.apache.spark:spark-avro_2.12:2.4.3"})

    #trying to log here as the conf_sp wasn't working 
    logger = logging.getLogger("airflow.task")
    logger.info("conf is {}".format(conf_sp)) # it does not print anything  
    logging.info("conf is {}".format(conf_sp)) # it does not print anything either

    op = SparkSubmitOperator(
       application = "my_app",
       conf = conf_sp
       ....
        )
user3735871
  • 527
  • 2
  • 14
  • 31
  • How/where is the `create_op` function invoked? – Oluwafemi Sule Jul 01 '22 at 08:43
  • sorry I didn't include all the dag. It was invoked by task1= create_op (...) The dag was constructed ok. Just the logging didn't show up. – user3735871 Jul 01 '22 at 09:15
  • Does this answer your question? [How to log inside Python function in PythonOperator in Airflow](https://stackoverflow.com/questions/52022329/how-to-log-inside-python-function-in-pythonoperator-in-airflow) – Haha TTpro Jun 15 '23 at 07:45

1 Answers1

0

Use logging

import logging 

logging.info("ds type " +  str(type(ds)))

Full DAG that work

import json
import logging
import pendulum

from airflow.decorators import dag, task
from airflow.models import Variable

@dag(
    schedule=None,
    start_date=pendulum.datetime(2023, 6, 13, tz="UTC"),
    catchup=False,
    tags=["example"],
)
def tutorial_taskflow_api():
    """
    ### TaskFlow API Tutorial Documentation
    This is a simple data pipeline example which demonstrates the use of
    the TaskFlow API using three simple tasks for Extract, Transform, and Load.
    Documentation that goes along with the Airflow TaskFlow API tutorial is
    located
    [here](https://airflow.apache.org/docs/apache-airflow/stable/tutorial_taskflow_api.html)
    """
    @task()
    def extract(**kwargs):
        """
        #### Extract task
        A simple Extract task to get data ready for the rest of the data
        pipeline. In this case, getting data is simulated by reading from a
        hardcoded JSON string.
        """
        # ds = '{{ds}}'
        data_string = '{"1001": 301.27, "1002": 433.21, "1003": 502.22}'
        

        order_data_dict = json.loads(data_string)
        # order_data_dict["ds"] = ds
        return order_data_dict
    
    @task()
    def add_date(order_data_dict: dict, **kwargs):
        # ds = Variable.get("ds")
        ds = kwargs["logical_date"]
        logging.info("ds type " +  str(type(ds)))
        order_data_dict["ds"] = ds.strftime("%d/%m/%Y, %H:%M:%S")
        return order_data_dict


    @task(multiple_outputs=True)
    def transform(order_data_dict: dict):
        """
        #### Transform task
        A simple Transform task which takes in the collection of order data and
        computes the total order value.
        """
        total_order_value = 0

        for value in order_data_dict.values():
            if not isinstance(value, str):
                total_order_value += value
            else:
                ds = value

        return {"total_order_value": total_order_value, "ds": ds}
    @task()
    def load(order_summary_dict: dict):
        """
        #### Load task
        A simple Load task which takes in the result of the Transform task and
        instead of saving it to end user review, just prints it out.
        """
        total_order_value = order_summary_dict["total_order_value"]
        ds = order_summary_dict["ds"]
        str_log = f"Total order value is: {total_order_value:.2f}" + " with ds value " + ds
        print("MEOW print func :" + str_log)
        logging.info("MEOW log func :" + str_log)
        
    order_data = extract()
    order_summary = transform(add_date(order_data))
    load(order_summary)
tutorial_taskflow_api()
Haha TTpro
  • 5,137
  • 6
  • 45
  • 71