with DAG(
    "test_dag_venv",
    default_args=default_args,
    description='Dag to test venv',
    schedule_interval="@once",
    start_date=datetime(2022, 1, 6, 10, 45),
    tags=['testing'],
    concurrency=1,
    is_paused_upon_creation=True,
    catchup=False  # dont run previous and backfill; run only latest
) as dag:
    def print_test_1():
        print('print test 1')
    @task.virtualenv(task_id="print_test", requirements=['numpy'], system_site_packages=False)
    def print_test():
        import numpy as np
        print(np.__version__)
        print(print_test_1()) 
    t1 = print_test()
    t1

This is how I defined my DAG. I wanted to try out the virtualenv decorator, but I am getting the following error:

[2022-09-20 11:14:01,994] {process_utils.py:135} INFO - Executing cmd: /tmp/venvzfr52lj1/bin/python /tmp/venvzfr52lj1/script.py /tmp/venvzfr52lj1/script.in /tmp/venvzfr52lj1/script.out /tmp/venvzfr52lj1/string_args.txt
[2022-09-20 11:14:02,003] {process_utils.py:139} INFO - Output:
[2022-09-20 11:14:02,439] {process_utils.py:143} INFO - 1.19.5
[2022-09-20 11:14:02,439] {process_utils.py:143} INFO - Traceback (most recent call last):
[2022-09-20 11:14:02,440] {process_utils.py:143} INFO -   File "/tmp/venvzfr52lj1/script.py", line 33, in <module>
[2022-09-20 11:14:02,440] {process_utils.py:143} INFO -     res = print_test(*arg_dict["args"], **arg_dict["kwargs"])
[2022-09-20 11:14:02,440] {process_utils.py:143} INFO -   File "/tmp/venvzfr52lj1/script.py", line 31, in print_test
[2022-09-20 11:14:02,441] {process_utils.py:143} INFO -     print(print_test_1())
[2022-09-20 11:14:02,441] {process_utils.py:143} INFO - NameError: name 'print_test_1' is not defined
[2022-09-20 11:14:02,562] {taskinstance.py:1501} ERROR - Task failed with exception

Why is it not able to run the function as expected? I tried decorating print_test_1() with virtualenv too, but it still did not work. Ideally, I want to pass a value to print_test_1 from inside print_test and then use the result in the subsequent lines of print_test. I would appreciate any help, or a pointer in the right direction on how to get rid of the error.

raaj

1 Answer

There are several errors in your code.

First, the Airflow documentation says:

The simplest approach is to create dynamically (every time a task is run) a separate virtual environment on the same machine, you can use the @task.virtualenv decorator. The decorator allows you to create dynamically a new virtualenv with custom libraries and even a different Python version to run your function.

This means your task runs in a new environment that has no access to functions defined outside the task itself, in the same way that every package the task needs must be installed in that environment.
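The traceback in the question shows print_test being called from a generated script.py, which suggests that only the task function itself is carried over. Here is a minimal sketch of that effect in plain Python, without Airflow. The names `TASK_SOURCE` and `run_in_isolated_namespace` are hypothetical and only mimic the behaviour; this is not Airflow's actual mechanism.

```python
import textwrap

# Source of the task function alone, as it might be copied into a standalone script.
TASK_SOURCE = textwrap.dedent('''
    def print_test():
        return print_test_1()
''')


def run_in_isolated_namespace(source, func_name):
    """Execute the source in a fresh namespace and call the named function."""
    namespace = {}
    exec(source, namespace)
    return namespace[func_name]()


# Defined in the "DAG file", but never copied into the isolated namespace.
def print_test_1():
    return 'print test 1'


try:
    run_in_isolated_namespace(TASK_SOURCE, "print_test")
    outcome = "ok"
except NameError as exc:
    outcome = str(exc)

print(outcome)
```

The helper exists in the outer file, yet the isolated function cannot see it, which matches the NameError in the log.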

Second, your print_test_1 returns nothing, so print(print_test_1()) prints the same thing as print(None).
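A quick check in plain Python, independent of Airflow, illustrates this: a function with no return statement returns None.

```python
def print_test_1():
    print('print test 1')  # prints, but has no return statement


result = print_test_1()
print(result is None)
```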

As an alternative, you can simply use the @task decorator, or define your function inside the task:

from datetime import datetime

from airflow import DAG
from airflow.decorators import task

# default_args is assumed to be defined elsewhere (it is not shown in the question).
with DAG(
    "test_dag_venv",
    default_args=default_args,
    description='Dag to test venv',
    schedule_interval="@once",
    start_date=datetime(2022, 1, 6, 10, 45),
    tags=['testing'],
    concurrency=1,
    is_paused_upon_creation=True,
    catchup=False  # don't run previous and backfill; run only latest
) as dag:

    @task.virtualenv(task_id="print_test", requirements=['numpy'], system_site_packages=False)
    def print_test():
        import numpy as np

        # Helper defined inside the task so it is available in the virtualenv.
        def print_test_1(val):
            print(val)
            return val  # or do whatever you want and return the modified value

        print(np.__version__)
        val_modified = print_test_1('print test 1')
        # keep working with val_modified

    t1 = print_test()
Lucas M. Uriarte