
I am new to Airflow and I am trying something simple with GoogleCloudStorageDownloadOperator:

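from datetime import timedelta

import airflow.utils.dates
from airflow import DAG
from airflow.contrib.operators.gcs_download_operator import GoogleCloudStorageDownloadOperator

default_args = {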
    'start_date': airflow.utils.dates.days_ago(0),
    'schedule_interval': None,
    'retries': 1,
    'retry_delay': timedelta(minutes=5),
    'params': {
        'work_dir': '/tmp'
    }
}

dag = DAG(
    'foo',
    default_args=default_args,
    description='This is foobar',
    schedule_interval=timedelta(weeks=1),
    dagrun_timeout=timedelta(minutes=60))

mock_download = GoogleCloudStorageDownloadOperator(
    task_id='download-foo-from-gcp',
    bucket='foo-data',
    object='{% if (task_instance.pid % 2 == 0) %}foo{% else %}bar{% endif %}/data.tar.gz',
    filename='{{ params.work_dir }}/data.tar.gz',
    google_cloud_storage_conn_id='google_cloud_default',
    dag=dag
)

I can run this task from PyCharm (using airflow test), but it fails every time when it is run by the scheduler after being triggered from the web interface. The error message in the log is not helpful:

... 
[2020-01-09 17:04:18,871] {gcs_download_operator.py:86} INFO - Executing download: crunchbase-mock-data, foo/data.tar.gz, /tmp/data.tar.gz
[2020-01-09 17:04:28,751] {logging_mixin.py:112} INFO - [2020-01-09 17:04:28,751] {local_task_job.py:103} INFO - Task exited with return code -6

Can anyone shed any light on this? What the heck is -6 supposed to mean? Is there a way to see a little more details about what happened there?

Jürgen Simon
  • Can you change the logging level to DEBUG and share those logs, please? I will help you identify what might be going wrong. – kaxil Jan 20 '20 at 11:09
  • Also, please let me know which version of Airflow you are using and your environment (e.g. are you using a managed Airflow service such as Astronomer or Cloud Composer, or running it on VMs, in which case which Linux distro?) – kaxil Jan 20 '20 at 11:11
  • And which Executor and DB backend do you use? – kaxil Jan 20 '20 at 11:13

2 Answers

I had the same issue, described in the question "Airflow task running tweepy exits with return code -6".

Are you on macOS High Sierra (or above)? If so, refer to https://stackoverflow.com/a/52230415/4434664. It solved my issue.

Basically, airflow test (and therefore PyCharm) merely runs the task in-process, whereas the scheduler starts a worker process, which calls fork(). Apparently, High Sierra introduced some new security changes that break fork() usage in Python.

This also caused problems in Ansible. Refer to https://github.com/ansible/ansible/issues/32499#issuecomment-341578864
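To illustrate the difference between the two execution paths, here is a minimal sketch. It is not Airflow's actual code: `healthy_task`, `crashing_task` and `run_forked` are hypothetical names, and `os.abort()` merely stands in for whatever makes the forked worker abort on macOS. It assumes a POSIX system, since the "fork" start method is not available elsewhere.

```python
import multiprocessing as mp
import os
import signal


def healthy_task():
    """A task that finishes without problems."""
    pass


def crashing_task():
    """Stand-in for a task whose process aborts (raises SIGABRT)."""
    os.abort()


def run_forked(task):
    """Run `task` in a forked child and return the child's exit code."""
    ctx = mp.get_context("fork")  # POSIX only
    child = ctx.Process(target=task)
    child.start()
    child.join()
    # multiprocessing reports "killed by signal N" as exit code -N
    return child.exitcode


if __name__ == "__main__":
    healthy_task()  # in-process call, roughly what `airflow test` does
    print(run_forked(healthy_task))   # exit code of a clean child
    print(run_forked(crashing_task))  # negative code: child died from a signal
```

The in-process call never goes through fork(), which is consistent with the task succeeding under airflow test while failing under the scheduler.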

Aneesh Makala

Can anyone shed any light on this? What the heck is -6 supposed to mean?

There is a convention, stated in the Python subprocess documentation for Popen.returncode:

A negative value -N indicates that the child was terminated by signal N (POSIX only).

In your case, it means that the process was terminated by the SIGABRT signal (signal number 6).
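As a small sketch of that convention, the snippet below maps a negative return code to a signal name using the standard `signal` module. The helper name `describe_return_code` is made up for this illustration, and the child process that calls `os.abort()` is only a stand-in for the failing task. It assumes a POSIX system.

```python
import signal
import subprocess
import sys


def describe_return_code(return_code):
    """Translate a subprocess return code into a readable description."""
    if return_code < 0:
        signal_number = -return_code
        try:
            name = signal.Signals(signal_number).name
        except ValueError:
            name = "an unknown signal"
        return "terminated by {} (signal {})".format(name, signal_number)
    return "exited with status {}".format(return_code)


if __name__ == "__main__":
    # Start a child interpreter that aborts itself, then inspect its return code.
    child = subprocess.run(
        [sys.executable, "-c", "import os; os.abort()"],
        stderr=subprocess.DEVNULL,
    )
    print(child.returncode)
    print(describe_return_code(child.returncode))
```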

Is there a way to see a little more details about what happened there?

There is not much background information in your question to go on. In general, try experimenting with different operators and files. Also, from my perspective, Airflow is not well documented, so I recommend checking the Airflow source code.

Ilya Bystrov
  • What baffles me somewhat is that the code works to a degree. It runs fine on Composer but fails to run locally. With identical configuration. Can I get you any more info to analyse this problem? – Jürgen Simon Jan 17 '20 at 15:07
  • I forgot to mention another piece of information: when I execute the task using `airflow test`, it runs fine. When running it via scheduler, I get that -6 signal problem. What gives? – Jürgen Simon Jan 17 '20 at 15:08