
Hi, I am running papermill inside Google Cloud Composer (managed Airflow), using PythonVirtualenvOperator to run it. The source notebook is in Google Cloud Storage, and the path where I need to store the executed notebook is also in Google Cloud Storage. When I run papermill this way, I get the error: TypeError: __init__() got an unexpected keyword argument 'min'.

Here is the code snippet:

def getGCSObjects():
  import papermill as pm
  pm.execute_notebook(
    'gs://BUCKET/inputs/add.ipynb',
    'gs://BUCKET/inputs/add_out.ipynb',
    parameters=dict(alpha=0.6, ratio=0.1)
  )

list_gcs_files = PythonVirtualenvOperator(
  task_id='list_gcs_files',
  system_site_packages=True,
  python_version='3.6',
  requirements=[
   'gcsfs>=0.2.0'
   'papermill',
  ],
  dag=dag,
  python_callable=getGCSObjects,
)
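
A side note on the snippet above, separate from the 'min' error: the requirements list has no comma after 'gcsfs>=0.2.0', so Python joins the two adjacent string literals into one. That matches the pip command in the log, which installs the single requirement 'gcsfs>=0.2.0papermill'. A minimal sketch of the difference (the variable names are only for illustration):

```python
# Adjacent string literals are concatenated by Python, so a missing comma
# silently turns two requirements into one.
requirements_as_written = [
    'gcsfs>=0.2.0'
    'papermill',
]

# With the comma, each package is a separate list element.
requirements_intended = [
    'gcsfs>=0.2.0',
    'papermill',
]

print(requirements_as_written)  # ['gcsfs>=0.2.0papermill']
print(requirements_intended)    # ['gcsfs>=0.2.0', 'papermill']
```

The traceback shows papermill being imported from /opt/python3.6/lib/python3.6/site-packages (the system site packages), so fixing the comma alone may not change which papermill is used or make the error go away.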

Error output:

[2021-06-30 09:14:17,905] {taskinstance.py:902} INFO - Executing <Task(PythonVirtualenvOperator): list_gcs_files> on 2021-06-30T00:00:00+00:00
[2021-06-30 09:14:19,489] {python_operator.py:316} INFO - Executing cmd
['virtualenv', '/tmp/venvoyf919ht', '--system-site-packages', '--python=python3.6']
[2021-06-30 09:14:19,828] {python_operator.py:321} INFO - Got output
b'created virtual environment CPython3.6.10.final.0-64 in 235ms\n  creator CPython3Posix(dest=/tmp/venvoyf919ht, clear=False, no_vcs_ignore=False, global=True)\n  seeder FromAppData(download=False, pip=bundle, wheel=bundle, setuptools=bundle, via=copy, app_data_dir=/home/airflow/.local/share/virtualenv)\n    added seed packages: pip==20.2.4, setuptools==50.3.2, wheel==0.35.1\n  activators PythonActivator,FishActivator,XonshActivator,CShellActivator,PowerShellActivator,BashActivator\n'
[2021-06-30 09:14:19,831] {python_operator.py:316} INFO - Executing cmd
['/tmp/venvoyf919ht/bin/pip', 'install', 'gcsfs>=0.2.0papermill']
[2021-06-30 09:14:27,079] {python_operator.py:321} INFO - Got output
b'Requirement already satisfied: gcsfs>=0.2.0papermill in /opt/python3.6/lib/python3.6/site-packages (2021.6.1)\nRequirement already satisfied: aiohttp in /opt/python3.6/lib/python3.6/site-packages (from gcsfs>=0.2.0papermill) (3.7.4.post0)\nRequirement already satisfied: fsspec==2021.06.1 in /opt/python3.6/lib/python3.6/site-packages (from gcsfs>=0.2.0papermill) (2021.6.1)\nRequirement already satisfied: google-auth>=1.2 in /opt/python3.6/lib/python3.6/site-packages (from gcsfs>=0.2.0papermill) (1.24.0)\nRequirement already satisfied: google-auth-oauthlib in /opt/python3.6/lib/python3.6/site-packages (from gcsfs>=0.2.0papermill) (0.4.2)\nRequirement already satisfied: requests in /opt/python3.6/lib/python3.6/site-packages (from gcsfs>=0.2.0papermill) (2.25.0)\nRequirement already satisfied: decorator in /opt/python3.6/lib/python3.6/site-packages (from gcsfs>=0.2.0papermill) (5.0.9)\nRequirement already satisfied: yarl<2.0,>=1.0 in /opt/python3.6/lib/python3.6/site-packages (from aiohttp->gcsfs>=0.2.0papermill) (1.6.3)\nRequirement already satisfied: chardet<5.0,>=2.0 in /opt/python3.6/lib/python3.6/site-packages (from aiohttp->gcsfs>=0.2.0papermill) (3.0.4)\nRequirement already satisfied: async-timeout<4.0,>=3.0 in /opt/python3.6/lib/python3.6/site-packages (from aiohttp->gcsfs>=0.2.0papermill) (3.0.1)\nRequirement already satisfied: typing-extensions>=3.6.5 in /opt/python3.6/lib/python3.6/site-packages (from aiohttp->gcsfs>=0.2.0papermill) (3.7.4.3)\nRequirement already satisfied: attrs>=17.3.0 in /opt/python3.6/lib/python3.6/site-packages (from aiohttp->gcsfs>=0.2.0papermill) (20.3.0)\nRequirement already satisfied: idna-ssl>=1.0; python_version < "3.7" in /opt/python3.6/lib/python3.6/site-packages (from aiohttp->gcsfs>=0.2.0papermill) (1.1.0)\nRequirement already satisfied: multidict<7.0,>=4.5 in /opt/python3.6/lib/python3.6/site-packages (from aiohttp->gcsfs>=0.2.0papermill) (5.1.0)\nRequirement already satisfied: setuptools>=40.3.0 in 
/tmp/venvoyf919ht/lib/python3.6/site-packages (from google-auth>=1.2->gcsfs>=0.2.0papermill) (50.3.2)\nRequirement already satisfied: rsa<5,>=3.1.4; python_version >= "3.6" in /opt/python3.6/lib/python3.6/site-packages (from google-auth>=1.2->gcsfs>=0.2.0papermill) (4.6)\nRequirement already satisfied: pyasn1-modules>=0.2.1 in /opt/python3.6/lib/python3.6/site-packages (from google-auth>=1.2->gcsfs>=0.2.0papermill) (0.2.8)\nRequirement already satisfied: cachetools<5.0,>=2.0.0 in /opt/python3.6/lib/python3.6/site-packages (from google-auth>=1.2->gcsfs>=0.2.0papermill) (4.1.1)\nRequirement already satisfied: six>=1.9.0 in /opt/python3.6/lib/python3.6/site-packages (from google-auth>=1.2->gcsfs>=0.2.0papermill) (1.15.0)\nRequirement already satisfied: requests-oauthlib>=0.7.0 in /opt/python3.6/lib/python3.6/site-packages (from google-auth-oauthlib->gcsfs>=0.2.0papermill) (1.3.0)\nRequirement already satisfied: idna<3,>=2.5 in /opt/python3.6/lib/python3.6/site-packages (from requests->gcsfs>=0.2.0papermill) (2.8)\nRequirement already satisfied: certifi>=2017.4.17 in /opt/python3.6/lib/python3.6/site-packages (from requests->gcsfs>=0.2.0papermill) (2020.11.8)\nRequirement already satisfied: urllib3<1.27,>=1.21.1 in /opt/python3.6/lib/python3.6/site-packages (from requests->gcsfs>=0.2.0papermill) (1.25.11)\nRequirement already satisfied: pyasn1>=0.1.3 in /opt/python3.6/lib/python3.6/site-packages (from rsa<5,>=3.1.4; python_version >= "3.6"->google-auth>=1.2->gcsfs>=0.2.0papermill) (0.4.8)\nRequirement already satisfied: oauthlib>=3.0.0 in /opt/python3.6/lib/python3.6/site-packages (from requests-oauthlib>=0.7.0->google-auth-oauthlib->gcsfs>=0.2.0papermill) (3.1.0)\n'
[2021-06-30 09:14:27,200] {python_operator.py:316} INFO - Executing cmd
['/tmp/venvoyf919ht/bin/python', '/tmp/venvoyf919ht/script.py', '/tmp/venvoyf919ht/script.in', '/tmp/venvoyf919ht/script.out', '/tmp/venvoyf919ht/string_args.txt']
[2021-06-30 09:14:28,919] {python_operator.py:323} INFO - Got error output
b'Input notebook does not contain a cell with tag \'parameters\'\n\rExecuting:   0%|          | 0/4 [00:00<?, ?cell/s]Traceback (most recent call last):\n  File "/tmp/venvoyf919ht/script.py", line 16, in <module>\n    res = getGCSObjects(*args, **kwargs)\n  File "/tmp/venvoyf919ht/script.py", line 13, in getGCSObjects\n    parameters=dict(alpha=0.6, ratio=0.1)\n  File "/opt/python3.6/lib/python3.6/site-packages/papermill/execute.py", line 118, in execute_notebook\n    **engine_kwargs\n  File "/opt/python3.6/lib/python3.6/site-packages/papermill/engines.py", line 49, in execute_notebook_with_engine\n    return self.get_engine(engine_name).execute_notebook(nb, kernel_name, **kwargs)\n  File "/opt/python3.6/lib/python3.6/site-packages/papermill/engines.py", line 341, in execute_notebook\n    nb_man.notebook_start()\n  File "/opt/python3.6/lib/python3.6/site-packages/papermill/engines.py", line 69, in wrapper\n    return func(self, *args, **kwargs)\n  File "/opt/python3.6/lib/python3.6/site-packages/papermill/engines.py", line 198, in notebook_start\n    self.save()\n  File "/opt/python3.6/lib/python3.6/site-packages/papermill/engines.py", line 69, in wrapper\n    return func(self, *args, **kwargs)\n  File "/opt/python3.6/lib/python3.6/site-packages/papermill/engines.py", line 139, in save\n    write_ipynb(self.nb, self.output_path)\n  File "/opt/python3.6/lib/python3.6/site-packages/papermill/iorw.py", line 397, in write_ipynb\n    papermill_io.write(nbformat.writes(nb), path)\n  File "/opt/python3.6/lib/python3.6/site-packages/papermill/iorw.py", line 128, in write\n    return self.get_handler(path).write(buf, path)\n  File "/opt/python3.6/lib/python3.6/site-packages/papermill/iorw.py", line 316, in write\n    multiplier=self.RETRY_MULTIPLIER, min=self.RETRY_DELAY, max=self.RETRY_MAX_DELAY\nTypeError: __init__() got an unexpected keyword argument \'min\'\n\rExecuting:   0%|          | 0/4 [00:00<?, ?cell/s]\n'
[2021-06-30 09:14:28,970] {taskinstance.py:1152} ERROR - Command '['/tmp/venvoyf919ht/bin/python', '/tmp/venvoyf919ht/script.py', '/tmp/venvoyf919ht/script.in', '/tmp/venvoyf919ht/script.out', '/tmp/venvoyf919ht/string_args.txt']' returned non-zero exit status 1.
Traceback (most recent call last):
  File "/usr/local/lib/airflow/airflow/models/taskinstance.py", line 985, in _run_raw_task
    result = task_copy.execute(context=context)
  File "/usr/local/lib/airflow/airflow/operators/python_operator.py", line 113, in execute
    return_value = self.execute_callable()
  File "/usr/local/lib/airflow/airflow/operators/python_operator.py", line 307, in execute_callable
    string_args_filename))
  File "/usr/local/lib/airflow/airflow/operators/python_operator.py", line 319, in _execute_in_subprocess
    close_fds=True)
  File "/opt/python3.6/lib/python3.6/subprocess.py", line 356, in check_output
    **kwargs).stdout
  File "/opt/python3.6/lib/python3.6/subprocess.py", line 438, in run
    output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command '['/tmp/venvoyf919ht/bin/python', '/tmp/venvoyf919ht/script.py', '/tmp/venvoyf919ht/script.in', '/tmp/venvoyf919ht/script.out', '/tmp/venvoyf919ht/string_args.txt']' returned non-zero exit status 1.
[2021-06-30 09:14:28,974] {taskinstance.py:1196} INFO - Marking task as FAILED. dag_id=papermill_run_notebook_v0.1, task_id=list_gcs_files, execution_date=20210630T000000, start_date=20210630T091417, end_date=20210630T091428
Traceback (most recent call last):
  File "/usr/local/bin/airflow", line 7, in <module>
    exec(compile(f.read(), __file__, 'exec'))
  File "/usr/local/lib/airflow/airflow/bin/airflow", line 37, in <module>
    args.func(args)
  File "/usr/local/lib/airflow/airflow/utils/cli.py", line 233, in wrapper
    func(args)
  File "/usr/local/lib/airflow/airflow/utils/cli.py", line 81, in wrapper
    return f(*args, **kwargs)
  File "/usr/local/lib/airflow/airflow/bin/cli.py", line 814, in test
    ti.run(ignore_task_deps=True, ignore_ti_state=True, test_mode=True)
  File "/usr/local/lib/airflow/airflow/utils/db.py", line 74, in wrapper
    return func(*args, **kwargs)
  File "/usr/local/lib/airflow/airflow/models/taskinstance.py", line 1109, in run
    session=session)
  File "/usr/local/lib/airflow/airflow/utils/db.py", line 70, in wrapper
    return func(*args, **kwargs)
  File "/usr/local/lib/airflow/airflow/models/taskinstance.py", line 985, in _run_raw_task
    result = task_copy.execute(context=context)
  File "/usr/local/lib/airflow/airflow/operators/python_operator.py", line 113, in execute
    return_value = self.execute_callable()
  File "/usr/local/lib/airflow/airflow/operators/python_operator.py", line 307, in execute_callable
    string_args_filename))
  File "/usr/local/lib/airflow/airflow/operators/python_operator.py", line 319, in _execute_in_subprocess
    close_fds=True)
  File "/opt/python3.6/lib/python3.6/subprocess.py", line 356, in check_output
    **kwargs).stdout
  File "/opt/python3.6/lib/python3.6/subprocess.py", line 438, in run
    output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command '['/tmp/venvoyf919ht/bin/python', '/tmp/venvoyf919ht/script.py', '/tmp/venvoyf919ht/script.in', '/tmp/venvoyf919ht/script.out', '/tmp/venvoyf919ht/string_args.txt']' returned non-zero exit status 1.

ERROR: (gcloud.composer.environments.run) kubectl returned non-zero status code.

Any help will be appreciated. Thanks.

  • What pip version are you using? Did you try to reinstall pip? Is it possible to share full code and `iorw.py` file? Do you have `import os` ? – PjoterS Jun 30 '21 at 12:28
  • Is it possible to also add add.ipynb or add_out.ipynb? – PjoterS Jun 30 '21 at 13:06
  • Seems this is a duplicate of https://stackoverflow.com/questions/60748507/airflow-error-got-an-unexpected-keyword-argument-min (with solution). – Jarek Potiuk Jun 30 '21 at 14:53
  • @JarekPotiuk I don't see how that solution applies in my case; my notebooks are in a separate bucket from the DAGs – Jegath Suresh Jul 01 '21 at 04:44
  • @PjoterS Do you mean I need to create an empty destination file beforehand? I don't think we need to import os since I am not using that anywhere in my code. I am trying to read the notebook directly from GCS and create output file in GCS – Jegath Suresh Jul 01 '21 at 04:46
  • I have no idea - I just found that someone had a similar problem, and it was apparently related to the way imports are done. Maybe you can reach out to that person and find out? – Jarek Potiuk Jul 01 '21 at 06:10
  • I took the names from the error output. I'd like to see the whole file where you declared the snippet you shared (based on the snippet I assumed it's `add.ipynb`) and replicate this in my env. I asked about `import os` because `Unexpected keyword argument 'min'` looks like some libs are missing. – PjoterS Jul 01 '21 at 10:22
  • Did you try the solution from [this case](https://stackoverflow.com/questions/60748507/airflow-error-got-an-unexpected-keyword-argument-min)? – PjoterS Jul 07 '21 at 10:19

1 Answer

I just had something similar happen to me. For me the error came from invalid paths for the input and output notebooks. When I created a separate folder in the bucket that contains my DAG and moved my notebook there, it worked. You should be able to change the execution block to something like this:

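import papermill as pm

pm.execute_notebook(
    r"/home/airflow/gcs/notebooks/notebook.ipynb",  # input notebook
    r"/home/airflow/gcs/notebooks/notebook.ipynb",  # output notebook (same path, so the input is overwritten)
    parameters=dict(alpha=0.6, ratio=0.1)
)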

Here /home/airflow/gcs/dags contains your DAG; you would create the notebooks directory (/home/airflow/gcs/notebooks) and move your notebook there.
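
To make the path mapping concrete, here is a small sketch. It assumes the environment's bucket is visible on the workers under /home/airflow/gcs, as the paths in this answer suggest. The helper name `to_composer_local_path` and the bucket name `ENV_BUCKET` are hypothetical, and the function only rewrites strings; it does not check that the file exists.

```python
from pathlib import PurePosixPath

# Assumption: the Composer environment's bucket is visible under this directory.
COMPOSER_GCS_MOUNT = PurePosixPath("/home/airflow/gcs")


def to_composer_local_path(gs_uri: str, env_bucket: str) -> str:
    """Translate gs://<env_bucket>/<key> into a path under the mount (hypothetical helper)."""
    prefix = f"gs://{env_bucket}/"
    if not gs_uri.startswith(prefix):
        raise ValueError(f"{gs_uri!r} is not in the environment bucket {env_bucket!r}")
    # Strip leading slashes so the key cannot replace the mount directory when joined.
    key = gs_uri[len(prefix):].lstrip("/")
    return str(COMPOSER_GCS_MOUNT / key)


print(to_composer_local_path("gs://ENV_BUCKET/notebooks/notebook.ipynb", "ENV_BUCKET"))
```

The returned string is what would be passed to pm.execute_notebook in place of the gs:// URI.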

As someone commented, this looks like a duplicate of "Airflow Error - got an unexpected keyword argument 'min'". Hopefully this explains it a little better and fixes your problem.

ryanf
  • Hey, thank you for the reply, I can try this. But what if my notebooks are in another storage bucket? – Jegath Suresh Jul 21 '21 at 03:02
  • I'm not sure you can do that. I believe that when you create your Composer environment, the bucket created with it gets mapped to the Airflow instance, which is why you can only use scripts stored in that bucket https://cloud.google.com/composer/docs/concepts/cloud-storage – ryanf Jul 21 '21 at 12:55
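
For the case raised in the comments, where the notebooks live in a different bucket, one possible workaround is to copy the notebook to local disk, run papermill on local paths, and upload the result. This is an untested sketch, not something confirmed in this thread. It assumes the google-cloud-storage package is available on the worker and that the environment's service account can read and write the other bucket. The function names are hypothetical.

```python
import os
import tempfile


def split_gs_uri(uri):
    """Split 'gs://bucket/key' into (bucket, key)."""
    if not uri.startswith("gs://"):
        raise ValueError(f"not a gs:// URI: {uri!r}")
    bucket, _, key = uri[len("gs://"):].partition("/")
    if not bucket or not key:
        raise ValueError(f"missing bucket or object name: {uri!r}")
    return bucket, key


def run_notebook_via_local_copy(input_uri, output_uri, parameters=None):
    """Download the notebook, execute it locally, then upload the output."""
    # Imported here so this module can be loaded where these packages are absent.
    import papermill as pm
    from google.cloud import storage

    client = storage.Client()
    in_bucket, in_key = split_gs_uri(input_uri)
    out_bucket, out_key = split_gs_uri(output_uri)

    with tempfile.TemporaryDirectory() as tmp:
        local_in = os.path.join(tmp, "input.ipynb")
        local_out = os.path.join(tmp, "output.ipynb")
        client.bucket(in_bucket).blob(in_key).download_to_filename(local_in)
        # Both paths are local, so papermill does not have to write to gs:// itself.
        pm.execute_notebook(local_in, local_out, parameters=parameters or {})
        client.bucket(out_bucket).blob(out_key).upload_from_filename(local_out)
```

With PythonVirtualenvOperator the callable has to be self-contained, so `split_gs_uri` would need to be defined inside it; with a plain PythonOperator the two functions can be used as written.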