Question

How can I execute an egg file from an Azure Data Factory (ADF) pipeline? Currently I can only call a Databricks notebook, which in turn executes the egg file. Is there any way to do that directly?

What has been done

Following this answer, I got the following exception:

{
    "errorCode": "3201",
    "message": "Must specify one jar or maven library for jar task, either via jar_uri or libraries.",
    "failureType": "UserError",
    "target": "Execute Egg",
    "details": []
}

Code and structure

On my local machine I can run `python dist/hello_world-1.0-py2.7.egg`, which prints 'Hello world!'.

src
 |-__init__.py
 |-main.py
__main__.py
setup.py

setup.py

from setuptools import setup, find_packages

setup(
    name='hello-world',
    version='1.0',
    packages=find_packages(),
    py_modules=['__main__']
)

__main__.py

from src.main import run

if __name__ == '__main__':
    run()

src/main.py

def run():
    print('Hello world!')


if __name__ == '__main__':
    run()
silent
VB_
  • https://learn.microsoft.com/en-us/azure/data-factory/data-factory-troubleshoot-guide – NicoNing Mar 31 '20 at 12:43
  • What about defining `class Main: @classmethod def main(cls): return run()` and then specifying `Main` as the *Main class name*? – a_guest Mar 31 '20 at 14:45
  • @a_guest It doesn't work. It looks like my egg is only executable when there is a `__main__.py` file in the root with an `if __name__ == '__main__':` section. If I replace this code with a class, I can't run `python dist/hello_world-1.0-py2.7.egg` – VB_ Mar 31 '20 at 21:51
  • @NicoNing The message `Must specify one jar...` for error 3201 is not documented at the link you provided. I can't even find proof that this is possible in ADF. – VB_ Mar 31 '20 at 22:08
  • @VB_ I meant to include that class into `__main__.py` and then use it as `if __name__ == '__main__': Main.main()`. – a_guest Apr 01 '20 at 18:37
  • @a_guest How would that change the situation? The current problem is that ADF can't execute the `__main__.py:__main__` method. What would you specify in the `Main class name` field? – VB_ Apr 01 '20 at 21:21
  • @VB_ What's the purpose of the "Main class name" field then? I'm not familiar with ADF but it sounds like it wants to `from ... import Main` and then execute it. What else would "Main class name" refer to? – a_guest Apr 02 '20 at 19:13

1 Answer

It seems you selected the Jar activity in Azure Data Factory instead of the Python activity.

Databricks activities in Azure Data Factory

In the Jar activity, the "Main class name" field expects the fully qualified name of the class containing the main method to be executed. This class must be contained in a JAR provided as a library, which is why the error asks for a jar or Maven library.

If you select the Python activity, you can specify a Python file name and upload your egg as a library.
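
The Python file is a separate script, so one option is a small driver that runs the egg the same way `python dist/hello_world-1.0-py2.7.egg` does locally. The following is only a sketch: the function name `run_egg` and the idea of passing the egg path as the first command-line parameter are assumptions for illustration, not something ADF or Databricks prescribes.

```python
import runpy
import sys


def run_egg(egg_path):
    """Execute the top-level __main__.py inside an egg (a zip archive).

    runpy.run_path accepts a zip file containing __main__.py and puts the
    archive on sys.path while it runs, so `from src.main import run` inside
    the egg resolves as it does with `python <egg>`.
    """
    return runpy.run_path(egg_path, run_name="__main__")


if __name__ == "__main__" and len(sys.argv) > 1:
    # Assumption: the egg location is passed as the first parameter,
    # using a path the cluster can read directly.
    run_egg(sys.argv[1])
```

If the egg is attached to the activity as a library, the driver may not need `runpy` at all; importing `run` from `src.main` and calling it should be enough, assuming the library is installed on the cluster before the script starts.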

You can find more details about it here: https://learn.microsoft.com/en-us/azure/data-factory/transform-data-databricks-python
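
For reference, the pipeline JSON behind the Python activity looks roughly like the structure built below. This is a sketch based on the documented `DatabricksSparkPython` activity type with its `pythonFile`, `parameters`, and `libraries` properties; the DBFS paths, the driver script name, and the linked service name are hypothetical placeholders.

```python
import json

# Hypothetical DBFS locations; replace with wherever the files were uploaded.
PYTHON_FILE = "dbfs:/FileStore/scripts/run_egg.py"
EGG_LIBRARY = "dbfs:/FileStore/eggs/hello_world-1.0-py2.7.egg"

activity = {
    "name": "Execute Egg",
    "type": "DatabricksSparkPython",  # the Python activity, not the Jar activity
    "linkedServiceName": {
        "referenceName": "AzureDatabricksLinkedService",  # hypothetical name
        "type": "LinkedServiceReference",
    },
    "typeProperties": {
        "pythonFile": PYTHON_FILE,            # script executed on the cluster
        "parameters": [],                     # passed to the script as command-line arguments
        "libraries": [{"egg": EGG_LIBRARY}],  # egg installed on the cluster
    },
}

# Print the definition so it can be compared with the pipeline's JSON view.
print(json.dumps(activity, indent=4))
```

Unlike the Jar activity, this definition has no main class name; the entry point is the Python file itself.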

Valdas M