Execute egg directly from Azure Data Factory

Question

How can I execute an egg file from an Azure Data Factory (ADF) pipeline? Currently I can only call a Databricks notebook, which in turn executes the egg file. Is there any way to do that directly?

What has been done

Following this answer, I got the following exception:

{
    "errorCode": "3201",
    "message": "Must specify one jar or maven library for jar task, either via jar_uri or libraries.",
    "failureType": "UserError",
    "target": "Execute Egg",
    "details": []
}

Code and structure

On my local machine I can run python dist/hello_world-1.0-py2.7.egg, which prints 'Hello world!'

src
 |-__init__.py
 |-main.py
__main__.py
setup.py

setup.py

from setuptools import setup, find_packages

setup(
    name='hello-world',
    version='1.0',
    packages=find_packages(),
    py_modules=['__main__']
)

__main__.py

from src.main import run

if __name__ == '__main__':
    run()

src/main.py

def run():
    print('Hello world!')


if __name__ == '__main__':
    run()

https://learn.microsoft.com/en-us/azure/data-factory/data-factory-troubleshoot-guide — NicoNing, Mar 31 '20 at 12:43
What about defining `class Main: @classmethod def main(cls): return run()` and then specifying `Main` as the *Main class name*? — a_guest, Mar 31 '20 at 14:45
@a_guest it doesn't work. It looks like my egg is executable only when there is a `__main__.py` file in the root with an `if __name__ == '__main__':` section. If I replace this code with a class, I can't run `python dist/hello_world-1.0-py2.7.egg` — VB_, Mar 31 '20 at 21:51
@NicoNing The message `Must specify one jar...` for 3201 is not documented at the link you've provided. I can't even find proof that this is possible in ADF. — VB_, Mar 31 '20 at 22:08
@VB_ I meant to include that class into `__main__.py` and then use it as `if __name__ == '__main__': Main.main()`. — a_guest, Apr 01 '20 at 18:37
@a_guest how will that change the situation? The current problem is that ADF can't execute the `__main__.py:__main__` method. What do you want to specify in the `Main class name` field? — VB_, Apr 01 '20 at 21:21
@VB_ What's the purpose of the "Main class name" field then? I'm not familiar with ADF but it sounds like it wants to `from ... import Main` and then execute it. What else would "Main class name" refer to? — a_guest, Apr 02 '20 at 19:13

score 1 · Answer 1 · answered Jul 13 '20 at 05:45

It seems you selected the Jar activity in Azure Data Factory instead of the Python activity.

In the Jar activity, the "Main class name" field expects the full name of the class containing the main method to be executed. This class must be contained in a JAR provided as a library. That is why the error asks for a jar or maven library.

If you select the Python activity, you can specify a Python file name and attach your egg as a library.

You can find more details about it here: https://learn.microsoft.com/en-us/azure/data-factory/transform-data-databricks-python
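
As a rough illustration of the Python activity setup described in this answer, here is a minimal sketch that builds the activity definition as a Python dict and prints it as JSON. The `DatabricksSparkPython` type and the `pythonFile` and `libraries` properties come from the linked documentation. The DBFS paths, the linked service name, the entry script name and the `build_python_activity` helper are assumptions made up for this example, not values from the question.

```python
import json


def build_python_activity(name, python_file, egg_path, linked_service):
    """Return an ADF Databricks Python activity definition as a dict.

    Hypothetical helper for illustration only; it is not part of any SDK.
    """
    return {
        "name": name,
        "type": "DatabricksSparkPython",
        "linkedServiceName": {
            "referenceName": linked_service,
            "type": "LinkedServiceReference",
        },
        "typeProperties": {
            # Entry-point script on DBFS. It is assumed to contain:
            #   from src.main import run
            #   run()
            "pythonFile": python_file,
            # The egg is attached as a library so that `src` is importable.
            "libraries": [{"egg": egg_path}],
        },
    }


activity = build_python_activity(
    name="Execute Egg",
    python_file="dbfs:/scripts/run_hello_world.py",   # hypothetical path
    egg_path="dbfs:/libs/hello_world-1.0-py2.7.egg",  # hypothetical path
    linked_service="MyDatabricksLinkedService",       # hypothetical name
)

print(json.dumps(activity, indent=2))
```

In this sketch the egg itself is not the file being executed. A small entry script plays the role of `__main__.py` and calls `run()` from the attached egg, so both files would need to be uploaded to DBFS first.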
