
I am currently evaluating dagster for building data engineering pipelines. I have to incorporate a large body of existing C# code for a key data transformation step (from object storage to object storage) that I cannot simply rewrite in Python.

Do I understand the dagster documentation correctly that I should run this step (or op) in a docker container that contains the relevant C# programs and then shell out to them via https://docs.dagster.io/_apidocs/libraries/dagster-shell?

Panke

1 Answer


Another approach is to use Python's subprocess library directly, which seems fine because most of the tools in dagster-shell are thin wrappers around Python's standard library (see the source code of dagster-shell).

import subprocess

from dagster import asset, file_relative_path

# command as a string to run in a shell
@asset
def terminal_cmd(context):
    terminal_cmd_string = '''the command'''
    # use a name other than `subprocess` so the module is not shadowed
    result = subprocess.run(
        terminal_cmd_string,
        capture_output=True,
        shell=True
    )

    context.log.info(result.stdout.decode())  # display output in dagster logs

# script to run in a terminal
@asset
def terminal_script(context):
    script_path = file_relative_path(__file__, 'path/to/the/file/from/this/directory/level')
    # with an argument list, keep the default shell=False so arg1 and arg2 reach the script
    result = subprocess.run(
        [script_path, 'arg1', 'arg2'],
        capture_output=True
    )

    context.log.info(result.stdout.decode())  # display output in dagster logs

You can also add some exception handling inside the asset, for instance (a poor man's example, where result is the object returned by subprocess.run):

if result.returncode == 0:
    return None
else:
    raise Exception(
        f'''something went wrong with your command
            return code: {result.returncode}
            stderr: {result.stderr.decode()}
        '''
    )
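The same check can also be left to subprocess itself: with check=True, subprocess.run raises CalledProcessError on a non-zero return code. Below is a minimal sketch of that variant, not tied to dagster. The helper name run_command is hypothetical, and the Python interpreter stands in for the real program so the snippet runs anywhere.

```python
import subprocess
import sys


def run_command(args):
    """Run a command given as a list of arguments and return its stdout as text.

    Raises RuntimeError carrying the return code and stderr if the command fails.
    (run_command is a hypothetical helper name, not part of dagster.)
    """
    try:
        completed = subprocess.run(
            args,
            capture_output=True,  # collect stdout and stderr
            text=True,            # decode bytes to str
            check=True,           # raise CalledProcessError on non-zero exit
        )
    except subprocess.CalledProcessError as err:
        raise RuntimeError(
            f"command failed with return code {err.returncode}\n"
            f"stderr: {err.stderr}"
        ) from err
    return completed.stdout


# usage: the Python interpreter stands in for the real executable
output = run_command([sys.executable, "-c", "print('hello')"])
```

Inside an asset, the returned text could be passed to context.log.info as in the examples above, and the raised exception would make the step fail in dagster.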

I have no knowledge of C#, but this is how I approach terminal operations with dagster.

Japeusa
  • Thanks, it's also what I settled on. Currently easy since I have only a local instance running and can make the C# program available on path. – Panke Apr 14 '23 at 06:24