Dagster start pipeline from another pipeline using its outputs

Question

How can I start pipeline B after pipeline A completes, and pass pipeline A's outputs into pipeline B?

A piece of code as a starting point:

from dagster import InputDefinition, Nothing, OutputDefinition, pipeline, solid

@solid
def pipeline1_task1(context) -> Nothing:
    context.log.info('in pipeline 1 task 1')


@solid(input_defs=[InputDefinition("start", Nothing)],
       output_defs=[OutputDefinition(str, 'some_str')])
def pipeline1_task2(context) -> str:
    context.log.info('in pipeline 1 task 2')
    return 'my cool output'


@pipeline
def pipeline1():
    pipeline1_task2(pipeline1_task1())


@solid(input_defs=[InputDefinition("print_str", str)])
def pipeline2_task1(context, print_str) -> Nothing:
    context.log.info('in pipeline 2 task 1' + print_str)


@solid(input_defs=[InputDefinition("start", Nothing)])
def pipeline2_task2(context) -> Nothing:
    context.log.info('in pipeline 2 task 2')


@pipeline
def pipeline2():
    pipeline2_task2(pipeline2_task1())


if __name__ == '__main__':
    # run pipeline 1
    # store outputs
    # call pipeline 2 using the above outputs
    pass  # placeholder so the block parses; the steps above are what I don't know how to write

Here we have two pipelines: pipeline1 has two solids, does whatever work we wish, and returns an output from its second solid. pipeline2 is supposed to take the output of pipeline1_task2, print it, and possibly do another piece of work.

How am I supposed to "connect" the two pipelines?

score 1 · Answer 1 · answered Mar 19 '21 at 00:25

One way to make one pipeline execute after another one is via a sensor. The recommended way to do this in Dagster is with an "asset sensor". A solid in the first pipeline yields an AssetMaterialization, and the sensor in the second pipeline waits for that asset to be materialized.

Here's an example: https://docs.dagster.io/concepts/partitions-schedules-sensors/sensors#asset-sensors

score 0 · Answer 2 · answered Mar 17 '21 at 11:59

After playing around a bit, I came to the following solution (not too elegant in my opinion, but at least it works):

from dagster import (InputDefinition, OutputDefinition,
                     execute_pipeline, pipeline, solid, Nothing, repository)


@solid
def pipeline1_task1(context) -> Nothing:
    context.log.info('in pipeline 1 task 1')


@solid(input_defs=[InputDefinition("start", Nothing)],
       output_defs=[OutputDefinition(str, 'some_str')])
def pipeline1_task2(context) -> str:
    context.log.info('in pipeline 1 task 2')
    return '\n\n\nmy cool output\n\n\n'


@pipeline
def pipeline1():
    pipeline1_task2(pipeline1_task1())


@solid(input_defs=[InputDefinition("print_str", str)])
def pipeline2_task1(context, print_str) -> Nothing:
    context.log.info('in pipeline 2 task 1' + print_str)


@solid(input_defs=[InputDefinition("start", Nothing)])
def pipeline2_task2(context) -> Nothing:
    context.log.info('in pipeline 2 task 2')


@pipeline
def pipeline2():
    pipeline2_task2(pipeline2_task1())


@solid
def run_pipelines(context):
    pout = execute_pipeline(pipeline1)
    some_str = pout.result_for_solid('pipeline1_task2')
    conf = {'solids': {'pipeline2_task1': {'inputs': {'print_str': some_str.output_value('some_str')}}}}
    execute_pipeline(pipeline2, run_config=conf)

@pipeline
def pipeline3():
    run_pipelines()


@repository
def repo():
    return [pipeline1, pipeline2, pipeline3]

if __name__ == '__main__':
    execute_pipeline(pipeline3)

So here I've defined pipeline3 instead of doing everything under the if __name__ == '__main__' conditional at the bottom. pipeline3 has a single solid, run_pipelines, which executes pipeline1 and reads the some_str output of the solid pipeline1_task2. It then builds a run config that supplies that value as the print_str input of pipeline2_task1, and passes the config to execute_pipeline for pipeline2.
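The config-building step can be pulled out into a small function, which makes the shape of the dict easier to see and to check on its own. This is only a sketch: build_pipeline2_config is a hypothetical helper name, not part of Dagster, and it assumes the same solid and input names used above (pipeline2_task1, print_str).

```python
from typing import Any, Dict


def build_pipeline2_config(print_str: str) -> Dict[str, Any]:
    """Return a run config that supplies `print_str` to pipeline2_task1.

    Hypothetical helper, not a Dagster API. It only reproduces the nested
    dict that run_pipelines builds inline in the answer above.
    """
    return {
        "solids": {
            # Key must match the solid name inside pipeline2.
            "pipeline2_task1": {
                # Key must match the InputDefinition name on that solid.
                "inputs": {"print_str": print_str},
            }
        }
    }
```

Inside run_pipelines, the call would then read execute_pipeline(pipeline2, run_config=build_pipeline2_config(some_str.output_value('some_str'))).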

We have also defined an @repository function, which groups the three pipelines so that Dagster tools such as dagit load them together as one repository.

The whole thing visualizes nicely in dagit. Although each pipeline is shown separately from the others, the three are shown in one repository (as defined in the code).
