I have a solid that needs to run after two other solids. One of them returns a value; the other returns nothing, but it has upstream dependency solids of its own and takes a while to run.
I execute the pipeline in multiprocessing mode, where solids run concurrently if no dependencies are defined between them.
Below is a sample of the situation. Say I have the following solids:
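import time

from dagster import InputDefinition, Nothing, composite_solid, solid

@solid(input_defs=[InputDefinition("start", Nothing)])
def solid_a(context):
    # Returns nothing; only starts once its upstream solids have finished.
    time.sleep(2)
    context.log.info('yey')

@solid
def solid_b(context):
    return 1

@composite_solid
def my_composite_solid(wait_solid_a: Nothing, solid_b_output: int):
    # some_other_solid is not shown here.
    some_other_solid(solid_b_output)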
When executed, these solids should follow the timeline below.
Time passed | Event |
---|---|
0 | pipeline starts... |
1 sec | solid_b started |
3 sec | solid_a's dependency solids are running; solid_a has not started yet |
5 sec | solid_b finished |
10 sec | solid_a started now |
15 sec | solid_a finished |
20 sec | my_composite_solid should start now. |
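To make the intended ordering concrete, here is a minimal sketch in plain Python using concurrent.futures. It is not dagster code and does not use any dagster API; the function bodies, the shortened sleep, and the `log` list are stand-ins assumed purely for illustration. It only shows the behaviour wanted: the downstream step consumes solid_b's value but still waits for solid_a to finish.

```python
import time
from concurrent.futures import ThreadPoolExecutor, wait


def solid_a(log):
    # Stand-in for the slow solid that returns nothing.
    time.sleep(0.2)
    log.append("solid_a finished")


def solid_b(log):
    # Stand-in for the fast solid that returns a value.
    log.append("solid_b finished")
    return 1


def my_composite_solid(log, solid_b_output):
    # Stand-in for the downstream step; it only consumes solid_b's value.
    log.append("my_composite_solid started")
    return solid_b_output


def run_pipeline():
    log = []
    with ThreadPoolExecutor(max_workers=2) as pool:
        future_a = pool.submit(solid_a, log)
        future_b = pool.submit(solid_b, log)
        # Block until BOTH upstream steps are done, even though only
        # solid_b's result is passed downstream.
        wait([future_a, future_b])
        result = my_composite_solid(log, future_b.result())
    return result, log
```

Calling `run_pipeline()` returns solid_b's value together with the event log, in which the composite step is always the last entry.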
So, according to this timeline, my_composite_solid can only start once both solid_a and solid_b have finished executing. However, when I define it this way, dagster throws an error saying:
dagster.core.errors.DagsterInvalidDefinitionError: @composite_solid 'my_composite_solid' has unmapped input 'wait_solid_a'. Remove it or pass it to the appropriate solid invocation.
If I don't make solid_a's output a dependency of my_composite_solid, it starts immediately after solid_b returns its result. What should I do?