Not sure if this is possible or not, but here is what I am trying to do:
I want to extract portions (steps) of a function as individual nodes (fine so far), but the catch is that I have an iterator on top of the steps whose behaviour depends on the dataset, i.e. the same operations (which are independent of each other) are repeated on logical partitions of a dataset.
Example code
def single_node(list_of_numbers):
    modified_list = []  # to store all output
    for x in list_of_numbers:  # iteration logic
        x += 1  # Step 1
        x = str(x)  # Step 2
        x += "_suffix"  # Step 3
        modified_list.append(x)  # append to final output
    return modified_list
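For reference, calling this on a small list behaves as follows (a quick illustration of the code above, nothing more):

>>> single_node([1, 2, 3])
['2_suffix', '3_suffix', '4_suffix']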
Context
- In the example above, I currently have a single node that performs all of the steps.
- So the current pipeline has one node, which takes one input and returns one output.
- As the complexity of my steps increases, I want to expose them as individual nodes. So I would create another pipeline with these 3 steps as individual nodes, connected through their inputs and outputs (a sketch of what I mean follows this list).
- But my overall requirement is unchanged: I want to iterate over all values in list_of_numbers, and for each element in this list I want to call this new pipeline. Finally, I want to merge the outputs of all runs and generate one output.
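To make the decomposition concrete, this is roughly what I mean by the three individual nodes (a minimal sketch in plain Python; the function names step_1, step_2, step_3, and sub_pipeline are just for illustration, not tied to any framework API):

def step_1(x):
    return x + 1  # Step 1: increment

def step_2(x):
    return str(x)  # Step 2: convert to string

def step_3(x):
    return x + "_suffix"  # Step 3: append suffix

def sub_pipeline(x):
    # the 3 steps chained together: the output of each feeds the next
    return step_3(step_2(step_1(x)))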
This seems similar to a dynamic graph (multiple dynamic instances of a pipeline) that expands based on the dataset.
Additional points to consider:
- My input is a single file. Say I partition the dataset based on some logic defined in a node. This node could then have multiple outputs (the exact count depends entirely on the dataset; here, the size of the list).
- For each output of the data-iterator node, I need to "spawn" one pipeline.
- Finally, I need to merge the outputs of all "spawned" pipelines (this logic could again be defined in a merge node with multiple dynamic inputs); the overall flow is sketched below.
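Put together, the behaviour I am after looks like this in plain Python (purely illustrative of the desired data flow, not an actual dynamic-pipeline implementation; partition_node, merge_node, and the sub_pipeline from the sketch above are hypothetical names):

def partition_node(list_of_numbers):
    # split the dataset into partitions; the count depends on the data
    # (here, trivially one partition per element)
    return [[x] for x in list_of_numbers]

def merge_node(outputs):
    # combine the outputs of all spawned pipeline runs into one result
    merged = []
    for out in outputs:
        merged.extend(out)
    return merged

def dynamic_pipeline(list_of_numbers):
    partitions = partition_node(list_of_numbers)  # N outputs
    results = [[sub_pipeline(x) for x in p] for p in partitions]  # N spawned runs
    return merge_node(results)  # N dynamic inputs -> 1 output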
Is there a way to do this? Thank you!