I am trying to build a Dataflow pipeline in Python. The main input stream comes from Pub/Sub, and the main processing function takes a side input that is updated, fairly irregularly, from a second Pub/Sub stream. I have written the following code to test my design:
import json

from apache_beam import Map, Pipeline, WindowInto, io, pvalue
from apache_beam.options.pipeline_options import PipelineOptions
from apache_beam.transforms import trigger, window


def print_and_return(x):
    print('---DEBUG---: ' + str(x))
    return x


def comb(x, y):
    # x is a PubsubMessage (with_attributes=True); y is the side-input dict.
    return f'Main input: {x}, side input: {y}'


def load_side_input(pubsub_message):
    # Side-input messages arrive as raw bytes because with_attributes is not set.
    message = pubsub_message.decode("utf8")
    side_input = json.loads(message)
    # Emit a (key, value) pair so the PCollection can be read with pvalue.AsDict.
    return ('side', side_input)


def run(input_subscription, side_input_sub, pipeline_args=None):
    pipeline_options = PipelineOptions(
        pipeline_args, streaming=True, save_main_session=True
    )
    with Pipeline(options=pipeline_options) as pipeline:
        side_input = (
            pipeline
            | "Side impulse" >> io.ReadFromPubSub(subscription=side_input_sub)
            | "Window side" >> WindowInto(
                window.GlobalWindows(),
                trigger=trigger.Repeatedly(trigger.AfterCount(1)),
                accumulation_mode=trigger.AccumulationMode.DISCARDING)
            | "Parse side input" >> Map(load_side_input)
        )
        (
            pipeline
            | "Read from Pub/Sub" >> io.ReadFromPubSub(subscription=input_subscription, with_attributes=True)
            | "Window" >> WindowInto(window.FixedWindows(10))
            | "Add sideinput" >> Map(comb, y=pvalue.AsDict(side_input))
            | "Print" >> Map(print_and_return)
        )
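To separate the element-level logic from the runner, the same functions can be exercised in plain Python. This is a sketch under stated assumptions: the side-input payload is hypothetical, the main element is a plain string standing in for the PubsubMessage that with_attributes=True would produce, and the side-input view is modelled as a dict built from the (key, value) pairs, which is roughly what AsDict hands to comb. It says nothing about when a runner considers the side input ready.

```python
import json


def print_and_return(x):
    print('---DEBUG---: ' + str(x))
    return x


def comb(x, y):
    return f'Main input: {x}, side input: {y}'


def load_side_input(pubsub_message):
    message = pubsub_message.decode("utf8")
    side_input = json.loads(message)
    return ('side', side_input)


# Hypothetical raw payload standing in for one side-input Pub/Sub message.
side_messages = [b'{"threshold": 5}']

# Same function the "Parse side input" step maps over the stream.
side_pairs = [load_side_input(m) for m in side_messages]

# Assumption: model the AsDict view as a plain dict of the (key, value) pairs.
side_view = dict(side_pairs)

# Stand-in for one main-input element flowing through "Add sideinput" and "Print".
result = print_and_return(comb('main-element', side_view))
```

If this prints the expected line, the functions themselves are fine and the missing output comes from how the pipeline is executed rather than from comb or load_side_input.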
When I run it locally in debug mode, the load_side_input function is called (I know because a breakpoint in it gets hit), but the rest of the pipeline (comb and print_and_return) never runs. My understanding is that the FixedWindows on the main input should fire every 10 seconds, and that Beam would match each of those windows with the latest firing of the trigger on the side input, since the side input is in the global window. In practice, nothing happens.
What am I missing? Why is there no output?
EDIT:
After days of trying things out, and even asking on the Beam users mailing list, it occurred to me that the local runner might simply be misbehaving. Sure enough, after deploying to Dataflow the pipeline works as expected. Testing this way is tedious, though, so it would still be good to know what the problem is and how to solve it.