I am new to Flink and have gone through the documentation, examples, and blog posts to get started, but I am struggling to use the operators correctly. I have two questions.

Question 1: Does Flink support declarative exception handling? I need to handle parse, validation, and similar errors.

  • Can I use org.apache.flink.runtime.operators.sort.ExceptionHandler or something similar to handle errors?
  • Or is a Rich/FlatMap function my best option? If it is the only option, is there a way to get a handle to the stream inside the Rich/FlatMap function so that sinks can be attached for error processing?

Question 2: Can I conditionally attach different Sink(s)?

  • Based on certain fields in the keyed split streams I need to select different sinks. Do I split the stream again, or use a Rich/FlatMap function to handle that?

I am using Flink 1.3.2. Here is the relevant portion of my job:

    .....
    .....
    DataStream<String> eventTextStream = env.addSource(messageSource);

    KeyedStream<EventPojo, Tuple> eventPojoStream = eventTextStream
            // parse, transform or enrich
            .flatMap(new MyParseTransformEnrichFunction())
            .assignTimestampsAndWatermarks(new EventAscendingTimestampExtractor())
            .keyBy("eventId");

    // split stream based on eventType as different reduce and windowing functions need to be applied
    SplitStream<EventPojo> splitStream = eventPojoStream
            .split(new EventStreamSplitFunction());

    // need to apply reduce function
    DataStream<EventPojo> event1TypeStream = splitStream.select("event1Type");

    // need to apply reduce function
    DataStream<EventPojo> event2TypeStream = splitStream.select("event2Type");

    // need to apply time based windowing function
    DataStream<EventPojo> event3TypeStream = splitStream.select("event3Type");

    ....
    ....

    env.execute("Event Processing");      

Am I using the correct operators here?

Update 1:

I tried using the ProcessFunction as suggested by @alpinegizmo, but that didn't work because it depends on a keyed stream, which I don't have until I parse and validate the input. I get "InvalidProgramException: Field expression must be equal to '*' or '_' for non-composite types.".

This is such a common use case: you first parse and validate the input and don't have a keyed stream yet. How do you solve it?

Thanks for your patience and help.

Aurvoir
  • I think `org.apache.flink.runtime.operators.sort.ExceptionHandler` is for BatchTask. For streaming, I have not seen any handler that processes exceptions globally. You can simply add a split operator after the keyed stream operator, so that the results are sent to different sinks. – BrightFlow Oct 26 '17 at 23:14
  • Thanks David, yes, a split stream can handle Question 2 but not Question 1. Based on @alpinegizmo's recommendation I find 'side outputs' to be more flexible, but they seem to require a keyed stream upfront. – Aurvoir Oct 27 '17 at 17:54

1 Answer

There's one key building block that you've overlooked. Take a look at side outputs.

This mechanism provides a type-safe way to produce any number of additional output streams. This can be a clean way to report errors, among other uses. In Flink 1.3, side outputs can only be used with ProcessFunction, but 1.4 will add side outputs to ProcessWindowFunction.
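To make the idea concrete, here is a minimal sketch of the routing pattern in plain Java, written so that it runs without Flink on the classpath. Everything in it is an assumption for illustration: the `Event` class stands in for the question's `EventPojo`, the `"eventId,eventType"` record format and the `parse` helper are hypothetical, and the two lists only model the main output and the error side output. In a real job the same decision would sit inside a `ProcessFunction`: successes go to `out.collect(...)`, failures go to `ctx.output(tag, ...)` with an `OutputTag`, and the error stream is read back with `getSideOutput(tag)` so that its own sink can be attached. As far as I recall, the 1.3 side-output documentation calls `process(...)` directly on a plain `DataStream`, so a keyed stream may not be needed for this step; please verify that against your version.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Flink-free model of "main output plus error side output". All names are hypothetical. */
final class ParseRouting {

    /** Minimal stand-in for the question's EventPojo. */
    static final class Event {
        final String eventId;
        final String eventType;

        Event(String eventId, String eventType) {
            this.eventId = eventId;
            this.eventType = eventType;
        }
    }

    /** Holds both outputs: parsed events (main) and rejected raw records (errors). */
    static final class Routed {
        final List<Event> main = new ArrayList<>();
        final List<String> errors = new ArrayList<>();
    }

    /** Hypothetical parser: expects "eventId,eventType" and throws on anything else. */
    static Event parse(String raw) {
        String[] parts = raw.split(",");
        if (parts.length != 2 || parts[0].isEmpty() || parts[1].isEmpty()) {
            throw new IllegalArgumentException("malformed record: " + raw);
        }
        return new Event(parts[0], parts[1]);
    }

    /** Sends each record to exactly one output instead of failing the whole job. */
    static Routed route(List<String> rawRecords) {
        Routed routed = new Routed();
        for (String raw : rawRecords) {
            try {
                routed.main.add(parse(raw));   // in Flink: out.collect(event)
            } catch (IllegalArgumentException e) {
                routed.errors.add(raw);        // in Flink: ctx.output(errorTag, raw)
            }
        }
        return routed;
    }

    public static void main(String[] args) {
        Routed routed = route(Arrays.asList("42,event1Type", "not-an-event", "7,event3Type"));
        System.out.println("parsed=" + routed.main.size() + " rejected=" + routed.errors.size());
    }
}
```

The point of the design is that parsing happens once, before any `keyBy`, and a bad record becomes data on a second stream rather than an exception. The main stream can then be keyed and windowed as in the question, while the error stream gets whatever sink suits it.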

David Anderson
  • Thanks, ProcessFunction is definitely more flexible than FlatMap, but it needs a KeyedStream, which I don't have access to before my first flatMap. That's where I parse the XML string and convert it to a POJO, and only later key on one of its fields. I guess I can 'mock' the key as the whole XML string to satisfy the needs of ProcessFunction. Would that be fine? – Aurvoir Oct 26 '17 at 14:50
  • It appears I can't 'mock' the key as I get "InvalidProgramException: Field expression must be equal to '*' or '_' for non-composite types. ". – Aurvoir Oct 27 '17 at 17:45