
Within a single task of a Streams application, do the following two methods run independently? That is, while process() is handling an incoming message from the upstream source, can punctuate() run in parallel according to its schedule, with WALL_CLOCK_TIME as the PunctuationType? Or do they share the same thread, so that only one of them runs at any given time? If so, would punctuate() never be invoked when process() keeps receiving messages continuously from the upstream source?

  • Processor.process(K key, V value)
    Process the record with the given key and value.

  • ProcessorContext.schedule(long interval, PunctuationType type, Punctuator callback)
    Schedules a periodic operation for processors.

Also, please clarify what it means for the partition id to be -1 in the punctuate method. Is the punctuate method not specific to any partition?

  • int ProcessorContext.partition()
    Returns the partition id of the current input record; could be -1 if it is not available (for example, if this method is invoked from the punctuate call)
Raman

1 Answer


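Both methods are executed in a single thread. Wall-clock-based punctuate() is called regardless of whether there is input data: between calls to process(), the thread checks the system time and calls punctuate() if necessary. A continuous stream of input therefore cannot starve punctuate(); it can only delay it while a process() call is in progress.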
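The interleaving described above can be modeled with a small, self-contained sketch. It is not the Kafka Streams implementation and uses no Kafka classes: the class name `SingleThreadPunctuationSketch`, the method `runLoop`, and the injected clock are all hypothetical, chosen only to illustrate one thread that alternates between processing records and checking the time. The real stream thread works on polled batches and may check the clock less often than after every record, and this sketch covers only the case where records keep arriving.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.LongSupplier;

public class SingleThreadPunctuationSketch {

    /**
     * Hypothetical model of one task running on one thread.
     * Each record is processed to completion; only afterwards does the
     * loop read the clock and fire the punctuation if the interval elapsed.
     */
    static List<String> runLoop(List<String> records, long intervalMs, LongSupplier clock) {
        List<String> events = new ArrayList<>();
        long nextPunctuate = clock.getAsLong() + intervalMs;

        for (String record : records) {
            // Stand-in for Processor.process(): nothing else runs meanwhile.
            events.add("process:" + record);

            // Between process() calls, check wall-clock time.
            long now = clock.getAsLong();
            if (now >= nextPunctuate) {
                // Stand-in for the scheduled Punctuator callback.
                events.add("punctuate@" + now);
                nextPunctuate = now + intervalMs;
            }
        }
        return events;
    }

    public static void main(String[] args) {
        // Assumed fake clock: every read advances time by 40 ms, so the
        // run is deterministic and does not depend on the real system time.
        long[] fakeTime = {0};
        LongSupplier clock = () -> {
            long now = fakeTime[0];
            fakeTime[0] += 40;
            return now;
        };

        List<String> events = runLoop(List.of("a", "b", "c", "d", "e"), 100, clock);
        events.forEach(System.out::println);
    }
}
```

Even though a record is always waiting in this model, the punctuation still fires once the interval has passed. It never overlaps with a process() call; it only runs in the gap after one.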

For the partition information: yes, punctuations are independent of partitions. Punctuations are specific to a task, but a task might have multiple input partitions (for example, if it executes a merge or join), so it is unclear which partition information to pass in. For simplicity, the single-partition case is treated the same way as the multi-partition case, and punctuations are decoupled from partitions.

Matthias J. Sax
  • Hi Matthew, thanks. Suppose the stream app has no merge or join and reads a topic with 6 partitions. If I print context.taskId() inside punctuate(), it shows a single task 0 on the left and the respective partition on the right (0_1; 0_2; 0_3; 0_4; 0_5; 0_6). Is it valid to use the task id to determine which partition the task corresponds to in punctuate()? – Raman Jun 10 '18 at 21:44
  • In this setting, and in the current implementation, the second number of the task id is the same as the partition number. However, this naming scheme is **not** part of a public contract and could change in future versions without announcement. Thus, it's **not recommended** to write code that relies on this, because it might break if you upgrade to a newer version! – Matthias J. Sax Jun 11 '18 at 02:32