Questions tagged [stream-processing]
272 questions
7
votes
1 answer
Flink Windows Boundaries, Watermark, Event Timestamp & Processing Time
Problem Definition & Establishing Concepts
Let’s say we have a TumblingEventTimeWindow with a size of 5 minutes, and events containing 2 basic pieces of information:
number
event timestamp
In this example, we kick off our Flink topology at…
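
As a rough illustration of the window boundaries this question is about, here is a minimal sketch of how an event timestamp could be mapped to a 5-minute tumbling window. It assumes windows are aligned to the epoch, uses millisecond timestamps, and treats the window as the half-open interval [start, end). The function name `window_bounds` and the `offset_ms` parameter are hypothetical names chosen for this sketch; it is not Flink's own code.

```python
WINDOW_SIZE_MS = 5 * 60 * 1000  # 5-minute tumbling window, in milliseconds


def window_bounds(event_ts_ms, size_ms=WINDOW_SIZE_MS, offset_ms=0):
    """Return (start, end) of the tumbling window containing event_ts_ms.

    The window covers [start, end): start is inclusive, end is exclusive.
    offset_ms shifts the alignment away from the epoch (assumed 0 here).
    """
    start = event_ts_ms - (event_ts_ms - offset_ms + size_ms) % size_ms
    return start, start + size_ms


if __name__ == "__main__":
    # An event stamped 4 min 59.999 s after the epoch falls in the first window;
    # an event stamped exactly 5 min falls in the second one.
    print(window_bounds(299_999))
    print(window_bounds(300_000))
```

Note that the assignment depends only on the event timestamp, not on when the job was started or when the event was processed.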

samser
- 83
- 6
7
votes
1 answer
Flink window state size and state management
After reading Flink's documentation and searching around, I couldn't entirely understand how Flink handles state in its windows.
Let's say I have an hourly tumbling window with an aggregation function that accumulates messages into some Java POJO or…

yaarix
- 490
- 7
- 18
7
votes
2 answers
Stream processing architecture
I am in the process of designing a system where there's a main stream of objects and multiple workers, each of which produces some result from that object. Finally, there is some special/unique worker (sort of a "sink", in terms of graph theory)…

IsaacLevon
- 2,260
- 4
- 41
- 83
7
votes
1 answer
Kafka Streams Sort Within Processing Time Window
I wonder if there's any way to sort records within a window using Kafka Streams DSL or Processor API.
Imagine the following situation as an example (arbitrary one, but similar to what I need):
There is a Kafka topic of some events, let's say user…

burdiyan
- 315
- 2
- 12
7
votes
1 answer
Apache Apex vs Apache Flink
Both are streaming frameworks that process one event at a time. What are the core architectural differences between these two technologies/streaming frameworks?
Also, what are some particular use cases where one is more appropriate than the other?

Biplob Biswas
- 1,761
- 19
- 33
7
votes
2 answers
Lazily extract lines from large file
I'm trying to grab 5 lines by their line numbers from a large (> 1GB) file with Clojure. I'm almost there but am seeing some strange things, and I want to understand what's going on.
So far I've got:
(defn multi-nth [values indices]
(map (partial…

David J.
- 6,546
- 1
- 19
- 14
6
votes
1 answer
Efficiently store a result stream in multiple tables with optimistic locking per item
Given a result stream with a lot of items, I want to store them and handle potential concurrency conflicts:
public void onTriggerEvent(/* params */) {
Stream results = customThreadPool.submit(/*...complex parallel computation on multiple…

Stuck
- 11,225
- 11
- 59
- 104
5
votes
1 answer
Unable connect to node with id 1: [Worker]: Error: ConnectionError('No connection to node with id')
I am trying to use robinhood/faust, but without success!
I have already created a producer that successfully inserts into the original topic in my confluent-kafka localhost instance,
but Faust is unable to connect to localhost.
My…

FelipeAgger
- 51
- 1
- 5
5
votes
2 answers
Streaming: tumbling window vs microbatching
How is a tumbling window of 5 seconds in stream processing different from a microbatch of 5 seconds in microbatching? Both have a non-overlapping window of 5 seconds during which they process the records and then move on.
I understand that there is this notion…

Sheel Pancholi
- 621
- 11
- 25
5
votes
1 answer
Share state among operators in Flink
I wonder if it is possible in Flink to share the state among operators.
Say, for instance, that I have partitioning by key on an operator and I need a piece of state of partition A inside partition C (for any reason) (fig 1.a), or I need the state…

affo
- 453
- 3
- 15
5
votes
0 answers
Akka-stream UnsupportedOperationException by creating a Source from Graph
I am trying to connect a stream with n * subFlows. Therefore I build a source from the outlet of a broadcast, but it throws an UnsupportedOperationException: cannot replace the shape of the EmptyModule. I tried to google this exception, but I…

Cem Philipp Freimoser
- 699
- 5
- 19
5
votes
2 answers
How can I write a custom stream transformation in C++?
I'm learning C++ after having worked a lot with Haskell and functional languages in general, and I found that I'm constantly trying to solve the same problem:
Read some data from an input stream
Tokenize them based on a specific algorithm
Process…

Jakub Arnold
- 85,596
- 89
- 230
- 327
5
votes
3 answers
Lamina vs Storm
I am designing a prototype realtime monitor for processing fairly large amounts (>30G/day) of streaming numeric data. I would like to write this in Clojure, as the language seems to be well suited to the kind of "Observer + state machine" system…

CLF
- 155
- 6
4
votes
1 answer
Best Design pattern to create a rules engine
Suppose I have to design a rules engine where, depending on a static configuration rule, the chain of responsibility changes at runtime. What is the best design pattern for implementing this?
For example, depending on some configurations, a set…

Vignesh
- 79
- 2
- 5
4
votes
2 answers
How do I handle out-of-order events with Apache Flink?
To test out stream processing and Flink, I have given myself a seemingly simple problem. My data stream consists of x and y coordinates for a particle, along with the time t at which the position was recorded. My objective is to annotate this data with…

Optimus
- 2,716
- 4
- 29
- 49