Questions tagged [stream-processing]
272 questions
4
votes
1 answer
Is RethinkDB a good fit for a generic Real-time aggregation platform?
I need your help to verify if RethinkDB fits my use case.
Use case
My team is building a generic Real-time aggregation platform which needs to:
join data from a lot of Kafka topics
Joins need to be done on raw data
Topics have the same key
Data in…

davorp
- 4,156
- 3
- 26
- 34
4
votes
2 answers
Apache Flink State Store vs Kafka Streams
As far as I know, Kafka Streams handles its state locally, in memory, on disk, or in a Kafka topic, because all the input data comes from a partition where all the messages are keyed by a defined value. Most of the time the computations can be done…

str0yd
- 97
- 11
4
votes
2 answers
How Flink and Beam SDKs handle windowing - Which is more efficient?
I am comparing the Apache Beam SDK with the Flink SDK for stream processing, in order to establish the cost/advantages of using Beam as an additional framework.
I have a very simple setup where a stream of data is read from a Kafka source and…

javalass
- 143
- 9
4
votes
2 answers
Bootstrap flink state
I am working on a simple aggregation that sums totals of events happening on a given resource (see: Calculate totals and emit periodically in flink). With some help I got this to work, but am now hitting another issue.
I am trying to calculate…

Dalibor Novak
- 575
- 5
- 17
4
votes
3 answers
jq streaming - filter nested list and retain global structure
In a large json file, I want to remove some elements from a nested list, but keep the overall structure of the document.
My example input is this (but the real one is large enough to demand streaming).
{
"keep_untouched": {
"keep_this": [
…

Pete C
- 489
- 4
- 8
4
votes
2 answers
Camel-Kafka component not working for error: "because of Brokers must be configured"
Got an error using the Kafka component for Apache Camel (version 2.19.1). I'm just trying to print incoming messages from a topic; my pipeline is composed like this:
...
context.addRoutes(new RouteBuilder() {
public void configure() {
…

Giuseppe
- 363
- 5
- 19
4
votes
2 answers
How to "rate-limit" a PCollection in Apache Beam?
I have what seems to be a common problem but I can't figure out what the Beam recommended solution is.
I have a stream of raw events and I'm looking for two separate events to fulfill a condition within a sliding window (of 60 minutes) for it to…

nambrot
- 2,551
- 2
- 19
- 31
4
votes
0 answers
akka-stream Zipping Flows with SubFlows
I've a short question about akka-streams.
Basically, I try to split a stream into two streams, one of these two streams will be split again in multiple subFlows using groupBy, each of these subFlows needs to be connected with the other stream…

Cem Philipp Freimoser
- 699
- 5
- 19
4
votes
2 answers
Parallelism behaviour of stream processing engines
I have been learning Storm and Samza in order to understand how stream processing engines work, and realized that both of them are standalone applications: in order to process an event, I need to add it to a queue that is also connected to stream…

Boyolame
- 329
- 2
- 13
4
votes
3 answers
What are some practical problems that parallel computing, F#, and GPU-parallel processing might solve
Recently, WiFi encryption was brute-forced using the parallel processing power of a modern GPU. What other real-life problems do you think will benefit from similar techniques?

Chris Ballance
- 33,810
- 26
- 104
- 151
3
votes
3 answers
Programming models of different hardware
I'm really not sure if this is the right place to ask. I'm interested in the different programming models of different types of hardware.
It starts off like this: I was presenting some work I was doing with NVIDIA CUDA. I was telling people that one…

sj755
- 3,944
- 14
- 59
- 79
3
votes
1 answer
One consumer to multiple tables or many consumers per table
I have a Kafka topic with millions of sale events. I have a consumer which, on every message, inserts the data into 4 tables:
1 for the raw sales,
1 for the sales sum by date by product category (date, product_category, sale_sum)
1 for the sales…

friartuck
- 2,954
- 4
- 33
- 67
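For the multi-table consumer question above, the "sales sum by date by product category" table amounts to a keyed running total. A minimal sketch of that aggregation, assuming each sale event is a dict with `date`, `product_category`, and a hypothetical `amount` field (the question does not show the event schema):

```python
from collections import defaultdict


def aggregate_sales(events):
    """Sum sale amounts per (date, product_category).

    The "amount" field name is an assumption made for illustration;
    "date" and "product_category" follow the column names in the question.
    """
    totals = defaultdict(float)
    for event in events:
        key = (event["date"], event["product_category"])
        totals[key] += event["amount"]
    return dict(totals)


# Example events, invented purely for illustration.
sample_events = [
    {"date": "2024-01-01", "product_category": "books", "amount": 10.0},
    {"date": "2024-01-01", "product_category": "books", "amount": 5.0},
    {"date": "2024-01-01", "product_category": "toys", "amount": 7.5},
]
sale_sums = aggregate_sales(sample_events)
```

Each resulting entry maps to one `(date, product_category, sale_sum)` row. Whether one consumer or several should maintain such tables is the open question here; the sketch only shows the aggregation itself.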
3
votes
3 answers
How to do stream processing with Redpanda?
Redpanda seems easy to work with, but how would one process streams in real-time?
We have a few thousand IoT devices that send us data every second. We would like to get the running average of the data from the last hour for each of the devices. Can…

NorwegianClassic
- 935
- 8
- 26
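For the Redpanda question above, the per-device running average over the last hour can be sketched independently of any particular stream processor. This is a minimal in-memory sketch, assuming readings arrive in timestamp order per device and that each reading carries a device id, a timestamp in seconds, and a numeric value (the class and parameter names are hypothetical, not from the question):

```python
from collections import defaultdict, deque

WINDOW_SECONDS = 3600  # "the last hour" from the question


class RunningAverage:
    """Per-device running average over a sliding time window.

    Assumes readings for a given device arrive in non-decreasing
    timestamp order; out-of-order data would need extra handling.
    """

    def __init__(self, window_seconds=WINDOW_SECONDS):
        self.window_seconds = window_seconds
        self._samples = defaultdict(deque)  # device_id -> deque of (timestamp, value)
        self._sums = defaultdict(float)     # device_id -> sum of values in the window

    def add(self, device_id, timestamp, value):
        """Record one reading and return the device's current average."""
        samples = self._samples[device_id]
        samples.append((timestamp, value))
        self._sums[device_id] += value

        # Evict readings that have fallen out of the window.
        cutoff = timestamp - self.window_seconds
        while samples and samples[0][0] <= cutoff:
            _, old_value = samples.popleft()
            self._sums[device_id] -= old_value

        return self._sums[device_id] / len(samples)
```

In a real deployment the `add` call would sit inside whatever consumer loop reads from the topic, and the state would need to survive restarts, which is what dedicated stream processing frameworks provide.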
3
votes
0 answers
using Java Stream without the filter() operation to block the stream
I'm working on a service that needs to make some stream processing for products.
Given a Company, we can use getProducts(Company company) to get a List of its products.
The next thing I'd like to do is to filter that list. For each product I make a query to a…

IsaacLevon
- 2,260
- 4
- 41
- 83
3
votes
2 answers
wordcount test shows slowness in Flink
I am doing a benchmark comparison between stream processing frameworks.
I selected WordCount, the "Hello world" task of this area (with some twists), and have tested Flink and Hazelcast Jet so far. The result is that Flink takes 80+s to complete,…

Kuawiiii
- 151
- 1
- 8