Questions tagged [stream-processing]

272 questions
4
votes
1 answer

Is RethinkDB a good fit for a generic Real-time aggregation platform?

I need your help to verify if RethinkDB fits my use case. Use case: my team is building a generic real-time aggregation platform which needs to join data from a lot of Kafka topics. Joins need to be done on raw data. Topics have the same key. Data in…
4
votes
2 answers

Apache Flink State Store vs Kafka Streams

As far as I know, Kafka Streams handles its state locally, in memory, on disk, or in a Kafka topic, because all the input data comes from a partition where all the messages are keyed by a defined value. Most of the time the computations can be done…
4
votes
2 answers

How Flink and Beam SDKs handle windowing - Which is more efficient?

I am comparing the Apache Beam SDK with the Flink SDK for stream processing, in order to establish the cost/advantages of using Beam as an additional framework. I have a very simple setup where a stream of data is read from a Kafka source and…
4
votes
2 answers

Bootstrap flink state

I am working on a simple aggregation that sums totals of events happening on a given resource (see: Calculate totals and emit periodically in flink). With some help I got this to work, but am now hitting another issue. I am trying to calculate…
Dalibor Novak
  • 575
  • 5
  • 17
4
votes
3 answers

jq streaming - filter nested list and retain global structure

In a large JSON file, I want to remove some elements from a nested list, but keep the overall structure of the document. My example input is this (but the real one is large enough to demand streaming). { "keep_untouched": { "keep_this": [ …
Pete C
  • 489
  • 4
  • 8
4
votes
2 answers

Camel-Kafka component not working, error: "because of Brokers must be configured"

Got an error using the Kafka component for Apache Camel (version 2.19.1). I'm just trying to print incoming messages in a topic; my pipeline is composed like so: ... context.addRoutes(new RouteBuilder() { public void configure() { …
Giuseppe
  • 363
  • 5
  • 19
4
votes
2 answers

How to "rate-limit" a PCollection in Apache Beam?

I have what seems to be a common problem but I can't figure out what the Beam recommended solution is. I have a stream of raw events and I'm looking for two separate events to fulfill a condition within a sliding window (of 60 minutes) for it to…
nambrot
  • 2,551
  • 2
  • 19
  • 31
4
votes
0 answers

akka-stream Zipping Flows with SubFlows

I've a short question about akka-streams. Basically, I try to split a stream into two streams, one of these two streams will be split again in multiple subFlows using groupBy, each of these subFlows needs to be connected with the other stream…
4
votes
2 answers

Parallelism behaviour of stream processing engines

I have been learning Storm and Samza in order to understand how stream processing engines work and realized that both of them are standalone applications and in order to process an event I need to add it to a queue that is also connected to stream…
4
votes
3 answers

What are some practical problems that parallel computing, F#, and GPU-parallel processing might solve?

Recently WiFi encryption was brute-forced using the parallel processing power of a modern GPU. What other real-life problems do you think will benefit from similar techniques?
Chris Ballance
  • 33,810
  • 26
  • 104
  • 151
3
votes
3 answers

Programming models of different hardware

I'm really not sure if this is the right place to ask. I'm interested in the different programming models of different types of hardware. It starts off like this, I was presenting some work I was doing w/ NVIDIA CUDA. I was telling people that one…
sj755
  • 3,944
  • 14
  • 59
  • 79
3
votes
1 answer

One consumer to multiple tables or many consumers per table

I have a Kafka topic with millions of sale events. I have a consumer which on every message will insert the data into 4 tables: 1 for the raw sales, 1 for the sales sum by date by product category (date, product_category, sale_sum), 1 for the sales…
friartuck
  • 2,954
  • 4
  • 33
  • 67
3
votes
3 answers

How to do stream processing with Redpanda?

Redpanda seems easy to work with, but how would one process streams in real-time? We have a few thousand IoT devices that send us data every second. We would like to get the running average of the data from the last hour for each of the devices. Can…
NorwegianClassic
  • 935
  • 8
  • 26
3
votes
0 answers

using Java Stream without the filter() operation to block the stream

I'm working on a service that needs to do some stream processing for products. Given a Company we can use getProducts(Company company) to get List. The next thing I'd like to do is to filter that list. For each product I make a query to a…
IsaacLevon
  • 2,260
  • 4
  • 41
  • 83
3
votes
2 answers

wordcount test shows slowness in Flink

I am doing a benchmark comparison between stream processing frameworks. I selected WordCount, the "Hello world" task of this area (with some twists), and have tested Flink and Hazelcast Jet so far. The result is that Flink is taking 80+s to complete,…