Questions tagged [apache-kafka-streams]

Related to Apache Kafka's built-in stream processing engine called Kafka Streams, which is a Java library for building distributed stream processing apps using Apache Kafka.

Kafka Streams is a Java library for building fault-tolerant distributed stream processing applications using streams of data records from topics in Apache Kafka.

Kafka Streams is a library for building streaming applications, specifically applications that transform input Kafka topics into output Kafka topics (or into calls to external services, updates to databases, and so on). It lets you do this with concise code in a way that is distributed and fault-tolerant.

Documentation: https://kafka.apache.org/documentation/streams/

3924 questions
14
votes
1 answer

Print Kafka Stream Input out to console?

I've been looking through a lot of the Kafka documentation for a Java application that I am working on. I've tried getting into the lambda syntax introduced in Java 8, but I am on shaky ground there and don't feel too confident that it…
Zeliax
  • 4,987
  • 10
  • 51
  • 79
13
votes
1 answer

Kafka - Stream vs Topic

What is the difference between a Kafka topic and a stream? I thought both were the same. This doc says to create a stream from a topic, which caused the confusion. https://docs.ksqldb.io/en/latest/developer-guide/create-a-stream/ Questions: What is…
RamPrakash
  • 2,218
  • 3
  • 25
  • 54
13
votes
1 answer

max.poll.interval.ms set to Integer.MAX_VALUE by default

Apache Kafka documentation states: "The internal Kafka Streams consumer max.poll.interval.ms default value was changed from 300000 to Integer.MAX_VALUE." Since this value is used to detect when the processing time for a batch of records exceeds a…
Javier Holguera
  • 1,301
  • 2
  • 11
  • 27
13
votes
2 answers

How to connect to multiple clusters in a single Kafka Streams application?

In the Kafka Streams Developer Guide it says: Kafka Streams applications can only communicate with a single Kafka cluster specified by this config value. Future versions of Kafka Streams will support connecting to different Kafka clusters for…
mixiul__
  • 395
  • 1
  • 2
  • 12
13
votes
3 answers

Is Kafka Stream StateStore global over all instances or just local?

In the Kafka Streams WordCount example, a StateStore is used to store word counts. If there are multiple instances in the same consumer group, is the StateStore global to the group, or just local to a consumer instance? Thanks
Stephen Kuo
  • 1,175
  • 3
  • 11
  • 19
12
votes
1 answer

Consumer group stuck in 'rebalancing' even though there are no consumers

I am using Kafka version 2.4.1 (recently upgraded from 2.2.0) and noticed a strange problem. Even though the application (Kafka Streams) is down (no application is running), the consumer group command returns the state as…
SunilS
  • 2,030
  • 5
  • 34
  • 62
12
votes
2 answers

rocksdb out of memory

I'm trying to find out why my kafka-streams application runs out of memory. I already found out that RocksDB is consuming lots of native memory and I tried to restrict it with the following configuration: # put index and filter blocks in blockCache…
D-rk
  • 5,513
  • 1
  • 37
  • 55
12
votes
1 answer

What exactly happens when a repartition occurs in a Kafka stream?

Say I have a stream of employees, keyed by empId, which also includes departmentId. I want to aggregate by department. So I do a selectKey(mapper to get departmentId), then groupByKey() (or I could just do a groupBy(...), I assume), and then,…
mconner
  • 1,174
  • 3
  • 12
  • 24
12
votes
1 answer

Kafka Connect vs Streams for Sinks

I am trying to understand what Connect buys you that Streams does not. We have a part of our application where we want to consume a topic and write to MariaDB. I could accomplish this with a simple processor. Read the record, store in state store…
Chris
  • 1,299
  • 3
  • 18
  • 34
12
votes
2 answers

Apache Beam over Apache Kafka Stream processing

What are the differences between Apache Beam and Apache Kafka with respect to Stream processing? I am trying to grasp the technical and programmatic differences as well. Please help me understand by reporting from your experience.
Stella
  • 1,728
  • 5
  • 41
  • 95
12
votes
2 answers

In-memory vs persistent state stores in Kafka Streams?

I've read the stateful stream processing overview and, if I understand correctly, one of the main reasons why RocksDB is used as the default implementation of the key-value store is the fact that, unlike in-memory collections, it can handle…
Dth
  • 1,916
  • 3
  • 23
  • 34
12
votes
3 answers

Kafka Streams - Send on different topics depending on Streams Data

I have a Kafka Streams application waiting for records to be published on topic user_activity. It will receive JSON data and, depending on the value against a key, I want to push that stream into different topics. This is my streams App…
el323
  • 2,760
  • 10
  • 45
  • 80
12
votes
1 answer

Kafka stream join

I have 2 Kafka topics: recommendations and clicks. The first topic has recommendation objects keyed by a unique Id (called recommendationsId). Each product has a URL which the user can click. The clicks topic gets the messages generated by clicks…
Nik
  • 5,515
  • 14
  • 49
  • 75
12
votes
3 answers

Ideal way to enrich a KStream with lookup data

My stream has a column called 'category' and I have additional static metadata for each 'category' in a different store, which gets updated once every couple of days. What is the right way to do this lookup? There are two options with Kafka…
Vignesh Chandramohan
  • 1,306
  • 10
  • 15
12
votes
1 answer

Merging multiple identical Kafka Streams topics

I have 2 Kafka topics streaming the exact same content from different sources so I can have high availability in case one of the sources fails. I'm attempting to merge the 2 topics into 1 output topic using Kafka Streams 0.10.1.0 such that I don't…
Bogdan
  • 312
  • 7
  • 16