
The goal

I must set up a group ID for the Kafka Streams consumer that matches a strict naming convention.

I cannot find a way that works, even after following the documentation closely. As I still believe I may have misunderstood something, I prefer to open a question here for peer review before opening a bug issue on the spring-cloud-stream GitHub repository.

NB:

A similar question was asked one year ago, but it is not very exhaustive and has not been answered yet. I hope I can give more insight into the problem here.

What the official documentation states (and also based on WARN messages)

From several sources of the official documentation, I see that this should be pretty easy to configure in application.yaml of my app.

The documentation states that I can either:

  • use a default value for all the binders, using the section spring.cloud.stream.kafka.default.group=<value>
  • or use a specific value for my channel in spring.cloud.stream.bindings.<channelName>.group
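As a sketch of how I read these two options in application.yaml: the key names below follow the WARN message quoted further down (which has no `kafka` segment in the default key), and `myDefaultGroupId` is a placeholder made up for illustration.

```yaml
spring:
  cloud:
    stream:
      # Option 1: default group applied to every binding
      default:
        group: myDefaultGroupId        # hypothetical placeholder value
      # Option 2: group set on one specific binding (channel)
      bindings:
        process-in-0:
          group: myCustomGroupId
```

Either option alone should be enough according to my reading of the documentation; the binding-level value would be expected to take precedence over the default.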

If I set the generic Kafka group-id field directly in spring.kafka.consumer.group-id, the parameter is explicitly ignored and I get the following WARN:

2022-08-10 10:18:18.376 [main] [WARN ] [o.s.c.s.b.k.s.p.KafkaStreamsBinderConfigurationProperties] - Ignoring provided value(s) for 'group.id'. Use spring.cloud.stream.default.group or spring.cloud.stream.binding.<name>.group to specify the group instead of group.id

So I have also tried both spring.cloud.stream.default.group and spring.cloud.stream.binding.<name>.group (note that the message says binding and not bindings, i.e. without the s).

Edit: Based on a comment from @OlegZhurakousky, this is only a typo in the error message. I tested with and without the s without success.

Looking at the code of the library

I have had a quick look at the stream code, and this property does seem to be the one that must be set. It is what they use in their own tests, for example: --spring.cloud.stream.bindings.uppercase-in-0.group=inputGroup

The problem after following the documentation

The group ID always seems to be ignored, whichever of the aforementioned configurations I test. The group is always set to the default value, groupId=process-applicationId, as the following logs show:

2022-08-10 10:30:56.644 [process-applicationId-c433e54c-2a51-4618-b7a6-14a96b252daf-StreamThread-1] [INFO ] [o.a.k.c.c.i.SubscriptionState] - [Consumer clientId=process-applicationId-c433e54c-2a51-4618-b7a6-14a96b252daf-StreamThread-1-consumer, groupId=process-applicationId] Resetting offset for partition my-custom-topic-0 to position FetchPosition{offset=0, offsetEpoch=Optional.empty, currentLeader=LeaderAndEpoch{leader=Optional[kafka:9092 (id: 1 rack: null)], epoch=0}}.
2022-08-10 10:32:56.713 [process-applicationId-c433e54c-2a51-4618-b7a6-14a96b252daf-StreamThread-1] [INFO ] [o.a.k.s.p.internals.StreamThread] - stream-thread [process-applicationId-c433e54c-2a51-4618-b7a6-14a96b252daf-StreamThread-1] Processed 0 total records, ran 0 punctuators, and committed 0 total tasks since the last update
2022-08-10 10:34:56.767 [process-applicationId-c433e54c-2a51-4618-b7a6-14a96b252daf-StreamThread-1] [INFO ] [o.a.k.s.p.internals.StreamThread] - stream-thread [process-applicationId-c433e54c-2a51-4618-b7a6-14a96b252daf-StreamThread-1] Processed 0 total records, ran 0 punctuators, and committed 0 total tasks since the last update

It is as if the group setting in application.yaml is not used at all. On the other hand, the property spring.cloud.stream.bindings.process-in-0.destination=my-custom-topic (destination: my-custom-topic in the YAML) is picked up and the topic is consumed correctly (see the logs above).

How my application is setup

relevant dependencies in pom.xml

        <dependency>
            <groupId>org.springframework.kafka</groupId>
            <artifactId>spring-kafka</artifactId>
            <version>2.8.6</version>
        </dependency>
        <dependency>
            <groupId>org.apache.kafka</groupId>
            <artifactId>kafka-streams</artifactId>
            <version>3.1.1</version>
        </dependency>
        <dependency>
            <groupId>org.springframework.cloud</groupId>
            <artifactId>spring-cloud-stream-binder-kafka-streams</artifactId>
            <version>3.2.4</version>
        </dependency>

Kafka Streams consumer class (simplified to include only the relevant sections)


package my.custom.stuff;


import org.apache.kafka.streams.kstream.KStream;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.context.annotation.Bean;
import org.springframework.stereotype.Component;
import java.util.function.Consumer;

@Component
public class myKafkaStreamConsumer {

    private static final Logger logger = LoggerFactory.getLogger(myKafkaStreamConsumer.class);

    @Bean
    public static Consumer<KStream<String, String>> process() {
        return input ->
                input.foreach((key, value) -> {
                    logger.debug("from STREAM: Key= {} , value = {}", key, value);
                    // ...
                    // my message handling business logic
                    // ...
                });
    }
}

one version of the application.yaml

Here is the version of the application.yaml that IMHO should be the most compliant with the documentation, and it still does not work. Note that the destination is correctly used, so at least the correct channel is being configured.

spring:
  kafka:
    bootstrap-servers: kafka:9092
    consumer:
      auto-offset-reset: earliest
  cloud:
    stream:
      bindings:
        process-in-0:
          group: myCustomGroupId
          destination: "my-custom-topic"

What I have already tested (unsuccessfully)

I have tried to inject the group ID in several ways, including:

  • all the possible combinations that I could find in any official documentation or example
  • adding it in the consumer subsection such as in spring.cloud.stream.bindings.process-in-0.consumer.group or spring.cloud.stream.bindings.process-in-0.consumer.group-id
  • injecting the officially documented keys as environment variables

It always seems to be ignored.


Danduk82
  • Have you tried setting the `default` group? Not the `process-in-0`? – Markiian Benovskyi Aug 10 '22 at 09:08
  • @MarkiianBenovskyi , do you mean `spring.cloud.stream.default.group` ? yes I have tried – Danduk82 Aug 10 '22 at 09:13
  • The error message you see about singular `binding` is a typo that we must fix. It should be plural: `spring.cloud.stream.bindings.binding-name.group=hello` – Oleg Zhurakousky Aug 10 '22 at 09:56
  • @OlegZhurakousky thanks for confirming this, I thought about that, but wanted to give it a try anyway – Danduk82 Aug 10 '22 at 10:03
  • @OlegZhurakousky, I have accepted the solution from @Tim, but nonetheless I think that the documentation is misleading if it states that you can use `spring.cloud.stream.bindings..group` and instead you must use `applicationId` – Danduk82 Aug 18 '22 at 11:27

1 Answer

Bit of a disclaimer, I'm a bit rusty on Spring but since I've been working with Kafka for the past couple of months I wanted to play with this too. I got it to work by doing two things:

  • use applicationId instead of group within the application properties

    spring:
      kafka:
        bootstrap-servers: localhost:29092
        consumer:
          auto-offset-reset: earliest
      cloud:
        stream:
          kafka:
            binder:
              functions:
                process:
                  applicationId: MyGroupIdUsingApplicationId
          bindings:
            process-in-0:
              destination: my-custom-topic

  • explicitly declare a KafkaBinderConfigurationProperties bean

I created a working sample here for you to clone and test with if you need to: https://github.com/T-TK-Wan/SO-Spring_Cloud_Streams_Kafka_GroupId

Edit:

Just to add that I focused only on confirming that the group ID can be set and that it registers correctly. I haven't looked into whether using the applicationId property is the correct approach or what side effects it has.

Tim
  • Also found this relevant post explaining why applicationId is used: https://stackoverflow.com/questions/66394271/unable-to-set-groupid-in-spring-cloud-stream-binder-kafka-streams3-1-1?rq=1 – Tim Aug 15 '22 at 01:19
  • Hello @Tim, thanks for your answer. I don't know, to me this is a sort of workaround. What if your application needs to listen to different topics, and the group-id must be different for each one of them? For example, our client requires that the group-id naming convention also keeps track of the topic that it consumes (don't ask me what my opinion about this convention is) – Danduk82 Aug 16 '22 at 08:26
  • @Danduk82 each consumer has a group-id. So if you have multiple functions, then you should be able to set diff group-ids I believe. Let me have a play around. – Tim Aug 16 '22 at 09:07
  • This is also my understanding, and my problem is that the group id MUST be set explicitly in this project. For the moment I have rolled-back to use the spring-boot kafka lib instead of streams, which is a shame when you see how much more you can do with streams. – Danduk82 Aug 16 '22 at 09:11
  • I've pushed a change if you want to check it out. I'm not sure if I've misunderstood, but there's nothing stopping you from setting the groupId (applicationId field) to your specific value within the properties file: `applicationId: my-custom-topic-GroupId`. The logs then show: `[Consumer clientId=my-custom-topic-GroupId-5a17f930-560c-40bb-86b6-803657a86499-StreamThread-1-consumer, groupId=my-custom-topic-GroupId]` – Tim Aug 16 '22 at 09:20
  • Indeed, but setting the value on the binder will make it "global" for the application. If you want to set it on the channel (as I would have expected to be possible), so that you can set a different value for each channel (e.g. when you have more than one topic to listen to), you lose this granularity. Or am I missing something? – Danduk82 Aug 16 '22 at 13:27
  • @Danduk82, you can configure the group-id based on each function which allows for granularity on the topics that are consumed. The terminology regarding channels and binders is not clear to me as I've not worked on the Cloud Streams lib except for just experimenting for this post. I've pushed another update to the repo, where you can see there's three different functions consuming different topics (one with multiple). Each has a separate group-id. Apologies if I haven't understood correctly. – Tim Aug 16 '22 at 15:58
  • Hello @Tim, I have tested your solution with the different group IDs: it works like a charm, thanks a lot. I validate your answer. – Danduk82 Aug 18 '22 at 11:22
  • (and the bounty) – Danduk82 Aug 18 '22 at 11:23
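
For reference, the per-function setup discussed in the comments above can be sketched as follows. This is a minimal sketch under stated assumptions, not the content of the linked repository: the second function `audit`, its topic, and both ID values are hypothetical. The `spring.cloud.stream.kafka.streams.binder.functions.<function-name>.applicationId` prefix is the form given in the Kafka Streams binder reference as far as I know, whereas the snippet in the answer writes `kafka.binder` without the `streams` segment, so check it against the sample repository. Kafka Streams uses its `application.id` as the consumer `group.id`, which is why setting `applicationId` changes the `groupId` seen in the logs.

```yaml
spring:
  cloud:
    function:
      # With several function beans, list the ones to bind, separated by ';'
      definition: process;audit              # "audit" is a hypothetical second function
    stream:
      bindings:
        process-in-0:
          destination: my-custom-topic
        audit-in-0:
          destination: my-other-topic        # hypothetical topic
      kafka:
        streams:
          binder:
            functions:
              process:
                # becomes the consumer group.id of the "process" function
                applicationId: my-custom-topic-GroupId
              audit:
                # a separate group.id for the second function
                applicationId: my-other-topic-GroupId   # hypothetical value
```

Because each function gets its own `applicationId`, each topic can follow its own group naming convention, which is the granularity asked about in the comments.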