In a stream processing application using Spring Cloud Stream I am taking an input stream (keyed on an integer) and calling selectKey on it to produce a new stream, written to a new topic, with the same values but a different key (a string). The input topic contains records in proper JSON format, e.g.:

"key": {
  "id": 1
},
"value": {
  "id": 1,
  "public_id": "4273b60f-6fe6-40be-8602-d0b3ed2ecf2a", ...

The problem is that the topic created by the stream processing application has the value as a string containing JSON rather than as proper JSON, i.e.:

"key": "4273b60f-6fe6-40be-8602-d0b3ed2ecf2a",
"value": "{\"id\":1,\"publicId\":\"4273b60f-6fe6-40be-8602-d0b3ed2ecf2a\"}"

The code is as follows:

@StreamListener
@SendTo("output")
fun process(@Input("input") stream: KStream<Int, MyObj>): KStream<String, MyObj> =
    stream.selectKey { _, value -> value.publicId }

The function above consumes the input stream and produces an output stream (sent to output). That output stream has the same values as the input stream but a different key, which in this case comes from the value's publicId property.
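For completeness, the map-based variant mentioned under "Other things I've tried" below would be a sketch along these lines (it additionally needs org.apache.kafka.streams.KeyValue):

@StreamListener
@SendTo("output")
fun process(@Input("input") stream: KStream<Int, MyObj>): KStream<String, MyObj> =
    // map emits a whole new KeyValue pair; here the value is passed through unchanged
    stream.map { _, value -> KeyValue(value.publicId, value) }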

The application.yml is as follows:

spring.cloud.stream:
  bindings:
    input:
      destination: input-topic
    output:
      destination: output-topic
  kafka:
    streams:
      binder:
        application-id: test-app-id-1
      bindings:
        input:
          consumer:
            keySerde: org.apache.kafka.common.serialization.Serdes$IntegerSerde
        output:
          producer:
            keySerde: org.apache.kafka.common.serialization.Serdes$StringSerde

Is there something I'm missing? Is this actually a problem, or is it OK for the JSON to be stored as a string in the messages produced by Spring Cloud Stream?

Other things I've tried which haven't made a difference:

  • Using native decoding/encoding (a configuration sketch follows this list)
  • Setting spring.cloud.stream.bindings.output.content-type to application/json
  • Using map instead of selectKey
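For reference, the native encoding/decoding attempt looked roughly like this (a sketch; the JsonSerde from spring-kafka is an assumption, since the exact value serde used isn't shown above):

spring.cloud.stream:
  bindings:
    input:
      consumer:
        useNativeDecoding: true
    output:
      producer:
        useNativeEncoding: true
  kafka:
    streams:
      bindings:
        output:
          producer:
            valueSerde: org.springframework.kafka.support.serializer.JsonSerde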

1 Answer

That implies you are sending publicId: "4273b60f-6fe6-40be-8602-d0b3ed2ecf2a" as a String instead of a POJO.

If that's what you are sending, you should use a StringSerde, not a JsonSerde.

EDIT

I just tested it with a Java app and it works as expected...

import com.fasterxml.jackson.databind.ObjectMapper;

import org.apache.kafka.streams.kstream.KStream;
import org.springframework.boot.ApplicationRunner;
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.cloud.stream.annotation.EnableBinding;
import org.springframework.cloud.stream.annotation.Input;
import org.springframework.cloud.stream.annotation.StreamListener;
import org.springframework.cloud.stream.binder.kafka.streams.annotations.KafkaStreamsProcessor;
import org.springframework.cloud.stream.messaging.Processor;
import org.springframework.context.annotation.Bean;
import org.springframework.kafka.annotation.KafkaListener;
import org.springframework.kafka.core.KafkaTemplate;
import org.springframework.kafka.support.KafkaHeaders;
import org.springframework.messaging.handler.annotation.Header;
import org.springframework.messaging.handler.annotation.SendTo;

@SpringBootApplication
@EnableBinding(KafkaStreamsProcessor.class)
public class So58538297Application {

    public static void main(String[] args) {
        SpringApplication.run(So58538297Application.class, args);
    }

    // Same shape as in the question: re-key the stream, leaving the value untouched.
    @StreamListener(Processor.INPUT)
    @SendTo(Processor.OUTPUT)
    public KStream<String, Foo> process(@Input(Processor.INPUT) KStream<String, Foo> stream) {
        return stream.selectKey((key, value) -> value.getBar());
    }

    // Send one JSON record to the input topic on startup.
    @Bean
    public ApplicationRunner runner(KafkaTemplate<String, String> template) {
        ObjectMapper mapper = new ObjectMapper();
        return args -> {
            template.send(Processor.INPUT, mapper.writeValueAsString(new Foo("baz")));
        };
    }

    // Print what actually lands on the output topic (value and key).
    @KafkaListener(id = "outputGroup", topics = Processor.OUTPUT)
    public void out(String in, @Header(KafkaHeaders.RECEIVED_MESSAGE_KEY) String key) {
        System.out.println("out:" + in + ", key:" + key);
    }

    @KafkaListener(id = "copyOfInput", topics = Processor.INPUT)
    public void in(String in) {
        System.out.println("in:" + in);
    }

    public static class Foo {

        private String bar;

        public Foo() {
            super();
        }

        public Foo(String bar) {
            this.bar = bar;
        }

        public String getBar() {
            return this.bar;
        }

        public void setBar(String bar) {
            this.bar = bar;
        }

    }

}

and

spring.application.name=so58538297

spring.kafka.consumer.auto-offset-reset=earliest

and

in:{"bar":"baz"}
out:{"bar":"baz"}, key:baz
  • The deserialization is working fine. It's the serialization that's the issue. I'm looking at the data that's serialized into the Kafka topic (using Kafka's console consumer or Landoop's Kafka Topics UI) and the object is serialized as a string. The string it's serialized as is valid JSON, but it's a string with everything escaped in it, not raw JSON like the source topic (with data produced, in this example, by Debezium). Also in the code you linked it's showing `stream.through` with two serde parameters: from what I can see `through` needs to take a topic name. – Yoni Gibbs Oct 24 '19 at 13:19
  • Sorry - missed that; the StringSerde is for the key; what you observe will happen if you are publishing a String instead of a POJO. – Gary Russell Oct 24 '19 at 13:24
  • So how do I get it to publish as JSON representation of the POJO instead of a string containing that JSON? I've tried using native encoding/decoding too and had no luck there either. – Yoni Gibbs Oct 24 '19 at 13:27
  • Use a `StringSerde` instead of a `JsonSerde`. – Gary Russell Oct 24 '19 at 13:29
  • But it is JSON I want. It's not just that one field (publicId): it's the whole POJO I want to persist as JSON. To clarify: I don't want to change the actual object in the message, just the key. – Yoni Gibbs Oct 24 '19 at 13:29
  • Your question is confusing; you are showing the consumer of the stream which maps it to `value -> value.publicId` - your problem is with whatever is producing the record; you need to show that. – Gary Russell Oct 24 '19 at 13:34
  • I have shown that: the `process` function is consuming and producing. See the `@SendTo("output")` annotation. Also it's not mapping the value to the publicId: it's using `selectKey`, so it's setting the **key** to the `publicId`. The value remains unchanged. I've updated my question to try to clarify this: apologies if it was confusing. – Yoni Gibbs Oct 24 '19 at 13:35
  • Sorry, now I see the javadoc for `selectKey` I see what your processor is supposed to do. You are just sending the same value, but with a new key. So, somehow "double" JSON conversion is happening. I suggest you run in a debugger to see why. – Gary Russell Oct 24 '19 at 13:38
  • Hmmm - I just tested it with a Java app and (Boot 2.2, Hoxton.M3) and it worked as expected (see my edit). What versions are you using? – Gary Russell Oct 24 '19 at 14:48
  • Thanks. I've been doing some more investigation, and actually it looks like the data being shown as a string (rather than JSON) is only happening in the Landoop Kafka Topics UI. I upgraded to the latest versions and it hasn't changed anything. But I'm beginning to suspect this is more something to do with the Landoop UI. The Kafka Console Consumer shows the data fine, and writing other stream processors to take the data created by the original stream processor seems to work fine too. So looks like it is all working. Thanks for the help. – Yoni Gibbs Oct 24 '19 at 15:33
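For anyone verifying the same thing, the raw records on a topic can be checked with Kafka's console consumer rather than a UI (broker address and topic name here are assumptions matching the question's config):

kafka-console-consumer --bootstrap-server localhost:9092 \
    --topic output-topic --from-beginning \
    --property print.key=true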