
I have a working setup for Spring Cloud Kafka Streams with functional programming style. There are two use cases, which are configured via application.properties. Both of them work individually, but as soon as I activate both at the same time, I get a serialization error for the output stream of the second use case:

Exception in thread "ActivitiesAppId-05296224-5ea1-412a-aee4-1165870b5c75-StreamThread-1" org.apache.kafka.streams.errors.StreamsException:
Error encountered sending record to topic outputActivities for task 0_0 due to:
...
Caused by: org.apache.kafka.common.errors.SerializationException:
Can't serialize data [com.example.connector.model.Activity@497b37ff] for topic [outputActivities]
Caused by: com.fasterxml.jackson.databind.exc.InvalidDefinitionException:
Incompatible types: declared root type ([simple type, class com.example.connector.model.Material]) vs com.example.connector.model.Activity

The last line here is important: the "declared root type" is the Material class, not the Activity class, which is probably the source of the error.

Again, when I only activate the second use case before starting the application, everything works fine. So I assume that the "Materials" processor somehow interferes with the "Activities" processor (or its serializer), but I don't know when and where.


Setup

1.) use case: "Materials"

  • one input stream -> transformation -> one output stream
@Bean
public Function<KStream<String, MaterialRaw>, KStream<String, Material>> processMaterials() {...}

application.properties

spring.cloud.stream.kafka.streams.binder.functions.processMaterials.applicationId=MaterialsAppId
spring.cloud.stream.bindings.processMaterials-in-0.destination=inputMaterialsRaw
spring.cloud.stream.bindings.processMaterials-out-0.destination=outputMaterials

2.) use case: "Activities"

  • two input streams -> joining -> one output stream
@Bean
public BiFunction<KStream<String, ActivityRaw>, KStream<String, Assignee>, KStream<String, Activity>> processActivities() {...}

application.properties

spring.cloud.stream.kafka.streams.binder.functions.processActivities.applicationId=ActivitiesAppId
spring.cloud.stream.bindings.processActivities-in-0.destination=inputActivitiesRaw
spring.cloud.stream.bindings.processActivities-in-1.destination=inputAssignees
spring.cloud.stream.bindings.processActivities-out-0.destination=outputActivities

The two processors are also registered as stream functions in application.properties: spring.cloud.stream.function.definition=processActivities;processMaterials

Thanks!

Update - Here's how I use the processors in the code:

Implementation

// Material model
@Getter
@Setter
@AllArgsConstructor
@NoArgsConstructor
public class MaterialRaw {
    private String id;
    private String name;
}

@Getter
@Setter
@AllArgsConstructor
@NoArgsConstructor
public class Material {
    private String id;
    private String name;
}

// Material processor
@Bean
public Function<KStream<String, MaterialRaw>, KStream<String, Material>> processMaterials() {
    return materialsRawStream -> materialsRawStream.map((recordKey, materialRaw) -> {
        // some transformation
        final var newId = materialRaw.getId() + "---foo";
        final var newName = materialRaw.getName() + "---bar";
        final var material = new Material(newId, newName);

        // output
        return new KeyValue<>(recordKey, material);
    });
}
// Activity model
@Getter
@Setter
@AllArgsConstructor
@NoArgsConstructor
public class ActivityRaw {
    private String id;
    private String name;
}

@Getter
@Setter
@AllArgsConstructor
@NoArgsConstructor
public class Assignee {
    private String id;
    private String assignedAt;
}

/**
 * Combination of `ActivityRaw` and `Assignee`
 */
@Getter
@Setter
@AllArgsConstructor
@NoArgsConstructor
public class Activity {
    private String id;
    private Integer number;
    private String assignedAt;
}

// Activity processor
@Bean
public BiFunction<KStream<String, ActivityRaw>, KStream<String, Assignee>, KStream<String, Activity>> processActivities() {
    return (activitiesRawStream, assigneesStream) -> { 
        final var joinWindow = JoinWindows.of(Duration.ofDays(30));

        final var streamJoined = StreamJoined.with(
            Serdes.String(),
            new JsonSerde<>(ActivityRaw.class),
            new JsonSerde<>(Assignee.class)
        );

        final var joinedStream = activitiesRawStream.leftJoin(
            assigneesStream,
            new ActivityJoiner(),
            joinWindow,
            streamJoined
        );

        final var mappedStream = joinedStream.map((recordKey, activity) -> {
            return new KeyValue<>(recordKey, activity);
        });

        return mappedStream;
    };
}
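
The ActivityJoiner referenced above implements Kafka Streams' org.apache.kafka.streams.kstream.ValueJoiner interface; since its body isn't central to the problem, here is only a minimal sketch, and the exact field mapping (in particular for number) is an assumption:

// Sketch of the joiner; how `number` is actually derived is not shown in the question
public class ActivityJoiner implements ValueJoiner<ActivityRaw, Assignee, Activity> {

    @Override
    public Activity apply(ActivityRaw activityRaw, Assignee assignee) {
        // leftJoin: assignee is null when no match arrived within the join window
        final var assignedAt = assignee != null ? assignee.getAssignedAt() : null;
        return new Activity(activityRaw.getId(), null, assignedAt);
    }
}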
Bennett Dams

2 Answers


This turns out to be an issue with the way the binder infers Serde types when there are multiple functions with different outbound target types, one with Activity and another with Material in your case. We will have to address this in the binder. I created an issue here.

In the meantime, you can follow this workaround.

Create a custom Serde class as below.

public class ActivitySerde extends JsonSerde<Activity> {}

Then, explicitly use this Serde for the outbound of your processActivities function using configuration.

For example:

spring.cloud.stream.kafka.streams.bindings.processActivities-out-0.producer.valueSerde=com.example.so65003575.ActivitySerde

Please change the package to the appropriate one if you are trying this workaround.
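
The same pattern would apply to the Materials output, should inference ever fail in that direction; a hypothetical MaterialSerde mirroring the class above could be pinned like this:

public class MaterialSerde extends JsonSerde<Material> {}

spring.cloud.stream.kafka.streams.bindings.processMaterials-out-0.producer.valueSerde=com.example.so65003575.MaterialSerde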

Here is another recommended approach: if you define a bean of type Serde with the target type, it takes precedence, because the binder matches it against the KStream's type. That way, you can skip defining the extra class from the workaround above.

@Bean
public Serde<Activity> activitySerde() {
  return new JsonSerde<>(Activity.class);
}
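
A matching bean for the other output type (shown here as an assumption, mirroring the one above) would cover the Material outbound the same way:

@Bean
public Serde<Material> materialSerde() {
  return new JsonSerde<>(Material.class);
}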

Here are the docs where all of these details are explained.

sobychacko
  • That was it! As I'm new to Kafka & Spring Cloud Stream: Could you tell me if I'm doing something special? I'm just confused that this is a new bug, as I thought that I followed the basic principles. Is there a better/more common way to do what I do? – Bennett Dams Dec 01 '20 at 09:57
  • 1
    If you are strictly relying on the binder's inference capabilities for `Serde` types, this is a bug. However, since there are ways to resolve this, you might want to resort to those workarounds (I am also updating the answer with another workaround when failing `Serde` inference). We will try to come up with an implicit way to handle this though as you have run into it. – sobychacko Dec 01 '20 at 14:38
  • 1
    Updated the answer with another recommended approach. – sobychacko Dec 01 '20 at 14:43

You need to specify which binder to use for each function: s.c.s.bindings.xxx.binder=....

However, without that, I would have expected an error such as "multiple binders found but no default specified", which is what happens with message channel binders.
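
Spelled out, that shorthand refers to properties like these (the binder names kafkaStreamsBinder1/2 are hypothetical and would need matching entries under spring.cloud.stream.binders.*):

spring.cloud.stream.bindings.processMaterials-out-0.binder=kafkaStreamsBinder1
spring.cloud.stream.bindings.processActivities-out-0.binder=kafkaStreamsBinder2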

Gary Russell
  • Could you be more specific? How to configure "which binder to use"? Right now the types for the input and output are de-/serialized via the inferred types of the `processActivities` and `processMaterials` function, so I'm not sure what and how to set via `s.c.s.bindings.xxx.binder=...`. – Bennett Dams Nov 25 '20 at 15:26
  • I was misled by your `>which each have their own binder`, and thought you had defined 2 binders per https://docs.spring.io/spring-cloud-stream/docs/3.0.10.RELEASE/reference/html/spring-cloud-stream.html#multiple-binders While that talks about different binder types, the same concept applies to binders of the same type. – Gary Russell Nov 25 '20 at 15:37
  • Sorry! The binder being the problem was just my uneducated guess. The real problem is the serialization of the output, which only works when only ONE processor is active. If you have another idea why this is the case, I will gladly take it. – Bennett Dams Nov 25 '20 at 15:55