
I've run into a problem where I need to repartition an existing topic (source) to a new topic (target) with a higher number of partitions (a multiple of the number of previous partitions).

The source topic was written to by a Spring Cloud Stream application using the Kafka binder. The target topic is written to by a Kafka Streams (KStreams) application.

The records in the source topic were partitioned based on a header, with key=null. I tried to explicitly extract this header and set it as the message key for records in the target topic, and noticed that records with the same partition key were landing in completely different partitions.

After some investigation, I've found the culprit to be the following:

org.springframework.cloud.stream.binder.PartitionHandler.DefaultPartitionSelector

    private static class DefaultPartitionSelector implements PartitionSelectorStrategy {

        @Override
        public int selectPartition(Object key, int partitionCount) {
            int hashCode = key.hashCode();
            if (hashCode == Integer.MIN_VALUE) {
                hashCode = 0;
            }
            return Math.abs(hashCode);
        }
    }

org.springframework.cloud.stream.binder.PartitionHandler

    public int determinePartition(Message<?> message) {
        // ... non-relevant code omitted

        partition = this.partitionSelectorStrategy.selectPartition(key,
                    this.partitionCount);
        // protection in case a user selector returns a negative.
        return Math.abs(partition % this.partitionCount);
    }
While the default Kafka partitioning strategy does:

org.apache.kafka.clients.producer.internals.DefaultPartitioner

    public int partition(String topic, Object key, byte[] keyBytes, Object value, byte[] valueBytes, Cluster cluster,
                         int numPartitions) {
        if (keyBytes == null) {
            return stickyPartitionCache.partition(topic, cluster);
        }
        // hash the keyBytes to choose a partition
        return Utils.toPositive(Utils.murmur2(keyBytes)) % numPartitions;
    }
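For comparison, here is a dependency-free sketch of the keyed branch of the Kafka code path (the `murmur2` and `toPositive` bodies are adapted from `org.apache.kafka.common.utils.Utils`, so no kafka-clients jar is needed to try it). The key point is that Kafka hashes the *serialized* key bytes with murmur2, not the key object's `hashCode()`, so the two schemes will generally disagree even for the same logical key:

```java
import java.nio.charset.StandardCharsets;

public class KafkaPartitionSketch {

    // Adapted from org.apache.kafka.common.utils.Utils.murmur2
    static int murmur2(final byte[] data) {
        int length = data.length;
        int seed = 0x9747b28c;
        final int m = 0x5bd1e995;
        final int r = 24;
        int h = seed ^ length;
        int length4 = length / 4;
        for (int i = 0; i < length4; i++) {
            final int i4 = i * 4;
            int k = (data[i4] & 0xff) + ((data[i4 + 1] & 0xff) << 8)
                    + ((data[i4 + 2] & 0xff) << 16) + ((data[i4 + 3] & 0xff) << 24);
            k *= m;
            k ^= k >>> r;
            k *= m;
            h *= m;
            h ^= k;
        }
        // Handle the trailing 1-3 bytes (deliberate switch fall-through)
        switch (length % 4) {
            case 3:
                h ^= (data[(length & ~3) + 2] & 0xff) << 16;
            case 2:
                h ^= (data[(length & ~3) + 1] & 0xff) << 8;
            case 1:
                h ^= data[length & ~3] & 0xff;
                h *= m;
        }
        h ^= h >>> 13;
        h *= m;
        h ^= h >>> 15;
        return h;
    }

    // Equivalent of Utils.toPositive: clears the sign bit
    static int toPositive(int number) {
        return number & 0x7fffffff;
    }

    // The keyed branch of DefaultPartitioner.partition()
    static int kafkaPartition(byte[] keyBytes, int numPartitions) {
        return toPositive(murmur2(keyBytes)) % numPartitions;
    }

    public static void main(String[] args) {
        byte[] key = "some-id".getBytes(StandardCharsets.UTF_8);
        System.out.println(kafkaPartition(key, 10));
    }
}
```

Note also that `toPositive` masks the sign bit rather than using `Math.abs`, which sidesteps the `Integer.MIN_VALUE` edge case entirely.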

In essence, a topic written to by Spring Cloud Stream using this mechanism can never be co-partitioned with a topic written to by a non-Spring Cloud Stream app, unless a custom Partitioner is used (which is not too difficult to do).

It should be noted, however, that the above DefaultPartitionSelector is not located in the Kafka Binder module, but in the higher-level spring-cloud-stream module.

What is the reasoning behind this design choice? I imagine the default partition selector applies to all binders, not just Kafka, but why does the Kafka binder not implement its own Partitioner that allows out-of-the-box co-partitioning with non-Spring Cloud Stream apps by default?

filpa
  • Partitioning at the binder level is intended for infrastructure that doesn't support partitioning natively; just don't use it and let Kafka do the partitioning itself. – Gary Russell Aug 02 '22 at 13:48
  • @GaryRussell Thanks for the reply. I reread the [docs](https://cloud.spring.io/spring-cloud-stream-binder-kafka/spring-cloud-stream-binder-kafka.html#_partitioning_with_the_kafka_binder) and the info blurb in that section explicitly mentions this case. However, I suppose the previous developers working on my codebase weren't aware that the default Kafka producer partition strategy is different to this - I would assume others would be similarly confused (since it *is* similar). Perhaps this could be more explicit in the documentation? I'm more interested in the reasoning - I assume simplicity. – filpa Aug 02 '22 at 14:28

1 Answer


As I said in my comment

Partitioning at the binder level is intended for infrastructure that doesn't support partitioning natively; just don't use it and let Kafka do the partitioning itself.

That said, it's not entirely clear what you mean; the Spring partitioner was written long ago and predates the sticky cache introduced by KIP-480. But even that partitioner will change the partition if the number of partitions changes when the app is restarted: if there is a key, it is modded by the number of partitions; if there is no key, a random (sticky) partition is selected.
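The "modded by the number of partitions" point is why keyed records move when a topic is scaled out: the same hash maps to a different slot under a different modulus. A trivial illustration (the hash value 15 is just an arbitrary stand-in for some key's positive hash):

```java
public class ModByCountSketch {
    public static void main(String[] args) {
        int hash = 15; // stand-in for some key's positive hash
        System.out.println(hash % 10); // partition 5 with 10 partitions
        System.out.println(hash % 20); // partition 15 with 20 partitions
    }
}
```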

Run this with 10 partitions, then with 20, and you will see that.

@SpringBootApplication
public class So73207602Application {

    public static void main(String[] args) {
        SpringApplication.run(So73207602Application.class, args).close();
    }

    @Bean
    ApplicationRunner runner(KafkaTemplate<String, String> template, NewTopic topic, KafkaAdmin admin) {
        return args -> {
            System.out.println(template.send("topic1", "foo", "bar").get().getRecordMetadata());
        };
    }

    @Bean
    public NewTopic topic() {
        return TopicBuilder.name("topic1").partitions(10).replicas(1).build();
    }

}

With a null key you will get a different (random) partition each time.

Gary Russell
  • Thanks for the answer. I'll try it out, but perhaps I could reword in the meantime: an SCS app had a `partitionKeyExpression` set to `headers.someId` and no `key`. I 'piped' the records in this topic (via Kstreams using default partitioner) to a new topic, whereby I set the record `key` explicitly (to the same value that was used in the previous topic - as a String). As a test I set number of partitions in both topics to be the same. In this case, I expected records to retain their partition assignment - they did not. Is this expected behaviour? – filpa Aug 02 '22 at 15:09
  • I don't know why you would expect the same results; the spring version uses the `hashCode()` method of the selected key object (which could be anything), the Kafka code uses a `murmur2` hash of the bytes in the key `byte[]`, after serialization. As I said, partitioning at the binder level was not intended to be used with Kafka because it has native partitioning. Feel free to open an issue in GitHub against `spring-cloud-stream` (with suggestions) if you feel the documentation needs improvement. – Gary Russell Aug 02 '22 at 15:19
  • Well, that's an implementation detail. The `key` hashing part of this question also is pretty tangential to KIP-480 AFAICS. The assumption was that a Kafka binder would want to do this as close to *Kafka-like* as possible - including partitioning. Having looked at the actual code, you're obviously correct and my assumptions didn't hold. But there's no real reason they *couldn't* - the binder could do it the same way, technically. – filpa Aug 02 '22 at 15:33
  • Either way, this mostly answers the question so I'll be accepting the answer and we can gladly leave it at that. I'll hop on over to the `spring-cloud-stream` tracker to hopefully make this clearer for others. – filpa Aug 02 '22 at 15:33
  • 1
    Not wishing to flog a dead horse, but at the binder level, the key extractor could return something like a `Customer` object; hence the need to use the standard `hashCode()`. Partitioning at that level is agnostic - it doesn't even know what type of binder is actually in use at that point the stack. – Gary Russell Aug 02 '22 at 15:48