
High Level Issue

I am running Kafka locally with a compacted topic. When I use the command-line producer and consumer I can verify that compaction occurs, but when I produce with the sarama producer ("github.com/Shopify/sarama"), log compaction does not seem to occur.

Verifying Log Compaction

First I created a topic using the following command:

bin/kafka-topics.sh --zookeeper localhost:2181 \
  --create --topic andrew.topic \
  --config "cleanup.policy=compact" \
  --config "delete.retention.ms=100" \
  --config "segment.ms=100" \
  --config "min.cleanable.dirty.ratio=0.01" \
  --partitions 1 \
  --replication-factor 1

Next I produce several messages to it using the following:

for i in $(seq 0 10); do \
  echo "sameKey123:differentMessage$i" | bin/kafka-console-producer.sh \
    --broker-list localhost:9091 \
    --topic andrew.topic \
    --property "parse.key=true" \
    --property "key.separator=:"; \
done

Finally, I verify that log compaction occurred:

bin/kafka-console-consumer.sh --bootstrap-server localhost:9091 \
  --topic andrew.topic \
  --property print.key=true \
  --property key.separator=" : " \
  --from-beginning

Which prints:

sameKey123 : differentMessage9
sameKey123 : differentMessage10

So log compaction for the andrew.topic topic is occurring.
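To make the expected behaviour concrete, here is a minimal, broker-free sketch (plain Go, function names of my own invention) of what compaction guarantees: in the closed segments only the last value per key survives, while the active segment is never cleaned. With 11 same-key records where only the last one sits in the active segment, it reproduces the console output above:

```go
package main

import "fmt"

// compact models log compaction over a single partition: for records
// before activeStart (the first offset of the active segment) only the
// LAST value per key survives; the active segment is kept verbatim.
func compact(keys, values []string, activeStart int) (outKeys, outValues []string) {
	last := map[string]int{}
	for i := 0; i < activeStart; i++ {
		last[keys[i]] = i // remember the last offset seen for each key
	}
	for i := 0; i < activeStart; i++ {
		if last[keys[i]] == i {
			outKeys = append(outKeys, keys[i])
			outValues = append(outValues, values[i])
		}
	}
	// records in the active segment are never cleaned
	outKeys = append(outKeys, keys[activeStart:]...)
	outValues = append(outValues, values[activeStart:]...)
	return
}

func main() {
	var keys, values []string
	for i := 0; i <= 10; i++ {
		keys = append(keys, "sameKey123")
		values = append(values, fmt.Sprintf("differentMessage%d", i))
	}
	// offsets 0-9 live in closed segments, offset 10 is active
	k, v := compact(keys, values, 10)
	for i := range k {
		fmt.Printf("%s : %s\n", k[i], v[i])
	}
	// sameKey123 : differentMessage9
	// sameKey123 : differentMessage10
}
```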

Now Using Sarama

Now I use sarama to produce messages to the same topic as follows:

package main

import (
    "fmt"

    "github.com/Shopify/sarama"
)

func main() {
    sendMessages()
}

func sendMessages() {
    producer, err := sarama.NewSyncProducer([]string{"localhost:9091"}, nil)
    if err != nil {
        panic(err)
    }
    defer func() {
        if err := producer.Close(); err != nil {
            panic(err)
        }
    }()

    for i := 0; i <= 10; i++ {
        pm := &sarama.ProducerMessage{
            Topic: "andrew.topic",
            Key:   sarama.StringEncoder("sameSaramaKey123"),
            Value: sarama.StringEncoder(fmt.Sprintf("differentMessage%v", i)),
        }
        _, _, err := producer.SendMessage(pm)
        if err != nil {
            panic(err)
        }
    }
}

After restarting the consumer at the command line I see the following output:

sameKey123 : differentMessage9
sameKey123 : differentMessage10
sameSaramaKey123 : differentMessage0
sameSaramaKey123 : differentMessage1
sameSaramaKey123 : differentMessage2
sameSaramaKey123 : differentMessage3
sameSaramaKey123 : differentMessage4
sameSaramaKey123 : differentMessage5
sameSaramaKey123 : differentMessage6
sameSaramaKey123 : differentMessage7
sameSaramaKey123 : differentMessage8
sameSaramaKey123 : differentMessage9
sameSaramaKey123 : differentMessage10

Log compaction did not take place here. No matter how many times I restart the consumer or how many messages I produce with sarama, log compaction does not seem to occur.

More Weirdness

If, after producing messages with sarama, I then produce more messages at the command line, log compaction occurs.

Running the terminal producer after the sarama producer gives the following output:

sameSaramaKey123 : differentMessage10
sameKey123 : differentMessage9
sameKey123 : differentMessage10

After running the producer at the terminal, log compaction occurs for all messages, including those previously produced by sarama.

Why is this happening? How can I fix it?

Andrew Dawson
  • I had similar trouble when dealing with the retention period. The retention period excludes the active segment and only works on the old segments. I imagine it is the same with compaction. You can adjust `segment.ms` and `segment.bytes` to lower values to force segmentation to occur more frequently. Kafka creates a new segment when it receives a new message AND either of these properties is exceeded. The behaviour you describe after producing more messages is correct. – kkflf Aug 17 '18 at 07:17
  • In the example above I am setting segment.ms=100 when I create the topic. – Andrew Dawson Aug 20 '18 at 20:33
  • Yep. What you experience is normal behavior. The segmentation occurs when the Kafka topic receives a message. Kafka does not have a periodic scheduler to check for segmentation, it needs to be triggered. – kkflf Aug 20 '18 at 20:51

0 Answers