
Using a handler passed to SetPartitionsAssignedHandler I need to be able to consume from the message immediately following the last message consumed by the group.

I have a consumer on a single-partition topic. Because I need to be able to set a custom offset in certain circumstances, I have implemented a handler and passed it to SetPartitionsAssignedHandler. If the handler determines that a specific offset is needed, it works out the offset and returns a TopicPartitionOffset with that value set in an Offset instance. This works as expected. What does not work is the case where no specific offset is needed. I've tried returning:

  • a TopicPartitionOffset with Offset.End - consumes from the next message posted to the topic
  • a TopicPartitionOffset with Offset.Beginning - consumes from the beginning of the partition
  • a TopicPartitionOffset with either Offset.Stored or Offset.Unset - consumes from the last message consumed, but always consumes that message again. I've checked that this is what's happening by looking at the offset of the first message consumed.
  • nothing - the consumer never consumes any messages

I've searched around, including going through the code, but the TopicPartitionOffset information is passed into the librdkafka.dll and that determines what is done with the offset information, so I can't see why Stored and Unset both reconsume the last consumed message. I also can't see why not returning a TopicPartitionOffset results in the consumer never consuming anything.

The group ID is consistent, so it's not a case of the group ID changing that's causing a problem. And if the group ID was changing then using Offset.Stored or Offset.Unset would result in the partition being read from the beginning anyway.

I've found this same question asked with no answer at "consumer consuming the same message twice at the starting only". I've also looked at "How to make kafka consumer to read from last consumed offset but not from beginning", but setting the offset reset to earliest and setting the group ID does not result in the desired behaviour, because giving a handler to SetPartitionsAssignedHandler evidently overrides whatever the default behaviour would be. I didn't find any other questions that seemed relevant, and so far no other relevant information has come up anywhere. I also went through the existing issues listed on the GitHub repo before looking through the code to see if I could spot anything there.

The ConsumerConfig is constructed like this.

private ConsumerConfig GetConsumerConfig(Config.Consumer config)
{
    ConsumerConfig consumerCfg = GetBaseConfiguration();

    if (!consumerCfg.Any())
    {
        return consumerCfg;
    }

    consumerCfg.BootstrapServers = "localhost:9092";
    consumerCfg.GroupId = "TestConsumer";
    consumerCfg.EnableAutoCommit = false;
    consumerCfg.AutoOffsetReset = AutoOffsetReset.Earliest;

    return consumerCfg;
}

private ConsumerConfig GetBaseConfiguration()
{
    Option<string> ipAddr = LocalIpAddress.GetLocalIpAddress();

    return ipAddr.HasValue
        ? new ConsumerConfig()
        {
            ClientId = $"{ipAddr.ValueOrFailure()}",
            AutoCommitIntervalMs = 1000,
            SessionTimeoutMs = 30000,
            StatisticsIntervalMs = 60000,
            FetchMinBytes = 64 * 1024,
            FetchWaitMaxMs = 200000,
            MaxPartitionFetchBytes = 3 * 102400
        }
        : new ConsumerConfig();
}

The Group ID is in the app.config, so it's always the same unless the config is changed. I'm not changing it between executions of the app.

The consumer is constructed with this configuration.

private IConsumer<string, string> CreateConsumer(ConsumerConfig config)
{
    return new ConsumerBuilder<string, string>(config)
        .SetErrorHandler(OnConsumeError)
        .SetStatisticsHandler(OnStatistics)
        .SetLogHandler(OnLog)
        .SetOffsetsCommittedHandler(OnOffsetCommit)
        .SetPartitionsAssignedHandler(OnPartitionsAssigned)
        .Build();
}

When the consumer connects, it subscribes to one or more configured topics, usually just one, by calling

IConsumer<T,U>.Subscribe(List<string>)

The exact configuration in the app.config isn't relevant here - it consists of the topic name, an optional offset and information on where the message goes for processing.

This is simplified code representing the construction of the TopicPartitionOffset when no specific offset is needed (with AutoOffsetReset.Earliest hardcoded to force use of Offset.Stored).

private List<TopicPartitionOffset> OnPartitionsAssigned(
    IConsumer<string, string> consumer,
    List<TopicPartition> topicPartitions)
{
    List<TopicPartitionOffset> offsetPartitions = 
        topicPartitions.Select(partition => GetPartitionOffset(AutoOffsetReset.Earliest, partition))
                       .ToList();
    return offsetPartitions;
}

private static TopicPartitionOffset GetPartitionOffset(AutoOffsetReset offsetReset, TopicPartition partition)
{
    return new TopicPartitionOffset(
        partition.Topic,
        partition.Partition,
        AutoOffsetReset.Latest == offsetReset ? Offset.End : Offset.Stored);
}

The only significant difference when a specific offset is needed is that instead of Offset.End or Offset.Stored a numeric value is determined and used.

I expected that using Offset.Stored would result in consuming from the message after the last message consumed on the partition (by the group). But it always results in reconsuming the last message that was consumed.


Update: After further investigation I tried getting the committed offsets for the partitions in the handler passed to SetPartitionsAssignedHandler. Then, instead of using one of the Offset special values, I construct a TopicPartitionOffset with an Offset whose Value is 1 higher than the last committed offset. This works well, except for cases where multiple messages were received in a batch. It appears that Commit(IEnumerable) has a bug which results in only the first offset for a partition being committed: if we receive 3 messages on a given partition and I include an offset for each of them, only the lowest of those offsets is committed.

So the pseudocode that produces the desired result (when not using auto-commit) is:

  • Set partition offset to one greater than the last committed.
  • Read messages.
  • After processing a batch of messages, get the highest offset for each partition and commit that list.
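
The steps above could be sketched roughly as follows. This is only an illustration of the offset bookkeeping, not the actual consumer code: PartitionOffset is a hypothetical stand-in for Confluent.Kafka's TopicPartitionOffset so that the snippet runs without the library, and OffsetBatching, HighestPerPartition and NextStartOffset are names made up for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class OffsetBatching
{
    // Hypothetical stand-in for TopicPartitionOffset (assumption: topic name,
    // partition number and numeric offset are all that matter here).
    public record PartitionOffset(string Topic, int Partition, long Offset);

    // Reduces a batch of consumed offsets to one entry per topic/partition,
    // keeping only the highest offset, so the list passed to Commit never
    // contains more than one offset for the same partition.
    public static List<PartitionOffset> HighestPerPartition(IEnumerable<PartitionOffset> batch) =>
        batch.GroupBy(m => (m.Topic, m.Partition))
             .Select(g => new PartitionOffset(g.Key.Topic, g.Key.Partition, g.Max(m => m.Offset)))
             .ToList();

    // Start position used in the partitions-assigned handler:
    // one greater than the last committed offset.
    public static long NextStartOffset(long lastCommitted) => lastCommitted + 1;
}
```

In the real consumer the result of HighestPerPartition would be mapped back to TopicPartitionOffset instances before being committed, and NextStartOffset would be applied to each committed offset retrieved in the handler.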
Craig