
I'm using the PHP Laravel framework to consume Kafka messages with the help of the mateusjunges/laravel-kafka package.

Is it possible to save the offset per consumer in, for example, Redis or a database? And, when the broker shuts down and comes back up, is it possible to tell the consumer to start consuming messages from that specific offset?

Let's say I have a Laravel Artisan command that builds the following consumer:

public function handle()
{
    $topics = [
        'fake-topic-1',
        'fake-topic-2',
        'fake-topic-3'
    ];

    // Schema Registry HTTP client (Guzzle), wrapped in blocking and caching decorators
    $cachedRegistry = new CachedRegistry(
        new BlockingRegistry(
            new PromisingRegistry(
                new Client(['base_uri' => 'https://fake-schema-registry.com'])
            )
        ),
        new AvroObjectCacheAdapter()
    );

    $registry = new \Junges\Kafka\Message\Registry\AvroSchemaRegistry($cachedRegistry);
    $recordSerializer = new RecordSerializer($cachedRegistry);

    // Map each topic to its '<topic>-key' and '<topic>-value' Avro schemas
    foreach ($topics as $topic)
    {
        $registry->addKeySchemaMappingForTopic(
            $topic,
            new \Junges\Kafka\Message\KafkaAvroSchema($topic . '-key')
        );
        $registry->addBodySchemaMappingForTopic(
            $topic,
            new \Junges\Kafka\Message\KafkaAvroSchema($topic . '-value')
        );
    }

    $deserializer = new \Junges\Kafka\Message\Deserializers\AvroDeserializer($registry, $recordSerializer);

    // Build the consumer: SSL options, auto-commit, Avro deserializer, and a handler that queues a job
    $consumer = \Junges\Kafka\Facades\Kafka::createConsumer(
        $topics, 'fake-test-group', 'fake-broker.com:9999')
    ->withOptions([
        'security.protocol' => 'SSL',
        'ssl.ca.location' => storage_path() . '/client.keystore.crt',
        'ssl.keystore.location' => storage_path() . '/client.keystore.p12',
        'ssl.keystore.password' => 'fakePassword',
        'ssl.key.password' => 'fakePassword',
    ])
    ->withAutoCommit()
    ->usingDeserializer($deserializer)
    ->withHandler(function(\Junges\Kafka\Contracts\KafkaConsumerMessage $message) {

        KafkaMessagesJob::dispatch($message)->onQueue('kafka_messages_queue');

    }) 
    ->build();    
    
    $consumer->consume();
}

My problem now is that, from time to time, "fake-broker.com:9999" shuts down, and when it comes back up, the consumer misses a few messages...

  • offset_reset is set to latest;
  • The option auto.commit.interval.ms is not set in the ->withOptions() method, so it uses the default value (5 seconds, I believe);
  • auto_commit is set to true, and the consumer is built with the ->withAutoCommit() option as well (see the snippet right after this list).
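
For clarity, these are (as I understand it) the raw librdkafka config keys behind those package settings; the values here just restate the bullets above:

$options = [
    'auto.offset.reset'       => 'latest', // what the package calls offset_reset
    'auto.commit.interval.ms' => 5000,     // not set explicitly, so the default (I believe)
    'enable.auto.commit'      => 'true',   // what the package calls auto_commit
];

// These would go alongside the SSL settings passed to ->withOptions() on the builder above.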

Let me know if you guys need any additional information ;) Thank you in advance.

EDIT: According to this thread here, I should set my "offset_reset" to "earliest", not "latest". Even so, I'm almost 100% sure that an offset is committed (somehow, somewhere stored), because I am using the same consumer group ID on the same partition (0), so the "offset_reset" is not even taken into consideration, I'm assuming...

1 Answer

"somehow, somewhere stored"

Kafka consumer groups store offsets in Kafka itself (the __consumer_offsets topic). Therefore, storing them externally doesn't really make sense, because you need Kafka to be up regardless.

Is it possible to save the offset by consumer in, for example, Redis or DB? And, when the broker shuts down and comes back up, is it possible to tell the consumer to start consuming messages from that specific offset?

In general, it is, but it adds unnecessary complexity. You'd need to manually assign each partition to your client rather than just subscribing the consumer to a topic. It's not clear to me whether that Kafka library supports custom partition assignment, though.
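
To illustrate what manual assignment would look like, here is a rough sketch at the level of the underlying php-rdkafka client (which that package wraps); the topic, partition, and offset values are placeholders:

// Sketch only: assign a specific partition/offset instead of subscribing to a topic.
$offsetFromYourStore = 42; // e.g. previously saved in Redis or a DB

$conf = new RdKafka\Conf();
$conf->set('bootstrap.servers', 'fake-broker.com:9999');
$conf->set('group.id', 'fake-test-group');

$consumer = new RdKafka\KafkaConsumer($conf);

// Instead of $consumer->subscribe(['fake-topic-1']), pin the consumer to partition 0
// starting at the offset you stored yourself.
$consumer->assign([
    new RdKafka\TopicPartition('fake-topic-1', 0, $offsetFromYourStore),
]);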

It's not clear from your question why Kafka would be scaled to zero brokers and have less uptime than "Redis or DB", such that you couldn't store offsets in Kafka. (Redis is a DB, so I'm not sure why that's an "or"...)


The offset_reset value only matters when the consumer group has no committed offset. The consumer client isn't (shouldn't be? I don't know the PHP client code) "caching" the offsets locally, and broker restarts should preserve any committed values. If you want to guarantee that you are able to commit every message, you need to disable auto-commits and handle the commits yourself: https://junges.dev/documentation/laravel-kafka/v1.8/advanced-usage/4-custom-committers
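
For reference, disabling auto-commit and committing manually looks roughly like this with the underlying php-rdkafka client (the linked docs show how to plug equivalent behavior into the package via custom committers):

// Sketch only: synchronous, per-message commits after successful processing.
$conf = new RdKafka\Conf();
$conf->set('bootstrap.servers', 'fake-broker.com:9999');
$conf->set('group.id', 'fake-test-group');
$conf->set('enable.auto.commit', 'false');   // no auto-commits
$conf->set('auto.offset.reset', 'earliest'); // only applies when the group has no committed offset

$consumer = new RdKafka\KafkaConsumer($conf);
$consumer->subscribe(['fake-topic-1']);

while (true) {
    $message = $consumer->consume(10000); // timeout in ms

    if ($message->err === RD_KAFKA_RESP_ERR_NO_ERROR) {
        // ... process the message ...
        $consumer->commit($message); // commit only after the work succeeded
    } elseif ($message->err !== RD_KAFKA_RESP_ERR__TIMED_OUT
        && $message->err !== RD_KAFKA_RESP_ERR__PARTITION_EOF) {
        throw new \Exception($message->errstr());
    }
}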

You can optionally inspect the message in your handler function and store that message's offset somewhere else, but then you are fully responsible for seeking the consumer when it starts back up (again, you want to disable all commit functionality in the consumer, and also set the auto.offset.reset consumer config to none rather than latest/earliest). That config will throw an error when the offset doesn't exist, however.
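
As a rough sketch of that approach, assuming the package's KafkaConsumerMessage contract exposes getTopicName/getPartition/getOffset accessors (and using Laravel's Redis facade purely as an example store):

use Illuminate\Support\Facades\Redis;
use Junges\Kafka\Contracts\KafkaConsumerMessage;

// Handler that records the next offset to resume from, per topic/partition.
$handler = function (KafkaConsumerMessage $message) {
    KafkaMessagesJob::dispatch($message)->onQueue('kafka_messages_queue');

    Redis::set(
        sprintf('kafka:offset:%s:%d', $message->getTopicName(), $message->getPartition()),
        $message->getOffset() + 1
    );
};

// On startup, read those keys back and assign() the consumer to each topic/partition at the
// stored offset (see the earlier assign() sketch), with auto/manual commits disabled and
// auto.offset.reset set to none.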

OneCricketeer