I want to store big files in Kafka, using metadata about the record to retrieve them in the future.

So I send around messages containing the topic, partition_id, and offset, and then I try to retrieve the file like this:

from pykafka import KafkaClient

def retrieve_file_from_kafka(topic_name, partition_id, offset):
    client = KafkaClient(hosts=BROKER_ADDRESS, broker_version="0.10.1.0")
    topic = client.topics[bytes(topic_name, "UTF-8")]

    consumer = topic.get_balanced_consumer(
        consumer_group=bytes("file_retrieve" + topic_name + str(partition_id) + str(offset), "UTF-8"))
    consumer.reset_offsets([(topic.partitions[partition_id], offset)])
    return consumer.consume()

It doesn't work though and just prints:

Offset reset for partition 0 to timestamp 8 failed. Setting partition 0's internal counter to 8

This error is quite cryptic, and it happens in the `reset_offsets` call. When I then try to consume, the process gets stuck waiting for the `rebalancing_lock`. What am I doing wrong?

Chobeat
  • How large a file are you trying to store? Kafka's default maximum message size is 1MB... Kafka was not designed for file transfer. Or are you just storing metadata? – OneCricketeer May 16 '18 at 13:50
  • As pointed out in the message, `to timestamp 8` is not an offset; it seems to think you're seeking to an epoch time – OneCricketeer May 16 '18 at 13:52
  • @cricket_007 message size can be modified and many companies store blobs in Kafka very easily. I'm using it as a message bus and as a hot storage for a system that has no huge performance requirements. – Chobeat May 16 '18 at 14:29
  • @cricket_007 this is what pykafka does by default, and there doesn't seem to be another option to do this. The error comes directly from inside Kafka; it's not coming from pykafka itself. What you say could be a hint, though. – Chobeat May 16 '18 at 14:31
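
For reference, here is a sketch of the same retrieval built on the comment thread's hints. It is untested against a real broker, and two details are assumptions to verify against your pykafka version: a simple consumer is used instead of a balanced one (a single known `(partition, offset)` read needs no group rebalancing, which avoids the rebalancing lock), and the reset value is `offset - 1`, on the reading that pykafka records the reset value as the *last consumed* offset (the question's own log line, "Setting partition 0's internal counter to 8", suggests this). The broker address is a placeholder.

```python
def build_group(topic_name, partition_id, offset):
    """Throwaway consumer-group name so each retrieval starts fresh."""
    return bytes("file_retrieve%s-%d-%d" % (topic_name, partition_id, offset), "UTF-8")

def retrieve_file_from_kafka(topic_name, partition_id, offset,
                             broker_address="localhost:9092"):
    # Lazy import so the helper above stays usable without pykafka installed.
    from pykafka import KafkaClient

    client = KafkaClient(hosts=broker_address, broker_version="0.10.1.0")
    topic = client.topics[bytes(topic_name, "UTF-8")]

    # get_simple_consumer: no consumer-group rebalancing, hence no rebalancing_lock.
    consumer = topic.get_simple_consumer(
        consumer_group=build_group(topic_name, partition_id, offset))

    # Assumption: pykafka treats the reset value as the last consumed offset,
    # so to read the message stored at `offset` we reset to `offset - 1`.
    consumer.reset_offsets([(topic.partitions[partition_id], offset - 1)])
    return consumer.consume()
```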

0 Answers