
We are currently operating on Apache Kafka 0.10.1.1 and are migrating to Confluent Platform 5.x. The new cluster is set up on a completely different set of physical nodes.

We are already working on upgrading the client API(s) (our application uses Spring Boot), but we are trying to figure out how to migrate the messages. I need to maintain the same ordering of messages in the target cluster.

  1. Can I simply copy the messages?
  2. Do I need to republish the messages to the target cluster to have them retained correctly?
  3. What else can be done?
Divs

2 Answers


Confluent includes a tool called Replicator which, while an enterprise feature, you can use on a 30-day trial to perform data migrations.
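For reference, Replicator runs as a Kafka Connect connector. A minimal configuration sketch might look like the following; the hostnames and topic names are placeholders, and the property names should be verified against the Replicator documentation for your Confluent version:

```properties
name=replicator-migration
connector.class=io.confluent.connect.replicator.ReplicatorSourceConnector
# Source: the old 0.10.1.1 cluster (placeholder hosts)
src.kafka.bootstrap.servers=old-broker1:9092,old-broker2:9092
# Destination: the new Confluent Platform 5.x cluster (placeholder hosts)
dest.kafka.bootstrap.servers=new-broker1:9092,new-broker2:9092
# Comma-separated list of topics to copy (placeholder names)
topic.whitelist=orders,payments
# Recreate topics at the destination with the same partition counts
topic.preserve.partitions=true
# Copy raw bytes without deserializing
key.converter=io.confluent.connect.replicator.util.ByteArrayConverter
value.converter=io.confluent.connect.replicator.util.ByteArrayConverter
```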

But essentially, yes, the only thing you can do is consume from one cluster and produce into the other. You might get duplicated data at the destination under less-than-optimal network conditions, but that's just a tradeoff of the platform.

FWIW, I would suggest adding the matching Confluent 3.x components to the existing cluster first, if possible. Or even just do a rolling upgrade of the brokers alone, first. My point being, there's nothing to "migrate to Confluent": Kafka isn't what's changing; you'd only be adding other processes around it, like the Schema Registry or Control Center.

OneCricketeer
  • Thanks. Do you think copying `log.dirs` is not a good idea? Can the message format version create a problem later if we need to re-consume the messages? – Divs Dec 08 '18 at 05:56
  • I wouldn't trust copying the logs directory, no. Not unless the new broker has the same log version, but even then, Zookeeper wouldn't know you've copied that data – OneCricketeer Dec 08 '18 at 22:18
  • We manage offsets in Kafka instead of Zookeeper, and we will operate on a new Zookeeper ensemble. Other than `offsets`, what is the other hard dependency on Zookeeper? – Divs Dec 13 '18 at 03:17
  • No, no. Not `__consumer_offsets`. I literally mean backing up `log.dirs` of Kafka, then restoring to a completely new broker **will not make** this topic available for consumption because Zookeeper will not know of this new broker ID or the partitions that it may contain – OneCricketeer Dec 13 '18 at 03:19
  • Basically, you need to migrate/translate the Zookeeper state as well, because it contains the broker mappings, partition information, and topic metadata. Offsets aren't the issue. And if you were to restore this, the broker IDs can change on the new cluster (and it may not be 1:1, because you could add new brokers or remove others). Therefore, it requires manual znode editing within the Zookeeper data. – OneCricketeer Dec 13 '18 at 03:24
  • Thanks! I will try and let you know soon. – Divs Dec 20 '18 at 11:42

Assuming the topic definition in the new cluster is exactly the same (i.e. number of partitions, retention, etc.), and assuming the producer hashing function on the message key will deliver your messages to the same partitions (a bummer if you have null keys, because those end up in random partitions), you can simply consume from earliest on your old Kafka topic and produce to the new topic in the new cluster, using either a custom consumer/producer or a tool like Logstash.
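A sketch of that custom consumer/producer approach with the plain Java clients is below. The hosts, topic, and group id are hypothetical; it assumes a 2.x Java client (clients 0.10.2+ can still talk to 0.10.x brokers), and that the destination topic already exists with the same partition count. The key point is writing each record to the same partition number it came from, so per-partition ordering survives regardless of keys or partitioner.

```java
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.ByteArrayDeserializer;
import org.apache.kafka.common.serialization.ByteArraySerializer;

public class TopicCopier {
    public static void main(String[] args) {
        Properties cProps = new Properties();
        cProps.put("bootstrap.servers", "old-cluster:9092");       // hypothetical host
        cProps.put("group.id", "migration-copier");                // hypothetical group
        cProps.put("auto.offset.reset", "earliest");               // start from the beginning
        cProps.put("enable.auto.commit", "false");
        cProps.put("key.deserializer", ByteArrayDeserializer.class.getName());
        cProps.put("value.deserializer", ByteArrayDeserializer.class.getName());

        Properties pProps = new Properties();
        pProps.put("bootstrap.servers", "new-cluster:9092");       // hypothetical host
        pProps.put("acks", "all");                                 // don't lose messages
        pProps.put("max.in.flight.requests.per.connection", "1");  // keep order across retries
        pProps.put("key.serializer", ByteArraySerializer.class.getName());
        pProps.put("value.serializer", ByteArraySerializer.class.getName());

        try (KafkaConsumer<byte[], byte[]> consumer = new KafkaConsumer<>(cProps);
             KafkaProducer<byte[], byte[]> producer = new KafkaProducer<>(pProps)) {
            consumer.subscribe(Collections.singletonList("orders")); // hypothetical topic
            while (true) {
                ConsumerRecords<byte[], byte[]> records = consumer.poll(Duration.ofSeconds(1));
                for (ConsumerRecord<byte[], byte[]> r : records) {
                    // Explicitly target the SAME partition number to preserve
                    // per-partition ordering, independent of key hashing.
                    producer.send(new ProducerRecord<>("orders", r.partition(), r.key(), r.value()));
                }
                producer.flush();       // make sure everything landed...
                consumer.commitSync();  // ...before recording source progress
            }
        }
    }
}
```

Run exactly one instance of this (no consumer group rebalancing, one thread), per the point about single-threaded consumption above.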

If you want to be extra sure to get the same ordering, you should use only one consumer per topic, and if your consumer supports single-threaded operation, even better (it might avoid race conditions).

You might also try more common solutions like MirrorMaker, but be advised that MirrorMaker's ordering guarantees amount to:

The MirrorMaker process will, however, retain and use the message key for partitioning so order is preserved on a per-key basis.

Note: as stated in the first solution and as cricket_007 said, this will only work if you were using the default partitioner and keep using it in the new cluster.
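To illustrate why the default partitioner matters: for non-null keys, the Java client's default partitioner takes a murmur2 hash of the key bytes modulo the partition count, so a key only lands on the same partition in the new cluster if both the partitioner and the partition count match. Here is a self-contained re-implementation of that logic for experimentation (it mirrors `org.apache.kafka.common.utils.Utils.murmur2`; verify against your client version before relying on it):

```java
public class DefaultPartitionerDemo {
    // Kafka's murmur2, as implemented in org.apache.kafka.common.utils.Utils
    static int murmur2(byte[] data) {
        int length = data.length;
        int seed = 0x9747b28c;
        final int m = 0x5bd1e995;
        final int r = 24;
        int h = seed ^ length;
        int length4 = length / 4;
        for (int i = 0; i < length4; i++) {
            final int i4 = i * 4;
            int k = (data[i4] & 0xff) + ((data[i4 + 1] & 0xff) << 8)
                  + ((data[i4 + 2] & 0xff) << 16) + ((data[i4 + 3] & 0xff) << 24);
            k *= m;
            k ^= k >>> r;
            k *= m;
            h *= m;
            h ^= k;
        }
        // Handle the trailing 1-3 bytes (intentional switch fall-through)
        switch (length % 4) {
            case 3: h ^= (data[(length & ~3) + 2] & 0xff) << 16;
            case 2: h ^= (data[(length & ~3) + 1] & 0xff) << 8;
            case 1: h ^= data[length & ~3] & 0xff;
                    h *= m;
        }
        h ^= h >>> 13;
        h *= m;
        h ^= h >>> 15;
        return h;
    }

    // Default partitioner behavior for non-null keys: positive hash mod partition count
    static int partition(byte[] keyBytes, int numPartitions) {
        return (murmur2(keyBytes) & 0x7fffffff) % numPartitions;
    }

    public static void main(String[] args) {
        byte[] key = "order-123".getBytes(); // hypothetical key
        // Same key + same partition count => same partition, on either cluster
        System.out.println(partition(key, 6));
    }
}
```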

In the end, if everything goes OK, you can manually copy your consumer offsets from the old Kafka cluster and define them on your new cluster's consumer groups.
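One way to seed those offsets on the new cluster is a throwaway consumer that commits a known offset for the group (topic, group, host, and offset value below are all hypothetical; `commitSync` has worked this way since the 0.10.x clients). Note this only makes sense if the copy produced a 1:1 offset mapping, e.g. the destination topic started empty and no duplicates were written:

```java
import java.util.Collections;
import java.util.Properties;

import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.common.serialization.ByteArrayDeserializer;

public class OffsetSeeder {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "new-cluster:9092");  // hypothetical host
        props.put("group.id", "my-app-group");               // the group your app will use
        props.put("enable.auto.commit", "false");
        props.put("key.deserializer", ByteArrayDeserializer.class.getName());
        props.put("value.deserializer", ByteArrayDeserializer.class.getName());

        try (KafkaConsumer<byte[], byte[]> consumer = new KafkaConsumer<>(props)) {
            TopicPartition tp = new TopicPartition("orders", 0);
            consumer.assign(Collections.singletonList(tp));
            // Offset value read from the OLD cluster's group,
            // e.g. via kafka-consumer-groups --describe
            consumer.commitSync(Collections.singletonMap(tp, new OffsetAndMetadata(42L)));
        }
    }
}
```

On newer Kafka versions, the `kafka-consumer-groups` tool's `--reset-offsets` option can set group offsets without any code.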

Disclaimer: This is purely theoretical. I've never tried a migration with this sort of hard requirements.

Alexandre Juma
  • MirrorMaker does not preserve partition ordering unless the default partitioner is used throughout everything – OneCricketeer Dec 07 '18 at 13:42
  • Yes, I mentioned the ordering guarantees for MirrorMaker. Will make the difference from the first proposed solution clearer. – Alexandre Juma Dec 07 '18 at 13:46
  • Thanks @AlexandreJuma. We have ensured one consumer per consumer group over a single partition. However, what issue do you perceive if the `log.dirs` files are simply copied to the target cluster? i.e. without using `mirrormaker` or any custom consume/produce mechanism. – Divs Dec 08 '18 at 06:00
  • That's a huge version bump (0.10.x to 2.0.x). I don't think copying the logs is a solution. If you don't want to consume/produce to migrate your data, you might consider upgrading your current cluster to the new Kafka version and adding your new broker nodes to the cluster. Then you can decommission the old nodes (I haven't heard of anyone doing this across a bare Apache Kafka distribution and a Confluent Platform, but it should work). – Alexandre Juma Dec 10 '18 at 10:05