
I need to back up all the topics in Kafka to files named after their respective topics, and to restore a topic on user request. Note: this script needs to run in a Kerberized environment.
kafkabackup.sh

monyear=$(date +%b%Y)
dat=$(date +%b%d%Y)
export BACKUPDIR=/root/backup/$monyear
BKDIR=$BACKUPDIR/$dat
mkdir -p "$BKDIR"
cd "$BACKUPDIR"
##Log in with the Kafka service keytab
kinit -kt /etc/security/keytabs/kafka.service.keytab kafka/node1.localdomain@domain.co
cd /usr/hdp/current/kafka-broker/bin/
export KAFKA_CLIENT_KERBEROS_PARAMS="-Djava.security.auth.login.config=/etc/kafka/conf/kafka_client_jaas.conf"

##Get the topic list from the Kafka broker
./kafka-topics.sh --zookeeper adminnode.localdomain:2181 --list > "$BKDIR/listtopics.txt"

##Remove any topics that are marked for deletion
sed -i.bak '/deletion/d' "$BKDIR/listtopics.txt"

## Start the kill script in the background, passing it the backup directory
bash checkandkill.sh "$BKDIR" &

##Read the topic list and dump each topic to a file named after it
while read -r line
do
    echo "$line"
    ./test.sh --bootstrap-server node1.localdomain:6668 --topic "$line" --consumer.config /home/kafka/conf.properties --from-beginning --security-protocol SASL_SSL > "$BKDIR/$line"
done < "$BKDIR/listtopics.txt"

##Delete empty backup files
/usr/bin/find "$BKDIR" -size 0 -delete

## Kill the checkandkill helper

ps -ef | grep -i checkandkill.sh | grep -v grep | awk '{print $2}' | xargs -r kill

exit 0

When a consumer runs, it waits indefinitely for new messages, so each consumer process has to be killed once it has drained its topic.
checkandkill.sh

## Wait for the backup script to write the topic list, then kill each consumer in turn
LISTFILE=${1:-/root/backup}/listtopics.txt
sleep 30
while read -r line
do
    echo "$line"
    sleep 1m
    ps -ef | grep -i "$line" | grep -v grep | awk '{print $2}' | xargs -r kill
done < "$LISTFILE"
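A separate kill script can often be avoided altogether: GNU coreutils `timeout` bounds a command's runtime directly, and newer kafka-console-consumer.sh builds also accept a `--timeout-ms` option that exits once no message arrives within the interval. A minimal sketch, assuming GNU coreutils is available; the 60-second bound and the `dump_topic` name are illustrative only:

```shell
# Sketch: bound each consumer's runtime with timeout(1) instead of
# killing it from a second script. dump_topic is a hypothetical helper.
dump_topic() {
    local topic=$1 outfile=$2
    # timeout sends SIGTERM after 60s; exit status 124 means the bound fired
    timeout 60s ./kafka-console-consumer.sh \
        --bootstrap-server node1.localdomain:6668 \
        --topic "$topic" \
        --consumer.config /home/kafka/conf.properties \
        --from-beginning > "$outfile"
}
```

With something like this, the backup loop body becomes `dump_topic "$line" "$BKDIR/$line"` and checkandkill.sh is no longer needed.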

I need your help to complete the restoration script.

  • Could you use a standard consumer and producer for this? If so, have you considered MirrorMaker or Confluent Replicator to just send the data to a standby Kafka cluster? This would save you tons of work on restore. – dawsaw Aug 04 '18 at 00:46
  • Another consideration is disk snapshots. If you can take down the cluster (or a standby mirror cluster) you could just use disk snapshots at regular intervals. The big thing is you have to stop Kafka for this to actually work though. – dawsaw Aug 04 '18 at 00:47
  • You also need to backup Zookeeper for recovery even to be possible. In any case, if you dump raw binary data from the topics (for example, using Kafka Connect, Spark, NiFi, etc) into your HDP's HDFS, then it's just a matter of reading those bytes back into a topic in order to restore data – OneCricketeer Aug 04 '18 at 18:40
  • Thanks for your comment. We don't have these much options here so that only we are handling this way as our requirement. It would be great if anyone suggests restoring in the same way. – Krishnaraj V Aug 06 '18 at 08:54
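Since the backup files are just the console consumer's output (one message value per line, in a file named after the topic), a restore can be sketched by piping a file back through kafka-console-producer.sh, along the lines the comments suggest. This is a sketch only: the broker address and config path are copied from the backup script, `restore_topic` is a hypothetical helper, and it round-trips message values only (keys, timestamps, and partition assignments are not preserved):

```shell
# kafkarestore.sh -- sketch; assumes the same Kerberized HDP setup as the
# backup script (kinit with the service keytab before running this).
restore_topic() {
    local file=$1
    local topic
    topic=$(basename "$file")   # backup files are named after their topic
    # Replay each line of the backup file as a message value
    ./kafka-console-producer.sh \
        --broker-list node1.localdomain:6668 \
        --producer.config /home/kafka/conf.properties \
        --topic "$topic" < "$file"
}

# e.g. restore_topic "$BKDIR/mytopic"
```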

0 Answers