I want to write a tool that synchronizes HBase tables between two environments. The tool should read data from the second cluster and update the table based on the timestamp.

I use hbase-client version 1.2.0-cdh5.12.1 and Spark version 2.4.0-cdh6.1.1.

I know about the CopyTable MapReduce solution (with timestamp parameters), but it seems to be slow.

Could anyone tell me if it's possible to speed up processing by using the Spark framework?

mazaneicha
  • Have you considered using a coprocessor? Not sure if it helps. But delta backups and restores scheduled to run via a shell script could be a simple solution. – Kris Sep 09 '19 at 07:00
  • What's wrong with native HBase replication? http://hbase.apache.org/book.html#_cluster_replication – mazaneicha Sep 09 '19 at 22:58

0 Answers