We have implemented a solution using Sqoop to load data from an RDBMS into our Hadoop cluster: append-only data goes to Hive, while dimension data goes to HBase.
Now we are setting up two identical Hadoop clusters that act as backups for each other, and we want to load the data from the RDBMS into both of them. Sqoop doesn't let us write to two destinations in a single import. We have looked at streaming solutions like StreamSets and NiFi, which can pull data from one source and send it to multiple destinations in one go. Alternatively, we are considering using Sqoop to load data into one cluster and then setting up a sync job that periodically copies the data to the other cluster; this sounds more appropriate given the huge volume of data we have.
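To make the second approach concrete, here is a rough sketch of what we have in mind, assuming an HDFS-level sync with DistCp. The JDBC URL, table name, NameNode hosts, and paths below are made-up placeholders, not our actual setup:

```shell
#!/usr/bin/env bash
# Step 1: load from the RDBMS into the primary cluster once with Sqoop.
# (Connection string and table are hypothetical examples.)
sqoop import \
  --connect jdbc:mysql://db-host/sales \
  --table orders \
  --target-dir /data/staging/orders \
  --num-mappers 4

# Step 2: periodically copy the imported data to the backup cluster with DistCp,
# e.g. from cron or Oozie. -update copies only files that differ, which matters
# when the data volume is large.
hadoop distcp -update \
  hdfs://primary-nn:8020/data/staging/orders \
  hdfs://backup-nn:8020/data/staging/orders
```

The main question for us is whether a periodic DistCp like this is robust enough in practice, or whether a dual-write pipeline (StreamSets/NiFi) ends up being less operational trouble.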
Can someone share some real-life experience with either approach?