I am working on a setup similar to the one in the attached picture. I have two Cloudera Impala databases, one on either side of a DMZ.

Every month, around 20,000 records are written into the non-DMZ Impala, and this incremental load needs to be ported to the DMZ Impala. The transfer must run on a particular day every month at a set time. It is important that it takes as little time as possible (a few seconds or less).

How long will it take to push the data between the two Impala DBs? If it will take too long, what would be a better alternative?

[Image: two Impala databases on either side of a DMZ firewall]

Sharanya
  • I don't know how to answer this. Moving 20,000 rows a month is like pushing peanuts with an elephant (Hadoop reference :D). Anyway, I think it depends on the network, the firewall, distance, and so on. Test the ping and lag to figure out how much time it can take. But as I said, 20,000 rows is too small a number to estimate from. This is big data; if Impala takes 10 minutes to process 20,000 rows, I guess you should check your Impala configuration. – Koushik Roy Apr 28 '20 at 07:01
  • Besides, you'll be pushing a file so Impala is kind of irrelevant here. More like a job for `distcp` which should complete in a matter of seconds for 20K rows. – mazaneicha Apr 28 '20 at 19:16
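
To make the two comments above concrete, here is a minimal Python sketch of a back-of-envelope transfer estimate, together with the `distcp` invocation mentioned above. The row size (1 KB), link speed (100 Mbit/s), round-trip time (20 ms), HDFS URIs, and table name are all hypothetical placeholders, not values from the question. The estimate covers only the raw network transfer; `distcp` runs as a MapReduce job, so job startup will likely add overhead on top of it.

```python
import shlex


def estimate_transfer_seconds(rows, bytes_per_row, bandwidth_mbit_s, rtt_ms=0.0):
    """Rough lower bound: payload size over link bandwidth, plus one round trip."""
    payload_bits = rows * bytes_per_row * 8
    return payload_bits / (bandwidth_mbit_s * 1_000_000) + rtt_ms / 1000.0


def build_distcp_command(src_uri, dst_uri):
    """Build the argv for `hadoop distcp -update <src> <dst>`.

    -update copies only files that are missing or differ at the target,
    which suits a monthly incremental load.
    """
    return ["hadoop", "distcp", "-update", src_uri, dst_uri]


def refresh_statement(table):
    """Impala statement that makes newly copied data files visible to queries."""
    return f"REFRESH {table}"


if __name__ == "__main__":
    # Assumed figures: 20,000 rows of ~1 KB each over a 100 Mbit/s link, 20 ms RTT.
    seconds = estimate_transfer_seconds(20_000, 1024, 100, rtt_ms=20)
    print(f"Estimated raw transfer time: {seconds:.2f} s")

    # Hypothetical cluster addresses and paths; replace with your own.
    cmd = build_distcp_command(
        "hdfs://internal-nn:8020/warehouse/monthly_load",
        "hdfs://dmz-nn:8020/warehouse/monthly_load",
    )
    print(shlex.join(cmd))

    # Hypothetical table name; run this on the DMZ Impala after the copy.
    print(refresh_statement("monthly_load"))
```

Under these assumed numbers the payload is about 20 MB, so the network itself is unlikely to be the bottleneck; measuring the actual round-trip time and throughput across the firewall would replace the guesses above.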

0 Answers