
I want to run an incremental nightly job that extracts hundreds of GB of data from an Oracle data warehouse into HDFS. After processing, the results (a few GB) need to be exported back to Oracle.

We are running Hadoop on Amazon AWS, and our data warehouse is on premises. The data link between AWS and the on-premises site is 100 Mbps and is not reliable.
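To put the link speed in perspective, here is a rough back-of-the-envelope estimate of how long the nightly extract would occupy the link. The data volumes (100, 300, and 500 GB) are hypothetical stand-ins for "hundreds of GB", the `efficiency` parameter is an assumed fraction of the line rate that is actually usable, and the calculation uses decimal gigabytes and ignores retries and outages.

```python
def transfer_hours(size_gb: float, link_mbps: float, efficiency: float = 1.0) -> float:
    """Hours needed to move size_gb (decimal GB) over a link of link_mbps megabits/s.

    efficiency is an assumed fraction (0..1] of the line rate that is usable.
    """
    size_megabits = size_gb * 1000 * 8  # GB -> MB -> megabits
    seconds = size_megabits / (link_mbps * efficiency)
    return seconds / 3600


if __name__ == "__main__":
    # Hypothetical nightly volumes; the real figure is only known as "hundreds of GB".
    for size_gb in (100, 300, 500):
        hours = transfer_hours(size_gb, link_mbps=100)
        print(f"{size_gb} GB at 100 Mbps: about {hours:.1f} h at full line rate")
```

Even at full line rate, 100 GB takes a little over two hours, so an interruption partway through is a realistic scenario for this job.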

If I use Sqoop import to bring the data in from Oracle and the network experiences intermittent outages, how does Sqoop handle this? Also, what happens if I have imported (or exported) 70% of my data and the network goes down during the remaining 30%?

Since Sqoop uses JDBC by default, how does the data transfer happen at the network level? Can we compress the data in transit?

Raju Rama Krishna

0 Answers