My requirement is to:
- Move data from Oracle to HDFS
- Process the data on HDFS
- Move the processed data to Teradata
The entire pipeline must also run every 15 minutes. The source data volume may be close to 50 GB, and the processed data may be about the same size.
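To judge feasibility, it helps to turn these numbers into a sustained transfer rate. The sketch below is rough arithmetic under two assumptions that are not part of the requirement: 1 GB = 1024³ bytes, and the three stages run one after another and split the 15-minute window evenly.

```python
# Back-of-the-envelope throughput check for one 15-minute cycle.
# Assumptions (not from the requirement): 1 GB = 1024**3 bytes, and the
# window is split evenly across the three stages, which run sequentially.

GB = 1024 ** 3
MB = 1024 ** 2


def required_mb_per_s(volume_gb: float, window_s: float) -> float:
    """Sustained MB/s needed to move volume_gb within window_s seconds."""
    return volume_gb * GB / MB / window_s


WINDOW_S = 15 * 60      # the 15-minute schedule
SOURCE_GB = 50          # Oracle -> HDFS volume
PROCESSED_GB = 50       # HDFS -> Teradata volume ("may be the same")

# If all three stages share the window back to back, each gets a third of it.
stage_window_s = WINDOW_S / 3

single_pass_rate = required_mb_per_s(SOURCE_GB, WINDOW_S)
import_rate = required_mb_per_s(SOURCE_GB, stage_window_s)
export_rate = required_mb_per_s(PROCESSED_GB, stage_window_s)

print(f"one 50 GB pass in {WINDOW_S} s: {single_pass_rate:.1f} MB/s")   # 56.9 MB/s
print(f"Oracle -> HDFS in {stage_window_s:.0f} s: {import_rate:.1f} MB/s")   # 170.7 MB/s
print(f"HDFS -> Teradata in {stage_window_s:.0f} s: {export_rate:.1f} MB/s") # 170.7 MB/s
```

Under the even-split assumption, both the import and the export leg need roughly 171 MB/s sustained, and that is the figure to compare against what the Oracle source, the network, and the Teradata load path can actually deliver.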
After a lot of searching online, I found the following approach:
- OraOop to move data from Oracle to HDFS (put the command in a shell script and schedule it to run at the required interval).
- Custom MapReduce, Hive, or Pig for the large-scale processing.
- Sqoop with the Teradata connector to move data from HDFS to Teradata (again, put the command in a shell script and schedule it).
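The three steps above could be chained by a single driver that cron invokes every 15 minutes. This is an illustrative sketch, not a tested setup: the JDBC URLs, password files, table names, HDFS paths, `process.hql`, and the lock-file path are all hypothetical placeholders. It assumes Sqoop 1.4.5 or later, where OraOop ships as the Oracle direct connector enabled with `--direct`, and Hive for the processing step. A non-blocking file lock makes a cycle skip, rather than overlap, when the previous run has not finished within its 15 minutes.

```python
# Hypothetical driver for one run: Oracle -> HDFS -> Hive -> Teradata.
# Every host, table, path and script name below is a placeholder.
import fcntl
import subprocess
import sys
from typing import List

RUN_DIR = "/data/staging/run"  # hypothetical HDFS staging directory


def import_cmd(run_dir: str) -> List[str]:
    # Sqoop import from Oracle; --direct selects the Oracle direct connector
    # (the former OraOop) in Sqoop 1.4.5+.
    return [
        "sqoop", "import",
        "--connect", "jdbc:oracle:thin:@orahost:1521/ORCL",
        "--username", "etl_user",
        "--password-file", "/user/etl/.ora_pw",
        "--table", "SRC_TABLE",
        "--target-dir", f"{run_dir}/raw",
        "--num-mappers", "8",
        "--direct",
    ]


def process_cmd() -> List[str]:
    # Hive processing step; process.hql is a hypothetical script name.
    return ["hive", "-f", "process.hql"]


def export_cmd(run_dir: str) -> List[str]:
    # Sqoop export to Teradata; the required jars and any extra options
    # depend on which Teradata connector is installed.
    return [
        "sqoop", "export",
        "--connect", "jdbc:teradata://tdhost/DATABASE=tddb",
        "--username", "etl_user",
        "--password-file", "/user/etl/.td_pw",
        "--table", "TGT_TABLE",
        "--export-dir", f"{run_dir}/processed",
        "--num-mappers", "8",
    ]


def pipeline(run_dir: str) -> List[List[str]]:
    """The three stages, in the order they must run."""
    return [import_cmd(run_dir), process_cmd(), export_cmd(run_dir)]


def run_once(run_dir: str, lock_path: str = "/tmp/etl_pipeline.lock",
             dry_run: bool = False) -> bool:
    """Run all stages in order; return False if a previous run holds the lock."""
    with open(lock_path, "w") as lock:
        try:
            # Non-blocking exclusive lock, released when the file is closed.
            fcntl.flock(lock, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except BlockingIOError:
            print("previous run still active; skipping this cycle", file=sys.stderr)
            return False
        for cmd in pipeline(run_dir):
            print(" ".join(cmd))
            if not dry_run:
                subprocess.run(cmd, check=True)  # raises on the first failing stage
        return True
```

Calling `run_once(RUN_DIR, dry_run=True)` only prints the three commands, which is a cheap way to review them; a small wrapper calling `run_once(RUN_DIR)` could then be scheduled from cron with a `*/15 * * * *` entry.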
Is this the right approach in the first place, and is it feasible within the required interval? (Note that this is not a daily batch job or anything similar.)
Other options I found are:
- Storm, for real-time data processing. However, I could not find an Oracle spout or a Teradata bolt out of the box.
- An open-source ETL tool such as Talend or Pentaho.
Please share your thoughts on these options, as well as any other possibilities.