
Working on a small example where I have to copy millions of records from a Teradata database to an Oracle DB.

Environment: Spring Boot + Spring JDBC (jdbcTemplate) + Spring REST + Spring Scheduler + Maven + Oracle + Teradata

Using Spring JDBC's batchUpdate to insert data into the target Oracle database.

Using Teradata's 'top 1000' in the SQL query against the source database.

fetchDataResults = repository.queryForList(
                "select top 1000 A, B, C, D, E from " + schemaName + ".V_X");

Querying from a View "V_X".

This view has 40 million records, and the Spring Boot application will choke if it tries to load them all at once.
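One pitfall with a bare `top 1000`: without an `ORDER BY` and a moving lower bound, every execution can return the same first page. A keyset-paginated sketch in plain JDBC is below; the use of column `A` as a unique, ordered key is my assumption, not something stated in the question.

```java
import java.sql.*;

public class KeysetCopy {
    // Builds the query for the next page. Column "A" as a unique, ordered
    // key is an assumption; substitute the view's real key column.
    static String nextPageSql(String schemaName, int pageSize, boolean firstPage) {
        String where = firstPage ? "" : "where A > ? ";
        return "select top " + pageSize + " A, B, C, D, E from "
                + schemaName + ".V_X " + where + "order by A";
    }

    // 40 million rows at 1000 per page means 40,000 pages.
    static long pageCount(long totalRows, int pageSize) {
        return (totalRows + pageSize - 1) / pageSize; // ceiling division
    }

    // Sketch of the paging loop (needs a live Teradata connection to run).
    static void copyAll(Connection teradata, String schemaName, int pageSize)
            throws SQLException {
        Object lastKey = null;
        while (true) {
            String sql = nextPageSql(schemaName, pageSize, lastKey == null);
            int rowsInPage = 0;
            try (PreparedStatement ps = teradata.prepareStatement(sql)) {
                if (lastKey != null) ps.setObject(1, lastKey);
                try (ResultSet rs = ps.executeQuery()) {
                    while (rs.next()) {
                        lastKey = rs.getObject("A"); // remember the page boundary
                        rowsInPage++;
                        // ... hand the row to the Oracle batch writer here ...
                    }
                }
            }
            if (rowsInPage < pageSize) break; // short page means we are done
        }
    }
}
```

Because each page is bounded by the last key seen, a restart after a failure can resume from the last successfully written key instead of starting over.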

Also inserting into 2 tables (Primary and Backup) in target Oracle DB.

What's the best way to fetch and load/copy 40 million records while making sure the copy completed successfully into both tables?
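To keep the two target tables consistent, one option is to write each page to both tables inside a single transaction, so either both inserts succeed or neither does. A minimal plain-JDBC sketch; the table names `T_PRIMARY` and `T_BACKUP` are hypothetical placeholders:

```java
import java.sql.*;
import java.util.List;

public class TwoTableWriter {
    // Hypothetical insert template; real table names come from the Oracle schema.
    static final String INSERT =
            "insert into %s (A, B, C, D, E) values (?, ?, ?, ?, ?)";

    // Writes one page of rows to both tables in a single transaction.
    static void writePage(Connection oracle, List<Object[]> rows) throws SQLException {
        oracle.setAutoCommit(false);
        try (PreparedStatement primary =
                     oracle.prepareStatement(String.format(INSERT, "T_PRIMARY"));
             PreparedStatement backup =
                     oracle.prepareStatement(String.format(INSERT, "T_BACKUP"))) {
            for (Object[] row : rows) {
                for (int i = 0; i < row.length; i++) {
                    primary.setObject(i + 1, row[i]);
                    backup.setObject(i + 1, row[i]);
                }
                primary.addBatch();
                backup.addBatch();
            }
            primary.executeBatch();
            backup.executeBatch();
            oracle.commit();      // both tables advance together
        } catch (SQLException e) {
            oracle.rollback();    // neither table gets a partial page
            throw e;
        }
    }
}
```

The same shape works with jdbcTemplate's batchUpdate if both updates run inside one Spring transaction (for example via TransactionTemplate), and a final `select count(*)` against each table against the source count gives a cheap end-of-run verification.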

Spring Scheduler schedules the batch copy at a specified time/interval, and Spring REST invokes the copy manually - both of which are already working.

Any suggestions would be appreciated.

Thanks.

Manish
  • I would delegate that to native `imp/exp` scripts... but if you are required to do it in Java, Spring Batch + Spring JDBC operations (`select top X into Y`) could be an option. – Carlos Quijano Jun 28 '17 at 18:56
  • Is exporting data from Teradata (into CSV or some other format) -> transforming -> importing directly into Oracle (using tools like the SQL*Loader utility) an option for you? This way you bypass the Spring application. – Srinivas Jun 28 '17 at 18:58
  • Thanks @Carlos. using Top X (1000) into Y itself. Have to check how to work with 36 million records now. – Manish Jun 28 '17 at 21:02
  • @Srinivas - the example works fine for a smaller number of records directly from database to database - I can't use SQL*Loader - I'm actually using Spring Scheduler, which is also working. – Manish Jun 28 '17 at 21:02
  • @Manish Why not connect to Teradata from [Oracle](http://www.dba-oracle.com/t_linking_oracle_with_teradata.htm) and create a stored procedure in Oracle database to fetch the data using Bulk Collect. Finally, invoke the Oracle stored procedure from Java or Spring. – Jacob Jun 29 '17 at 03:12
  • Are you supposed to use only Spring-boot or you're open to any other suggestions? Is it a one-time operation or a continuous job? – zeagord Jun 29 '17 at 03:22
  • @mephis-slayer - as of now I've already coded it using Spring Boot + Spring JDBC + scheduler and REST controllers... Spring Boot is just to trigger things and create a WAR file - I can always change that, but can't change the Spring JDBC queries. Using jdbcTemplate's queryForList and batchUpdate to insert. Also, it's like a one-time operation that may run every week at night - scheduled using Spring cron expressions. – Manish Jun 29 '17 at 03:42
  • @user75ponic - they are two different databases from different environments/platforms, and in the higher environments or in production I can't/won't get those write privileges. I have to copy the tables using the scheduler or trigger the copy manually - I will have only read access. – Manish Jun 29 '17 at 04:03
  • @Manish I assume that you have write access to the Oracle database; correct me if I am wrong. If this is the case, read the Teradata tables from Oracle using a stored procedure. – Jacob Jun 29 '17 at 04:15

2 Answers


There are different ways you can approach this:

  1. Logstash approach - Specify your source and destination data and load the data to both destination DBs. It has cron support, so Logstash can run on a schedule. It is quite fast, and you can specify how many rows you want to fetch each time.

  2. Use an ETL tool. You can go with any of the open-source versions if you have the ecosystem in place. Talend is a good candidate: you can design your job and export it as a runnable JAR, then schedule it with any component of your choice.

  3. Spring Batch. Please refer to this question: Spring RESTful web services - High volume data processing

  4. Spring Cloud Data Flow, or Spring Boot with an MQ as an intermediate store between your data sources. You may have to introduce message queues to handle failover and fallback mechanisms. Highly reliable, and it can be implemented in an asynchronous manner.
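For option 1, a Logstash pipeline might look roughly like the following. This is only a sketch: the `jdbc` input plugin ships with Logstash, but writing to Oracle needs the community `logstash-output-jdbc` plugin, and every host, credential, path, and table name below is a placeholder:

```
input {
  jdbc {
    jdbc_driver_library => "/path/to/terajdbc4.jar"
    jdbc_driver_class => "com.teradata.jdbc.TeraDriver"
    jdbc_connection_string => "jdbc:teradata://teradata-host"
    jdbc_user => "user"
    jdbc_password => "secret"
    statement => "select A, B, C, D, E from MY_SCHEMA.V_X"
    jdbc_fetch_size => 1000
    schedule => "0 2 * * 0"   # cron syntax: weekly, Sunday 02:00
  }
}
output {
  # community logstash-output-jdbc plugin; would need a second jdbc
  # output (or a second pipeline) for the backup table
  jdbc {
    connection_string => "jdbc:oracle:thin:@oracle-host:1521:ORCL"
    statement => [ "insert into T_PRIMARY (A, B, C, D, E) values (?, ?, ?, ?, ?)",
                   "A", "B", "C", "D", "E" ]
  }
}
```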

My personal opinion is to go with Logstash. If any of the above solutions make sense to you, I can elaborate on them if you want.

zeagord

Well, based on the information you give and the chosen stack, in my opinion you have two possibilities. First, create a project with Spring Batch, together with Spring Batch Admin or Spring Integration to deal with REST. The second in fact builds on the first in a big-data context using Spring XD. I also recommend you use a profiler to improve performance as much as you can.

nekperu15739