Use Case: Ingest transaction data (e.g. 10,000 rows) in a single batch from DB2 and insert it into a Vertica database.

Question: Should I fetch one row at a time from the source database, or a batch of 10k rows, then process it and insert it into the destination database? Is there any sample code that reads from one database and writes to another?

2 Answers


You should always prefer batch execution: it minimizes network round trips and improves load performance into Vertica.

elirevach

You can use the JDBC input and output operators to read from the source database and write to the destination database. They should have configurable batch sizes. In general, batching is faster than processing tuple by tuple.
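To illustrate the batching idea outside of any operator framework, here is a minimal sketch in plain JDBC (not the Malhar operators themselves). The class name `BatchCopy`, the tables `transactions` and `transactions_copy`, and the columns `id` and `amount` are hypothetical and only stand in for your real schema; the two `Connection` objects are assumed to be opened elsewhere with the DB2 and Vertica drivers.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class BatchCopy {

    /** Number of executeBatch() calls needed to write rowCount rows. */
    public static int batchCount(int rowCount, int batchSize) {
        return (rowCount + batchSize - 1) / batchSize;
    }

    /**
     * Reads every row from the source and writes it to the target,
     * sending the inserts in groups of batchSize instead of one by one.
     * Table and column names are placeholders (assumptions).
     */
    public static int copy(Connection source, Connection target, int batchSize)
            throws SQLException {
        int copied = 0;
        target.setAutoCommit(false); // commit once per batch, not per row
        try (Statement read = source.createStatement();
             PreparedStatement write = target.prepareStatement(
                     "INSERT INTO transactions_copy (id, amount) VALUES (?, ?)")) {
            read.setFetchSize(batchSize); // hint to the driver; it may ignore it
            try (ResultSet rows = read.executeQuery(
                    "SELECT id, amount FROM transactions")) {
                int pending = 0;
                while (rows.next()) {
                    write.setLong(1, rows.getLong(1));
                    write.setBigDecimal(2, rows.getBigDecimal(2));
                    write.addBatch(); // queue the row locally
                    if (++pending == batchSize) {
                        write.executeBatch(); // send the queued rows together
                        target.commit();
                        copied += pending;
                        pending = 0;
                    }
                }
                if (pending > 0) { // flush the final partial batch
                    write.executeBatch();
                    target.commit();
                    copied += pending;
                }
            }
        }
        return copied;
    }
}
```

With 10,000 rows and a batch size of 1,000, `batchCount` gives 10 batch executions instead of 10,000 single-row inserts. The right batch size depends on row width and driver behavior, so treat 1,000 as a starting point to tune rather than a recommendation.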

Check https://github.com/apache/incubator-apex-malhar/tree/master/library/src/main/java/com/datatorrent/lib/db/jdbc

You can add multiple XML configuration files at src/site/conf in your project and select one of them at launch time. This is described briefly at http://docs.datatorrent.com/application_packages/ under the section entitled "Adding pre-set configurations"