I want to process a large amount of XML data and save it into a database. Which is the best option: Spring Batch, or Kettle (Pentaho)? Here is my checklist:

  1. Tool is good when schema is known
  2. Supports Parallel execution, multiple sessions and error log
  3. Faster, less memory and less CPU utilization
  4. Supports both inserts and updates
  5. Foreign key references for target tables, dropping constraints and add after data load
  6. Eliminate duplications
  7. block or batch load support
  8. headless execution (no-gui for schedule and start)
  9. Support multiple input formats
  10. Support custom data transformation as pluggable components
  11. Transaction control, error handling and logging for future execution
  12. Inspecting the Status of the Jobs, Monitoring
  13. Integration testing, Sanity testing
  14. Scalable, how to load multiple node in parallel
  15. Restart Jobs when they crash, automatic restart after failure
  16. Tracking Status and Statistics during execution
  17. Ability to launch through web or Rest interfaces
rojo
Anuraj
  • The requirements themselves sound like a project specification. Many of them are database-related or core-logic-related. I am not sure you will find anything that matches so many of your requirements. – We are Borg Nov 26 '15 at 10:12

1 Answer

I will try to address your points in terms of Spring Batch's capabilities:

  1. Tool is good when schema is known

This is the case with Spring Batch. You can use a StaxEventItemReader, which requires an annotated bean (so the schema must be known).
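To make the streaming idea concrete, here is a minimal plain-JDK sketch of reading XML one fragment at a time with StAX, the API that StaxEventItemReader is named after. It is not Spring Batch code: the `Customer` type, the `<customer id="...">` and `<name>` element names, and the `readCustomers` helper are all assumptions made up for illustration.

```java
import java.io.Reader;
import java.util.function.Consumer;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

class StaxStreamingSketch {

    /** Hypothetical item type; the element and attribute names are assumptions. */
    static final class Customer {
        final String id;
        final String name;

        Customer(String id, String name) {
            this.id = id;
            this.name = name;
        }
    }

    /**
     * Streams <customer> fragments to the sink one at a time, so memory use
     * does not grow with the size of the document.
     */
    static void readCustomers(Reader xml, Consumer<Customer> sink) {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        // Disable DTD processing; bulk data files rarely need it.
        factory.setProperty(XMLInputFactory.SUPPORT_DTD, false);
        try {
            XMLStreamReader reader = factory.createXMLStreamReader(xml);
            String id = null;
            String name = null;
            while (reader.hasNext()) {
                int event = reader.next();
                if (event == XMLStreamConstants.START_ELEMENT) {
                    if ("customer".equals(reader.getLocalName())) {
                        id = reader.getAttributeValue(null, "id");
                        name = null;
                    } else if ("name".equals(reader.getLocalName())) {
                        name = reader.getElementText();
                    }
                } else if (event == XMLStreamConstants.END_ELEMENT
                        && "customer".equals(reader.getLocalName())) {
                    // One complete fragment has been read: hand it over.
                    sink.accept(new Customer(id, name));
                }
            }
            reader.close();
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("Malformed XML input", e);
        }
    }
}
```

In a real job, the reader would unmarshal each fragment into your annotated bean for you; the sketch only shows why a known fragment structure is needed and why the whole file never has to be held in memory.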

  2. Supports Parallel execution, multiple sessions and error log

Spring Batch supports parallel execution and error logging. I'm not sure what you mean by multiple sessions. The Spring Batch reference documentation covers scalability in more detail.

  3. Faster, less memory and less CPU utilization

Spring Batch performance depends a lot on how you use it. Although it may not be the fastest or most efficient option, it is used in many production environments around the world.

  4. Supports both inserts and updates

Spring Batch database writers support these operations on common DBMSs (JdbcBatchItemWriter, HibernateItemWriter...).

  5. Foreign key references for target tables, dropping constraints and add after data load

I think this will need some manual implementation, but I'm not sure, since I haven't come across this requirement so far.

  6. Eliminate duplications

This would be done in your ItemProcessor. Here's an example: "processing batch of records using spring batch before writing to DB".
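As a rough illustration of filtering in the processor, the sketch below drops items whose key has already been seen. It relies on the ItemProcessor convention that returning null filters an item out, but it uses a local stand-in interface rather than the Spring Batch one, so the `Processor` and `DuplicateFilter` names and the key-extractor approach are assumptions for this example only.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.function.Function;

class DuplicateFilterSketch {

    /** Stand-in for Spring Batch's ItemProcessor: returning null drops the item. */
    interface Processor<I, O> {
        O process(I item);
    }

    /** Passes an item through only the first time its key is seen. */
    static final class DuplicateFilter<T, K> implements Processor<T, T> {
        private final Function<T, K> keyExtractor;
        private final Set<K> seenKeys = new HashSet<>();

        DuplicateFilter(Function<T, K> keyExtractor) {
            this.keyExtractor = keyExtractor;
        }

        @Override
        public T process(T item) {
            // Set.add returns false when the key was already present.
            return seenKeys.add(keyExtractor.apply(item)) ? item : null;
        }
    }
}
```

Note that an in-memory set only catches duplicates within a single run, grows with the number of distinct keys, and is not thread-safe. Duplicates against rows already in the database would need a unique constraint or a lookup instead.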

  7. block or batch load support

With Spring Batch you can configure the commit-interval (the number of items written per transaction) and the rollback behavior.
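The effect of a commit interval can be sketched in plain Java: items are read one by one, buffered, and handed to the writer as a chunk once the interval is reached. This is only an approximation of chunk-oriented processing, with no transactions or rollback; `processInChunks` and its parameters are hypothetical names, not Spring Batch API.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;

class ChunkLoopSketch {

    /**
     * Reads items one at a time and passes them to the writer in chunks of
     * commitInterval items. Returns the number of chunks written.
     */
    static <T> int processInChunks(Iterator<T> reader, int commitInterval,
                                   Consumer<List<T>> writer) {
        if (commitInterval < 1) {
            throw new IllegalArgumentException("commitInterval must be >= 1");
        }
        int chunks = 0;
        List<T> chunk = new ArrayList<>(commitInterval);
        while (reader.hasNext()) {
            chunk.add(reader.next());
            if (chunk.size() == commitInterval) {
                // In a real job this write would be one transaction.
                writer.accept(new ArrayList<>(chunk));
                chunk.clear();
                chunks++;
            }
        }
        if (!chunk.isEmpty()) {
            // Flush the final, partially filled chunk.
            writer.accept(new ArrayList<>(chunk));
            chunks++;
        }
        return chunks;
    }
}
```

A larger interval means fewer, bigger database round trips, at the cost of more work being redone if a chunk fails.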

  8. headless execution (no-gui for schedule and start)

Spring Batch can be started with CommandLineJobRunner, or in any other way through a JobLauncher (which then requires some manual implementation).

  9. Support multiple input formats

Spring Batch can read any kind of flat file (FlatFileItemReader), XML file (StaxEventItemReader), queue (JmsItemReader) or database (JdbcCursorItemReader).

  10. Support custom data transformation as pluggable components

Data transformation is done through an ItemProcessor. There are out-of-the-box implementations, but most often you will have to write your own implementation to apply your custom logic. As for pluggable components, I'm not sure what you mean.

  11. Transaction control, error handling and logging for future execution

Spring Batch has a full retry mechanism and supports restartability. The reference documentation covers both.
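To show what a retry limit means in practice, here is a minimal plain-Java sketch that re-runs an action until it succeeds or the attempts run out. It is not Spring Batch's retry implementation: `withRetry` is a hypothetical helper, and unlike a configured step it retries on any RuntimeException rather than only on exception types you declare.

```java
import java.util.function.Supplier;

class RetrySketch {

    /**
     * Runs the action up to maxAttempts times and returns the first
     * successful result. Rethrows the last failure if every attempt fails.
     */
    static <T> T withRetry(int maxAttempts, Supplier<T> action) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        RuntimeException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return action.get();
            } catch (RuntimeException e) {
                // Remember the failure and try again if attempts remain.
                last = e;
            }
        }
        throw last;
    }
}
```

Retrying helps with transient failures such as a dropped connection or a deadlock. Restartability is a separate concern: it depends on the job's stored metadata recording how far the previous run got.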

  12. Inspecting the Status of the Jobs, Monitoring

Spring Batch lets you configure where metadata about job status is stored (database, file, RAM...), and you can read this data back. There is also a separate project, spring-batch-admin, which provides a GUI for monitoring and control.

  13. Integration testing, Sanity testing

Can't answer that.

  14. Scalable, how to load multiple node in parallel

See 11. Spring Batch can also be integrated with Spring XD.

  15. Restart Jobs when they crash, automatic restart after failure

See 11.

  16. Tracking Status and Statistics during execution

See 12.

  17. Ability to launch through web or Rest interfaces

Spring Batch can be integrated with Spring Boot to meet these needs.


I hope I answered some of your concerns.

Thrax