
Background/Context

I see almost countless examples of how to process multiple files using Spring Batch, but every single one of them has a single object type that all the files are being processed into. That is, many files containing compatible data, all being processed into a single destination target, like a database table, for instance.

I want to build an import process that will take in ten different files and map them to ten different destination tables in the same database/schema. The filenames will also change slightly in a predictable/code-able fashion every day, but I think I'll be able to handle that. I thought Spring could do this (a many-to-many data mapping), but this is the last thing I can't figure out HOW to do. The declarative structure of Spring is great for some things, but I'm honestly not sure how to set up the multiple mappings, and since there's really no procedural portion of the application to speak of, I can't really use any form of iteration. I could simply make separate jars for each file and script the iteration on the console, but that also complicates logging and reporting... and frankly it sounds hacky.

Question

How do I tell Spring Batch to process each of ten different files, in ten different ways, and map their data into ten different tables in the same database?

Example:

  • File Data_20190501_ABC_000.txt contains 4 columns of tilde-delimited data and needs to be mapped to table ABC_data with 6 columns (two are metadata)
  • File Data_20190501_DEF_000.txt contains 12 columns of tilde-delimited data and needs to be mapped to table DEF_data with 14 columns (two are metadata)
  • File Data_20190501_GHI_000.txt contains 10 columns of tilde-delimited data and needs to be mapped to table GHI_data with 12 columns (two are metadata)
  • etc... for ten different files and tables

I can handle the tilde delimiting, I THINK I can handle the dates in the file names programmatically, and one of the metadata fields can be handled in a db trigger. The other metadata field should be the file name, but that can certainly be a different question.
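For illustration, here is a minimal sketch of what one such reader might look like, assuming Spring Batch 4.x, a hypothetical AbcData class with one property per data column, and made-up column names:

import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

import org.springframework.batch.core.configuration.annotation.StepScope;
import org.springframework.batch.item.file.FlatFileItemReader;
import org.springframework.batch.item.file.builder.FlatFileItemReaderBuilder;
import org.springframework.context.annotation.Bean;
import org.springframework.core.io.FileSystemResource;

@Bean
@StepScope
public FlatFileItemReader<AbcData> abcReader() {
    // Build today's file name, e.g. Data_20190501_ABC_000.txt
    String date = LocalDate.now().format(DateTimeFormatter.ofPattern("yyyyMMdd"));
    String fileName = "Data_" + date + "_ABC_000.txt";

    return new FlatFileItemReaderBuilder<AbcData>()
            .name("abcReader")
            .resource(new FileSystemResource(fileName))
            .delimited()
            .delimiter("~")                                        // tilde-delimited input
            .names(new String[] {"col1", "col2", "col3", "col4"})  // the 4 data columns
            .targetType(AbcData.class)  // maps columns to bean properties by name
            .build();
}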

UPDATE

Based on what I think Mahmoud Ben Hassine suggested, I made a separate reader, mapper, and writer for each file/table pair and tried to chain them with the start(step1), next(step2), build() paradigm in the format below, based on the examples at Configuring and Running a Job from Spring's docs:

@Autowired
private JobBuilderFactory jobs;

@Bean
public Job job(@Qualifier("step1") Step step1, @Qualifier("step2") Step step2) {
    return jobs.get("myJob").start(step1).next(step2).build();
}

Either step runs fine on its own, but once I add one in as the "next" step, only the first one executes, and the second generates a "Step already complete or not restartable, so no action to execute" INFO message in the log output. Where do I go from here?

– Code Jockey

1 Answer


A chunk-oriented step in Spring Batch can handle only one type of item at a time. I would use a job with a different chunk-oriented step for each file. These steps can be run in parallel, as there is no relation or ordering between the input files.

Most of the configuration would be common in your case, so you can create an abstract step definition with the common configuration properties, and then define multiple steps that each supply the file-specific ones (from your description: the file name, the field set mapper, and the target table).
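As one way to realize that pattern (a sketch, not code from the answer; the chunk size, bean names, column names, and SQL are assumptions), a parameterized helper can build one chunk-oriented step per file/table pair:

import javax.sql.DataSource;

import org.springframework.batch.core.Step;
import org.springframework.batch.core.configuration.annotation.StepBuilderFactory;
import org.springframework.batch.item.database.JdbcBatchItemWriter;
import org.springframework.batch.item.database.builder.JdbcBatchItemWriterBuilder;
import org.springframework.batch.item.file.FlatFileItemReader;
import org.springframework.batch.item.file.builder.FlatFileItemReaderBuilder;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.context.annotation.Bean;
import org.springframework.core.io.FileSystemResource;

@Autowired
private StepBuilderFactory steps;

@Autowired
private DataSource dataSource;

// Common configuration lives here; only the file name, the column names,
// the item type, and the target-table SQL vary from step to step.
private <T> Step buildStep(String name, String fileName, String[] columns,
        Class<T> type, String insertSql) {
    FlatFileItemReader<T> reader = new FlatFileItemReaderBuilder<T>()
            .name(name + "Reader")
            .resource(new FileSystemResource(fileName))
            .delimited()
            .delimiter("~")
            .names(columns)
            .targetType(type)
            .build();
    JdbcBatchItemWriter<T> writer = new JdbcBatchItemWriterBuilder<T>()
            .dataSource(dataSource)
            .sql(insertSql)   // named parameters bound from item bean properties
            .beanMapped()
            .build();
    return steps.get(name)
            .<T, T>chunk(100)
            .reader(reader)
            .writer(writer)
            .build();
}

@Bean
public Step abcStep() {
    return buildStep("abcStep", "Data_20190501_ABC_000.txt",
            new String[] {"col1", "col2", "col3", "col4"},
            AbcData.class,
            "INSERT INTO ABC_data (col1, col2, col3, col4) "
                    + "VALUES (:col1, :col2, :col3, :col4)");
}

Keeping the reader and writer construction inside the helper means adding an eleventh file/table pair is just one more one-line bean definition.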

– Mahmoud Ben Hassine
  • I've done what I THINK you're suggesting, and everything compiles, and it seems to recognize there are 2 steps, but it only executes the first step. Please see the update in my question – Code Jockey May 01 '19 at 20:25
  • 1
    Your job definition will execute steps in sequence. You can run them in parallel, see how to do it in this section: https://docs.spring.io/spring-batch/4.1.x/reference/html/scalability.html#scalabilityParallelSteps. In regards to the info message `Step already complete..`, please note that by default, a step is not re-executed if it was complete in the previous run, You can override this behaviour and make it run each time by setting the `allowStartIfComplete` flag. – Mahmoud Ben Hassine May 01 '19 at 21:13
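Putting those two suggestions together, a minimal sketch (reusing the jobs factory and step beans from the question's update; the flow names and task executor choice are assumptions) could look like this:

import org.springframework.batch.core.Job;
import org.springframework.batch.core.Step;
import org.springframework.batch.core.job.builder.FlowBuilder;
import org.springframework.batch.core.job.flow.Flow;
import org.springframework.beans.factory.annotation.Qualifier;
import org.springframework.context.annotation.Bean;
import org.springframework.core.task.SimpleAsyncTaskExecutor;

@Bean
public Job job(@Qualifier("step1") Step step1, @Qualifier("step2") Step step2) {
    // Wrap each step in its own flow so the two can run concurrently.
    Flow flow1 = new FlowBuilder<Flow>("flow1").start(step1).build();
    Flow flow2 = new FlowBuilder<Flow>("flow2").start(step2).build();

    return jobs.get("myJob")
            .start(new FlowBuilder<Flow>("splitFlow")
                    .split(new SimpleAsyncTaskExecutor())
                    .add(flow1, flow2)
                    .build())
            .build()   // builds the FlowJobBuilder
            .build();  // builds the Job
}

Separately, adding .allowStartIfComplete(true) to each step builder makes a step eligible to run again even if it completed in a previous execution, which addresses the INFO message quoted in the update.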