I have a Spring Batch project in which I must read from two .txt files. One has many lines; the other is a control file that holds the number of lines that should be read from the first file. I know that I must use partitioning to process these files, because the first one is very large and I need to divide it and be able to restart the job if it fails. What I don't know is how the reader should handle these files, since their lines do not have the same width. Neither file has a header or field separators, so I have to extract the fields by character range, mainly in the first file.

One of my doubts is whether I should read both files in the same reader. If so, how should I configure the reader's FixedLengthTokenizer and DefaultLineMapper to handle both files?

These are examples of the input file and the control file

- input file

09459915032508501149343120020562580292792085100204001530012282921883101

The input file can contain up to 50000 lines.

- control file

00128*

The control file only has one line.

Thanks!

Cris
  • Can you provide an example for each file? What is the size of the "control file" and the input file? I'm asking because a common technique is to cache reference data (see https://stackoverflow.com/a/52644962/5019386) and use it in a regular chunk-oriented step (to avoid re-reading a control file or a reference table for each item). Besides that, what is your output? Do you really need partitioning? – Mahmoud Ben Hassine Jan 08 '20 at 10:09
  • @Mahmoud Ben Hassine - The output is several txt files, across which the lines of the input file are distributed according to a filter. As for partitioning: the project was started by another person who has since left the company, and it was then assigned to me. That person knew Spring Batch much better than I do (I have only recently started using it) and established the use of partitioning for the project. Given the need to restart the job in case of failure, I also consider it the right option. – Cris Jan 08 '20 at 17:24

1 Answer

I must read from two .txt files, one has many lines and the other is a control file that has the number of lines that should be read from the first file

Here is a possible way to tackle your use case:

  • Create a first step (a tasklet) that reads the control file and puts the number of lines to read in the job execution context (to share it with the next step)
  • Create a second step (chunk-oriented) with a step-scoped reader that is configured to read only the number of lines obtained by the first step (getting the value from the job execution context)
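
The logic of these two steps can be illustrated with a small plain-Java sketch. It is not Spring Batch configuration: the `Map` only stands in for the job execution context, and the class name, the method names and the `linesToRead` key are hypothetical. It also assumes that the leading digits of the control line (`00128*`) are the line count and that the trailing `*` can be ignored, which the question does not confirm.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ControlledRead {

    // Hypothetical key; in Spring Batch this entry would live in the job execution context.
    public static final String LINES_TO_READ = "linesToRead";

    // Step 1: extract the line count from the single control line.
    // Assumption: the leading digits are the count; anything after them is ignored.
    public static int parseLineCount(String controlLine) {
        int end = 0;
        while (end < controlLine.length() && Character.isDigit(controlLine.charAt(end))) {
            end++;
        }
        if (end == 0) {
            throw new IllegalArgumentException("No leading digits in control line: " + controlLine);
        }
        return Integer.parseInt(controlLine.substring(0, end));
    }

    // Step 2: read at most maxLines lines, stopping early if the input ends first.
    public static List<String> readLines(BufferedReader reader, int maxLines) {
        List<String> lines = new ArrayList<>();
        try {
            String line;
            while (lines.size() < maxLines && (line = reader.readLine()) != null) {
                lines.add(line);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return lines;
    }

    public static void main(String[] args) {
        // Stand-in for the job execution context shared between the two steps.
        Map<String, Object> context = new HashMap<>();

        // "Step 1": the control line asks for two lines.
        context.put(LINES_TO_READ, parseLineCount("00002*"));

        // "Step 2": the input has three lines, but only the requested number is read.
        BufferedReader input = new BufferedReader(new StringReader("first\nsecond\nthird\n"));
        List<String> lines = readLines(input, (Integer) context.get(LINES_TO_READ));
        lines.forEach(System.out::println);
    }
}
```

In an actual job, the count would be stored in the job `ExecutionContext` by the tasklet, and the limit could be applied to the step-scoped `FlatFileItemReader` through its `setMaxItemCount` setter rather than by a hand-written loop.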

You can read more about sharing data between steps here: https://docs.spring.io/spring-batch/docs/4.2.x/reference/html/common-patterns.html#passingDataToFutureSteps

Mahmoud Ben Hassine
  • @Mahmoud Ben Hassine Thank you very much for your answer. I will try to do it this way. – Cris Jan 09 '20 at 12:23