I have a file that normally has the following format:

property_A_1@property_B_1@property_C_1@property_D_1
property_A_2@property_B_2@property_C_2@property_D_2
property_A_3@property_B_3@property_C_3@property_D_3

Each line should be mapped to a custom class with four properties, using @ as the delimiter. However, property_B occasionally contains newline characters, e.g.:

property_A_1@property_B_1@property_C_1@property_D_1
property_A_2@property_B_2_i
property_B_2_ii
property_B_2_iii
property_B_2_iiii@property_C_2@property_D_2
property_A_3@property_B_3@property_C_3@property_D_3

The number of these continuation lines varies. In this case, I still need to map the second entry as before, except that property_B_2 should contain all the data between the first @ and the second @.

I can live without the newlines if I can replace them with spaces, as if the entry hypothetically looked like:

property_A_2@property_B_2_i property_B_2_ii property_B_2_iii property_B_2_iiii@property_C_2@property_D_2
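To make the target mapping concrete, here is a minimal plain-Java sketch, independent of Spring Batch. The class name `Entry` and the method `parse` are hypothetical and used only for illustration. It assumes the logical record has already been assembled into one string with the embedded newlines still present.

```java
class Entry {
    final String a;
    final String b;
    final String c;
    final String d;

    Entry(String a, String b, String c, String d) {
        this.a = a;
        this.b = b;
        this.c = c;
        this.d = d;
    }

    // Splits one logical record on '@' into exactly four fields and
    // replaces every line break inside a field with a single space.
    static Entry parse(String record) {
        String[] parts = record.split("@", -1);
        if (parts.length != 4) {
            throw new IllegalArgumentException(
                "Expected 4 fields but found " + parts.length);
        }
        for (int i = 0; i < parts.length; i++) {
            parts[i] = parts[i].replaceAll("\\R", " ");
        }
        return new Entry(parts[0], parts[1], parts[2], parts[3]);
    }

    public static void main(String[] args) {
        Entry e = parse("property_A_2@property_B_2_i\nproperty_B_2_ii\n"
            + "property_B_2_iii\nproperty_B_2_iiii@property_C_2@property_D_2");
        // prints "property_B_2_i property_B_2_ii property_B_2_iii property_B_2_iiii"
        System.out.println(e.b);
    }
}
```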

Is there a way to accomplish this with ItemReader and LineMapper?

Malvon
  • Does this answer your question? [Reading line breaks in CSV which are quoted in the file in FlatfileItemReader of spring batch](https://stackoverflow.com/questions/29509074/reading-line-breaks-in-csv-which-are-quoted-in-the-file-in-flatfileitemreader-of) – Mahmoud Ben Hassine Jul 16 '20 at 07:27
  • Hi Mahmoud. Unfortunately, switching to `DefaultRecordSeparatorPolicy` doesn't help, as it still ends a record when it reaches the end of a line whose quotes are terminated. What I need is a separator policy that instead counts the number of `@`s, since that is a fixed number, e.g. 3 (of course, also taking into account the property after the last `@` [see example above]). That said, I believe the key is to override `DefaultRecordSeparatorPolicy#preProcess()` to do a custom `isContinued`. However, I'm not sure how to keep track of `@` with this approach. – Malvon Jul 16 '20 at 12:12
  • The default continuation character is a backslash. Have you set it to "\n"? See https://docs.spring.io/spring-batch/docs/4.2.x/api/org/springframework/batch/item/file/separator/DefaultRecordSeparatorPolicy.html#setContinuation-java.lang.String-. Please note that the default policy is for default use cases. You might need a custom record separator policy for your specific use case. – Mahmoud Ben Hassine Jul 16 '20 at 14:45

1 Answer

I solved this by overriding DefaultRecordSeparatorPolicy#isEndOfRecord() in the record separator policy set on my ItemReader. I also needed to get rid of the unterminated-quote check, as the content might contain an odd number of quotation characters:

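// Requires org.springframework.batch.item.file.separator.DefaultRecordSeparatorPolicy
// and org.springframework.util.StringUtils.
itemReader.setRecordSeparatorPolicy(new DefaultRecordSeparatorPolicy() {

    private static final String CONTINUATION = "\\";

    private String continuation = CONTINUATION;
    private final String delimiter = "@";

    // The reader passes in the text accumulated so far, so the record is
    // complete once it contains at least three delimiters (four fields).
    @Override
    public boolean isEndOfRecord(String line) {
        return StringUtils.countOccurrencesOf(line, delimiter) >= 3 &&
               !isQuoteUnterminated(line) &&
               !isContinued(line);
    }

    // Quote check disabled: the content may contain unbalanced quotes.
    private boolean isQuoteUnterminated(String line) {
        return false;
    }

    // Same rule as the default policy: a trailing backslash continues the record.
    private boolean isContinued(String line) {
        if (line == null) {
            return false;
        }
        return line.trim().endsWith(continuation);
    }
});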
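
The delimiter-counting rule can also be illustrated without Spring Batch. The sketch below is plain Java; `RecordAssembler`, `isEndOfRecord` and `assemble` are hypothetical names, and the loop is only a simplified stand-in for how a reader keeps appending physical lines until the policy reports a complete record. Joining the lines with a space is an assumption of this sketch, taken from the question's stated preference.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class RecordAssembler {
    private static final char DELIMITER = '@';
    private static final int DELIMITERS_PER_RECORD = 3; // four fields

    // A record is complete once it holds at least three delimiters.
    static boolean isEndOfRecord(String record) {
        return record.chars().filter(ch -> ch == DELIMITER).count()
            >= DELIMITERS_PER_RECORD;
    }

    // Appends physical lines, separated by a space, until the
    // accumulated text counts as a complete record.
    static List<String> assemble(List<String> lines) {
        List<String> records = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (String line : lines) {
            if (current.length() > 0) {
                current.append(' ');
            }
            current.append(line);
            if (isEndOfRecord(current.toString())) {
                records.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            throw new IllegalStateException(
                "Unexpected end of input before record complete: " + current);
        }
        return records;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
            "property_A_1@property_B_1@property_C_1@property_D_1",
            "property_A_2@property_B_2_i",
            "property_B_2_ii",
            "property_B_2_iii",
            "property_B_2_iiii@property_C_2@property_D_2",
            "property_A_3@property_B_3@property_C_3@property_D_3");
        // Prints three records; the second is joined with spaces.
        assemble(lines).forEach(System.out::println);
    }
}
```

Note that this rule only works when the multi-line property is not the last one: as soon as the third @ has been seen, the record is treated as complete. In Spring Batch itself, getting the space between joined lines would presumably also require overriding `preProcess`; that part is an assumption and is not tested here.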
Malvon