Spring Batch PatternMatchingCompositeLineTokenizer when there is more than one pattern

Question

I have to read a file like the one below. A record is split across multiple lines, and each record can have any number of lines. The only way to recognize a new record is a line that starts with "ABC"; a record can also contain another line with the identifier ABC_SUB. Each line of a record needs its own mapper, selected by the pattern at the start of the line (e.g., ABC, line2, line3, line4, ABC_SUB, line3, line4). The same patterns line3 and line4 can appear more than once, but they need different mappers depending on the identifier of the preceding line type. In this example,

the line3 pattern (with mapper1) appears after a line that starts with ABC, and
the line3 pattern (with mapper2) appears after a line that starts with ABC_SUB.

How can I distinguish a line3 pattern under ABC from a line3 pattern under ABC_SUB?

I tried PatternMatchingCompositeLineTokenizer, but it just uses the first matching pattern.

Is there a way to look back a few lines to identify the subtype (such as ABC or ABC_SUB) and select the respective mapper?

HDR
ABCline1goesonforrecord1   //record starts
line2goesonForRecord1      
line3goesonForRecord1       //this requires ABC_line3_mapper    
line4goesonForRecord1
ABC_SUBline1goesonforrecord1  //sub-record where it can have same pattern in below lines
line3goesonForRecord1       //this requires ABC_SUB_line3_mapper  
line4goesonForRecord1
ABCline2goesOnForRecord2  //record 2 begins
line2goesonForRecord2
line3goesonForRecord2
line4goesonForRecord2
line5goesonForRecord2
ABCline2goesOnForRecord3
line2goesonForRecord3
line3goesonForRecord3
line4goesonForRecord3
TRL

Below is the XML config:

<batch:job id="importFileData">
        <batch:step id="parseAndLoadData">
            <batch:tasklet>
                <batch:chunk reader="reader" writer="writer"
                    commit-interval="5" skip-limit="100">
                    <batch:streams>
                        <batch:stream ref="fileItemReader" />
                    </batch:streams>
                </batch:chunk>

            </batch:tasklet>
        </batch:step>

    </batch:job>


    <bean id="fileItemReader"
        class="org.springframework.batch.item.file.FlatFileItemReader"
        scope="step">
        <property name="resource" value="classpath:input/input.txt"></property>
        <property name="linesToSkip" value="2" />
        <property name="lineMapper">
            <bean
                class="org.springframework.batch.item.file.mapping.DefaultLineMapper">
                <property name="lineTokenizer" ref="orderFileTokenizer">
                </property>
                <property name="fieldSetMapper">
                    <bean
                        class="org.springframework.batch.item.file.mapping.PassThroughFieldSetMapper">


                    </bean>
                </property>

            </bean>
        </property>

    </bean>


    <bean id="reader" class="sample.MultiLineReader">
        <property name="fieldSetReader" ref="fileItemReader" />
        <property name="abcMapper" ref="abcMapper" />
        <property name="123Mapper" ref="sample123Mapper" />
    </bean>

    <bean id="orderFileTokenizer"
        class="org.springframework.batch.item.file.transform.PatternMatchingCompositeLineTokenizer">
        <property name="tokenizers">
            <map>
                <entry key="ABC*" value-ref="abcLineTokenizer" />
                <entry key="123*" value-ref="123LineTokenizer" />
            </map>
        </property>
    </bean>

    <bean id="abcLineTokenizer"
        class="org.springframework.batch.item.file.transform.FixedLengthTokenizer">
        <property name="names" value="NAME, AGE, GENDER" />
        <property name="columns" value="1-10,11-15,16-20" />
        <property name="strict" value="false" />
    </bean>

    <bean id="123LineTokenizer"
        class="org.springframework.batch.item.file.transform.FixedLengthTokenizer">
        <property name="names" value="CONTACT, ALT_CONTACT" />
        <property name="columns" value="1-15,16-30" />
        <property name="strict" value="false" />
    </bean>
    
    <bean id="abcMapper" class="sample.ABCMapper" />
    <bean id="sample123Mapper" class="sample.123Mapper" />

</beans>

It is not clear from what you describe what you consider as an item. If I understand correctly, an item is represented by all records starting with `ABC` until the last line before the next `ABC` (including intermediate ABC_SUB lines). According to you, you need two mappers to create a single item (ABC_line3_mapper and ABC_SUB_line3_mapper). I think it's easier for you to write a custom mapper. — Mahmoud Ben Hassine, Nov 26 '20 at 13:16
Hi Mahmoud, thanks for checking. I edited the question to add more detail. The problem is that the same pattern (e.g., line3) occurs twice within a given record, and the lines that start with that pattern need different mappers depending on the identifier of the preceding identifier line (ABC or ABC_SUB). — Teja, Nov 26 '20 at 22:53
That's a very specific use case. To my knowledge, nothing in Spring Batch supports that out of the box. You need to create a custom mapper (and probably reader). — Mahmoud Ben Hassine, Nov 27 '20 at 10:07
Thanks for your reply Mahmoud. I will try writing some custom logic for this. — Teja, Nov 27 '20 at 20:13
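
For anyone landing here, the custom logic suggested in the comments could start from something like the sketch below. It is only an illustration, not a Spring Batch API: `ContextAwareLineClassifier` and the routing keys it returns are hypothetical names, and it assumes that detail lines start with `line` followed by digits, as in the sample file. The idea is to remember the last `ABC` or `ABC_SUB` identifier seen and combine it with the prefix of the current line, so that `line3` under `ABC` and `line3` under `ABC_SUB` get different keys.

```java
/**
 * Hypothetical helper (not part of Spring Batch): remembers the most recent
 * section identifier so that identical line prefixes can be routed differently.
 */
class ContextAwareLineClassifier {

    // Section currently being read: "ABC", "ABC_SUB", or null before the first record.
    private String currentSection;

    /** Returns a routing key such as "ABC", "ABC_SUB", "ABC_line3" or "ABC_SUB_line3". */
    String classify(String line) {
        // Check ABC_SUB first, because every ABC_SUB line also starts with "ABC".
        if (line.startsWith("ABC_SUB")) {
            currentSection = "ABC_SUB";
            return currentSection;
        }
        if (line.startsWith("ABC")) {
            currentSection = "ABC";
            return currentSection;
        }
        // Lines outside any record (e.g. HDR, TRL) or with an unexpected prefix.
        if (currentSection == null || !line.startsWith("line")) {
            return "UNKNOWN";
        }
        // Assumption: a detail prefix is "line" followed by one or more digits.
        int end = "line".length();
        while (end < line.length() && Character.isDigit(line.charAt(end))) {
            end++;
        }
        return currentSection + "_" + line.substring(0, end);
    }

    public static void main(String[] args) {
        ContextAwareLineClassifier classifier = new ContextAwareLineClassifier();
        String[] sample = {
            "HDR",
            "ABCline1goesonforrecord1",
            "line3goesonForRecord1",
            "ABC_SUBline1goesonforrecord1",
            "line3goesonForRecord1",
            "TRL"
        };
        for (String line : sample) {
            System.out.println(classifier.classify(line) + " <- " + line);
        }
    }
}
```

In a real job, the returned key could be used inside a custom `LineMapper` to look up the tokenizer and mapper for each line (for example from a `Map` keyed by `ABC_line3`, `ABC_SUB_line3`, and so on). This relies on the lines being read sequentially by a single reader, so the remembered section always belongs to the line just before.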

