
I am trying to parse a CSV file with FlatFileItemReader. This CSV contains some quoted newline characters as shown below.

email, name
abc@z.com, "NEW NAME
 ABC"

But parsing fails with an error saying that 2 fields are required but only 1 was found.

What am I missing in my FlatFileItemReader configuration?

<property name="lineMapper">
            <bean class="org.springframework.batch.item.file.mapping.DefaultLineMapper">

                <!-- The lineTokenizer divides individual lines up into units of work -->
                <property name="lineTokenizer">
                    <bean
                        class="org.springframework.batch.item.file.transform.DelimitedLineTokenizer">

                        <!-- Names of the CSV columns -->
                        <property name="names"
                            value="email,name" />
                    </bean>
                </property>

                <!-- The fieldSetMapper maps the fields of each record to a domain object -->
                <property name="fieldSetMapper">
                    <bean
                        class="com.abc.testme.batchjobs.util.CustomerFieldSetMapper" />
                </property>
            </bean>
        </property>
Bilbo Baggins
  • [email, name..., "quoted..."] would lead to 3 values. If I use [email, "name,something"] it works as expected, because the quote character just says "ignore a line delimiter inside quotation marks". What do you expect? – Michael Pralow Apr 08 '15 at 09:00
  • @MichaelPralow I want to parse the above shown CSV file. – Bilbo Baggins Apr 08 '15 at 09:08
  • Removed unused configuration – Bilbo Baggins Apr 08 '15 at 09:08
  • After debugging, I found that my BufferedReader reads until it encounters a newline character and then stops reading, while the data I posted here continues on the next line. Is there a way to parse such CSV files with Spring Batch's FlatFileItemReader? – Bilbo Baggins Apr 08 '15 at 09:16
  • Is each logical line distributed over 2 lines? Maybe http://stackoverflow.com/questions/9939851/spring-batch-how-to-process-multi-line-log-files gives you a starting point. – Michael Pralow Apr 08 '15 at 12:19
  • I don't see anything obviously wrong with your configuration, and we have a unit test for this very scenario (https://github.com/spring-projects/spring-batch/blob/master/spring-batch-infrastructure/src/test/java/org/springframework/batch/item/file/transform/DelimitedLineTokenizerTests.java#L311), so I know it works. You only show the configuration for your `LineMapper` here... are you using any type of custom `RecordSeparatorPolicy`? – Michael Minella Apr 08 '15 at 14:30
  • It should have worked out of the box. The default quote character is the double quote ("), and according to the documentation it should skip line endings: "Convenient constant for the common case of a " character used to escape delimiters or line endings." – Saifuddin Merchant Apr 08 '15 at 18:27
  • Made some quick tests; it seems that the tokenizer is not the root of the problem, but the FlatFileItemReader (or a component inside it) is. – Michael Pralow Apr 08 '15 at 19:49
  • I debugged the code and found out that the reader inside it reads until it encounters a newline character. – Bilbo Baggins Apr 09 '15 at 05:13
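To see why the error shows up, here is a small Spring-free Java sketch that reads the sample CSV the way a plain `BufferedReader.readLine()` does. The class name `PhysicalLineDemo` is hypothetical, and the comma split is a naive stand-in used only to count fields; it is not how `DelimitedLineTokenizer` tokenizes.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical demo: shows how a quoted value containing a newline is split
 * when every physical line is treated as one record. Not Spring Batch code.
 */
public class PhysicalLineDemo {

    // Sample from the question: the quoted "name" value spans two lines
    static final String CSV = "email, name\nabc@z.com, \"NEW NAME\n ABC\"\n";

    /** Returns every physical line, exactly as BufferedReader.readLine() yields them. */
    static List<String> physicalLines(String text) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(new StringReader(text))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        for (String line : physicalLines(CSV)) {
            // Naive comma split (no quote handling), only used to count fields
            int fields = line.split(",", -1).length;
            System.out.println(fields + " field(s): [" + line + "]");
        }
    }
}
```

The last physical line, ` ABC"`, contains no delimiter and yields a single field, which is consistent with the "2 required, 1 actual" error when each physical line is handed to the tokenizer as its own record.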

3 Answers


Out of the box, the FlatFileItemReader uses a SimpleRecordSeparatorPolicy, which treats every line as the end of a record. For your use case, where a quoted field goes over 2 or more lines, you need to set the DefaultRecordSeparatorPolicy.

Quoted from its Javadoc:

A RecordSeparatorPolicy that treats all lines as record endings, as long as they do not have unterminated quotes, and do not end in a continuation marker.
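The rule quoted above can be illustrated with a simplified, Spring-free Java sketch. The class `QuoteAwareRecordReader` and its methods are hypothetical: it counts double-quote characters to detect an unterminated quote and treats a trailing backslash as the continuation marker (an assumption made for this sketch). It shows the idea only and does not reproduce `DefaultRecordSeparatorPolicy` line for line.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

/**
 * Simplified sketch of the quoted rule: a line break ends a record only if the
 * quotes seen so far are balanced and the record does not end in a
 * continuation marker. Hypothetical class, not the Spring Batch implementation.
 */
public class QuoteAwareRecordReader {

    static final char QUOTE = '"';
    // Continuation marker assumed for this sketch
    static final String CONTINUATION = "\\";

    /** True when the text contains an odd number of quote characters. */
    static boolean hasUnterminatedQuote(String text) {
        long quotes = text.chars().filter(c -> c == QUOTE).count();
        return quotes % 2 != 0;
    }

    /** A record is complete when quotes are balanced and no continuation marker trails it. */
    static boolean isEndOfRecord(String record) {
        return !hasUnterminatedQuote(record) && !record.trim().endsWith(CONTINUATION);
    }

    /** Joins physical lines into logical records. */
    static List<String> readRecords(BufferedReader reader) throws IOException {
        List<String> records = new ArrayList<>();
        String line;
        while ((line = reader.readLine()) != null) {
            String record = line;
            while (!isEndOfRecord(record)) {
                String next = reader.readLine();
                if (next == null) {
                    throw new IOException("Unexpected end of input inside record: " + record);
                }
                if (hasUnterminatedQuote(record)) {
                    // Inside a quoted field: keep the line break as part of the value
                    record = record + "\n" + next;
                } else {
                    // Continuation marker: drop trailing whitespace and the marker, then append
                    String stripped = record.replaceAll("\\s+$", "");
                    record = stripped.substring(0, stripped.length() - CONTINUATION.length()) + next;
                }
            }
            records.add(record);
        }
        return records;
    }

    public static void main(String[] args) throws IOException {
        String csv = "email, name\nabc@z.com, \"NEW NAME\n ABC\"\n";
        List<String> records = readRecords(new BufferedReader(new StringReader(csv)));
        records.forEach(r -> System.out.println("record: [" + r + "]"));
    }
}
```

With the sample from the question, the two physical lines `abc@z.com, "NEW NAME` and ` ABC"` come back as one record, with the line break kept inside the quoted value, so a tokenizer then sees both fields.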

Example XML configuration:

<bean id="reader" 
      class="org.springframework.batch.item.file.FlatFileItemReader">
      ...
    <property name="recordSeparatorPolicy">
        <bean class="org.springframework.batch.item.file.separator.DefaultRecordSeparatorPolicy" />
    </property>
      ...
</bean>
Michael Pralow
  • Thank you very much. I did read about RecordSeparatorPolicy but must have missed the part about unterminated quotes. – Bilbo Baggins Apr 09 '15 at 05:15
  • Thanks, this worked. In case anyone is looking for the Java configuration: flatFileItemReader.setRecordSeparatorPolicy(new DefaultRecordSeparatorPolicy()); – Jijil Kakkadathu Apr 26 '23 at 05:55
itemReader.setRecordSeparatorPolicy(new DefaultRecordSeparatorPolicy());
saurabh gupta

Bean configuration for reading a CSV file with quoted newline characters:

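import org.springframework.batch.item.file.FlatFileItemReader;
import org.springframework.batch.item.file.separator.DefaultRecordSeparatorPolicy;
import org.springframework.context.annotation.Bean;
import org.springframework.core.io.FileSystemResource;

@Bean
public FlatFileItemReader<Object> flatFileItemReader() {

    FlatFileItemReader<Object> reader = new FlatFileItemReader<>();
    reader.setResource(new FileSystemResource("resource.csv"));

    // Add this line to your code: keeps reading physical lines while a quote
    // is still open, so a quoted field containing a newline stays in one record
    reader.setRecordSeparatorPolicy(new DefaultRecordSeparatorPolicy());

    // A LineMapper (e.g. a DefaultLineMapper with a DelimitedLineTokenizer)
    // must also be set before the reader is used; it is omitted here
    return reader;
}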
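For a fuller picture, here is a sketch of the same setup with plain setters, assuming Spring Batch is on the classpath. It adds the `LineMapper` the bean above leaves out and reads the question's sample from an in-memory `ByteArrayResource`. The class name `MultiLineCsvReaderDemo`, the `String[]` item type, the reader name and the lambda `FieldSetMapper` are all assumptions standing in for the question's `CustomerFieldSetMapper`.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

import org.springframework.batch.item.ExecutionContext;
import org.springframework.batch.item.file.FlatFileItemReader;
import org.springframework.batch.item.file.mapping.DefaultLineMapper;
import org.springframework.batch.item.file.separator.DefaultRecordSeparatorPolicy;
import org.springframework.batch.item.file.transform.DelimitedLineTokenizer;
import org.springframework.core.io.ByteArrayResource;

/** Hypothetical demo: reads a CSV whose quoted field spans two physical lines. */
public class MultiLineCsvReaderDemo {

    // Sample from the question; the quoted name spans two physical lines
    static final String CSV = "email, name\nabc@z.com, \"NEW NAME\n ABC\"\n";

    /** Builds a reader that maps each record to a String[] {email, name}. */
    static FlatFileItemReader<String[]> buildReader(String csv) throws Exception {
        DelimitedLineTokenizer tokenizer = new DelimitedLineTokenizer();
        tokenizer.setNames(new String[] {"email", "name"});

        DefaultLineMapper<String[]> lineMapper = new DefaultLineMapper<>();
        lineMapper.setLineTokenizer(tokenizer);
        // Hypothetical mapping; the question uses a custom CustomerFieldSetMapper instead
        lineMapper.setFieldSetMapper(fieldSet ->
                new String[] {fieldSet.readString("email"), fieldSet.readString("name")});

        FlatFileItemReader<String[]> reader = new FlatFileItemReader<>();
        reader.setName("multiLineCsvReader");
        reader.setResource(new ByteArrayResource(csv.getBytes(StandardCharsets.UTF_8)));
        reader.setLinesToSkip(1); // skip the "email, name" header
        reader.setLineMapper(lineMapper);
        // Keep joining physical lines while a quote is still open
        reader.setRecordSeparatorPolicy(new DefaultRecordSeparatorPolicy());
        reader.afterPropertiesSet();
        return reader;
    }

    /** Opens the reader, reads every item, and closes it again. */
    static List<String[]> readAll(String csv) throws Exception {
        FlatFileItemReader<String[]> reader = buildReader(csv);
        List<String[]> items = new ArrayList<>();
        reader.open(new ExecutionContext());
        try {
            String[] item;
            while ((item = reader.read()) != null) {
                items.add(item);
            }
        } finally {
            reader.close();
        }
        return items;
    }

    public static void main(String[] args) throws Exception {
        for (String[] item : readAll(CSV)) {
            System.out.println("email=" + item[0] + ", name=[" + item[1] + "]");
        }
    }
}
```

Without the `setRecordSeparatorPolicy(...)` call, the same reader would be expected to fail on the second physical line of the record, as described in the question.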