How to stop parsing after reading a few rows from a CSV file using an iterator/row processor in the univocity parser?

Update #1

I tried the code below and I'm getting empty rows.

import java.io.FileInputStream
import com.univocity.parsers.csv.{CsvParser, CsvParserSettings}

val parserSettings = new CsvParserSettings
parserSettings.detectFormatAutomatically()
parserSettings.setEmptyValue("")
// Stop after this many records have been read
parserSettings.setNumberOfRecordsToRead(numberOfRecordsToRead)

val parser = new CsvParser(parserSettings)
val input = new FileInputStream(path)
val rows = parser.parseAll(input)

Update #2

Before passing the input stream to the parser, I was using Apache Tika to detect the MIME type of the file, to check whether the file is a CSV.

new Tika().detect(input)

This was altering the input stream. Because of that, the univocity parser was unable to parse the file correctly.
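One way to avoid sharing a stream between the two libraries is sketched below: Tika detects the type from the `File` itself, and the parser gets its own freshly opened stream. This is only a sketch under assumptions: the object and method names are hypothetical, `Tika.detect(File)` is used in place of the stream-based call from the question, and line separator detection is enabled so the example does not depend on the platform default.

```scala
import java.io.{File, FileInputStream}
import org.apache.tika.Tika
import com.univocity.parsers.csv.{CsvParser, CsvParserSettings}

// Hypothetical helper, not part of either library.
object DetectThenParseSketch {
  // Returns the MIME type reported by Tika and at most `limit` parsed rows.
  def detectThenParse(path: String, limit: Int): (String, java.util.List[Array[String]]) = {
    // Detection works on the File, so no stream handed to the parser is touched.
    val mimeType = new Tika().detect(new File(path))

    val settings = new CsvParserSettings
    settings.setLineSeparatorDetectionEnabled(true)
    settings.setNumberOfRecordsToRead(limit)

    // The parser reads from a stream that nothing else has consumed.
    val input = new FileInputStream(path)
    try {
      (mimeType, new CsvParser(settings).parseAll(input))
    } finally {
      input.close()
    }
  }
}
```

If the same stream really has to be used for both steps, wrapping it in a `BufferedInputStream` before detection is another option worth trying, since that stream supports mark/reset.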

Gowrav

1 Answer

You have several options:

  1. From your row processor just call context.stop().

  2. On the parser settings, you can set settings.setNumberOfRecordsToRead(10) to read 10 rows and stop.

  3. With the parser itself, call parser.stopParsing().
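The three options can be sketched roughly as follows. This is an illustrative sketch rather than a definitive implementation: the object name, method names, and the `sample` input are made up for the example, and the line separator is set explicitly so the result does not depend on the platform default.

```scala
import java.io.StringReader
import scala.collection.mutable.ArrayBuffer
import com.univocity.parsers.common.ParsingContext
import com.univocity.parsers.common.processor.AbstractRowProcessor
import com.univocity.parsers.csv.{CsvParser, CsvParserSettings}

// Hypothetical wrapper object used only for this example.
object StopParsingSketch {
  // Made-up input with four records and no header row.
  val sample = "1,2\n3,4\n5,6\n7,8\n"

  private def newSettings(): CsvParserSettings = {
    val settings = new CsvParserSettings
    settings.getFormat.setLineSeparator("\n")
    settings
  }

  // Option 1: the row processor stops the parser through the context.
  def viaRowProcessor(limit: Int): Seq[Array[String]] = {
    val rows = ArrayBuffer.empty[Array[String]]
    val settings = newSettings()
    settings.setProcessor(new AbstractRowProcessor {
      override def rowProcessed(row: Array[String], context: ParsingContext): Unit = {
        rows += row
        if (rows.size >= limit) context.stop()
      }
    })
    new CsvParser(settings).parse(new StringReader(sample))
    rows.toSeq
  }

  // Option 2: the settings cap the number of records read.
  def viaSettings(limit: Int): java.util.List[Array[String]] = {
    val settings = newSettings()
    settings.setNumberOfRecordsToRead(limit)
    new CsvParser(settings).parseAll(new StringReader(sample))
  }

  // Option 3: pull rows one at a time and stop the parser explicitly.
  def viaStopParsing(limit: Int): Seq[Array[String]] = {
    val parser = new CsvParser(newSettings())
    parser.beginParsing(new StringReader(sample))
    val rows = ArrayBuffer.empty[Array[String]]
    var row = parser.parseNext()
    while (row != null) {
      rows += row
      if (rows.size >= limit) {
        parser.stopParsing() // releases the input; no further rows are read
        row = null
      } else {
        row = parser.parseNext()
      }
    }
    rows.toSeq
  }
}
```

Option 2 is the least code when the limit is known up front; options 1 and 3 are more useful when the decision to stop depends on the content of the rows.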

Hope this helps.

Jeronimo Backes
  • Thanks for the reply. I updated the question with code based on your option 2 (Update #1). It's not working. Am I missing something? – Gowrav Dec 13 '17 at 18:43
  • It should work just fine. In all the tests I ran, the `rows` returned by the parser have at most the number of records set in the configuration. Are you using the latest version? – Jeronimo Backes Dec 14 '17 at 02:09
  • Yes, using `2.5.9` – Gowrav Dec 14 '17 at 08:50
  • Very weird. Can you submit a sample of the input you are trying to parse so I can try to reproduce this? – Jeronimo Backes Dec 14 '17 at 12:47
  • I created a sample project and tested the functionality. It works. I found the cause of the issue and added it to the question (Update #2). Thank you for your time. – Gowrav Dec 14 '17 at 15:22
  • There is one issue. Even after using `setEmptyValue("")`, null is returned for missing values in a cell. [Sample Code](https://gist.github.com/gowravshekar/f19170e52148e17f4b20e4ec4f829f3d). Last cells of row 8 & 9. – Gowrav Dec 14 '17 at 17:36
  • I believe you are looking for `setNullValue("")`. – Jeronimo Backes Dec 15 '17 at 01:32
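To illustrate the difference raised in the last two comments, here is a small sketch. As far as I understand the settings, `setNullValue` applies when nothing sits between two delimiters, while `setEmptyValue` applies to a quoted empty string. The object name, method name, placeholder strings, and sample line are all made up for the example.

```scala
import com.univocity.parsers.csv.{CsvParser, CsvParserSettings}

// Hypothetical helper used only for this example.
object NullVersusEmptySketch {
  def parse(line: String): Array[String] = {
    val settings = new CsvParserSettings
    settings.setNullValue("<null>")   // substituted for a missing value: a,,b
    settings.setEmptyValue("<empty>") // substituted for a quoted empty value: a,"",b
    new CsvParser(settings).parseLine(line)
  }
}
```

Calling `NullVersusEmptySketch.parse("a,,\"\",b")` should show which placeholder ends up in which cell, which makes it easy to check which of the two settings a given file needs.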