I am seeing strange behavior with org.apache.commons.csv.CSVParser.

I am trying to read, line by line, a CSV file delimited by ; but my parser is skipping lines for an unknown reason.

Here is my code:

public static void main(String[] args) throws IOException {
    // File is not AutoCloseable, so it cannot be declared as a try resource
    File file = new File("myFile.csv");
    try (
        Reader reader = new BufferedReader(new FileReader(file));
        CSVParser parser = new CSVParser(reader, CSVFormat.DEFAULT.withDelimiter(';'));
    ) {
        if (!parser.iterator().hasNext()) {
            throw new RuntimeException("The file is empty.");
        }
        while (parser.iterator().hasNext()) { // <----- This skips a line!
            System.out.println(parser.iterator().next().get(0).trim());
        }
    }
}

So my console looks like:

line2
line4
line6
line8
line10
line12

etc...

So my problem is that the CSVParser skips a line on every hasNext() check, and it shouldn't.

Is my code wrong? I am pretty sure that if I replaced the parser with an ArrayList, the iterator would work as expected... Is this a known bug? If so, can you point me to a workaround or a better library?

Sebastien
  • Note that it is probably more readable to use a foreach loop: `for (CSVRecord csvRecord : parser)` – Arnaud Apr 16 '18 at 14:54

2 Answers

2

The problem is that each iteration calls iterator(), which returns a NEW Iterator.

Things get weird past this point, since each iterator has a current field storing the current record, and of course the current record of a new iterator is null.

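In that case hasNext() calls getNextRecord() from CSVParser (source code) and stores the result in that iterator's current field. The iterator is then thrown away, so the record it just read is lost, and the next() call on yet another new iterator reads the following record, thus skipping a line.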
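To make the mechanism concrete, here is a minimal sketch that uses only the JDK. ToyParser is a hypothetical stand-in, not the actual commons-csv code: it simply assumes, as described above, that every iterator() call returns a new iterator with its own look-ahead buffer over one shared source.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical stand-in for the parser: each iterator() call returns a NEW
// iterator, but all iterators pull from the same underlying source.
class ToyParser implements Iterable<String> {
    private final Iterator<String> source;

    ToyParser(List<String> lines) {
        this.source = lines.iterator();
    }

    @Override
    public Iterator<String> iterator() {
        return new Iterator<String>() {
            private String current; // per-iterator look-ahead buffer

            @Override
            public boolean hasNext() {
                if (current == null && source.hasNext()) {
                    current = source.next(); // consumes a line from the shared source
                }
                return current != null;
            }

            @Override
            public String next() {
                String next = current;
                current = null;
                if (next == null) {
                    if (!source.hasNext()) {
                        throw new NoSuchElementException();
                    }
                    next = source.next();
                }
                return next;
            }
        };
    }
}

class IteratorSkipDemo {
    // Buggy pattern: a fresh iterator per call, so each buffered line is lost.
    static List<String> readWithFreshIterators(List<String> lines) {
        ToyParser parser = new ToyParser(lines);
        List<String> out = new ArrayList<>();
        while (parser.iterator().hasNext()) {
            out.add(parser.iterator().next());
        }
        return out;
    }

    // Fixed pattern: one iterator reused for both hasNext() and next().
    static List<String> readWithOneIterator(List<String> lines) {
        ToyParser parser = new ToyParser(lines);
        List<String> out = new ArrayList<>();
        Iterator<String> it = parser.iterator();
        while (it.hasNext()) {
            out.add(it.next());
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("line1", "line2", "line3", "line4");
        System.out.println(readWithFreshIterators(lines)); // prints [line2, line4]
        System.out.println(readWithOneIterator(lines));    // prints [line1, line2, line3, line4]
    }
}
```

In the buggy variant, the line buffered by each throwaway hasNext() call is never returned, which matches the "every second line" pattern from the question.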

If you want to stick with the iterator, just reuse the same instance:

Iterator<CSVRecord> iterator = parser.iterator();

while (iterator.hasNext()) {
    System.out.println(iterator.next().get(0).trim());
}
Arnaud
  • Yup this is working, thanks! Is this normal behavior from an iterator? Or is this just a wrong implementation? – Sebastien Apr 16 '18 at 15:08
  • I have no clue, but you don't have any reason to create multiple iterators to iterate over a single collection. The foreach loop `for (CSVRecord csvRecord : parser)` from the comments does just that under the hood: it obtains the iterator once and iterates with it. That's also why we prefer to use foreach statements. – Arnaud Apr 16 '18 at 15:13
  • I can't use the foreach loop because the code I copied is split into an ItemReader in Spring Batch, so I need to return only one element at a time, and the "loop" is handled by Spring Batch. – Sebastien Apr 16 '18 at 15:22
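
For the one-element-at-a-time case raised in the last comment, a possible approach is to obtain the iterator once and keep it as a field. The sketch below is an assumption about how that could look, not code from the thread: OneAtATimeReader and its read() method are hypothetical names, and the class does not depend on Spring Batch. It only mirrors the convention that a reader returns null once the input is exhausted.

```java
import java.util.Arrays;
import java.util.Iterator;

// Hypothetical reader that hands out one element per call. The iterator is
// obtained once in the constructor and reused on every read().
class OneAtATimeReader<T> {
    private final Iterator<T> iterator;

    OneAtATimeReader(Iterable<T> source) {
        this.iterator = source.iterator();
    }

    // Returns the next element, or null when the source is exhausted.
    T read() {
        return iterator.hasNext() ? iterator.next() : null;
    }
}

class OneAtATimeReaderDemo {
    public static void main(String[] args) {
        OneAtATimeReader<String> reader =
                new OneAtATimeReader<>(Arrays.asList("line1", "line2", "line3"));
        String line;
        while ((line = reader.read()) != null) {
            System.out.println(line); // prints line1, line2, line3 on separate lines
        }
    }
}
```

Since CSVParser implements Iterable<CSVRecord>, the parser itself could presumably be passed as the source, with T being CSVRecord.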
-1

Well, by default, the parser considers the first line as the header (column definition), so it is skipped in the returned records. To include this line, you must prepare your formatting accordingly, using withSkipHeaderRecord.

EDIT: Sorry, I read too fast. I thought only the first line was skipped.

amanin
  • OK, but that doesn't explain why it skips every second line. If it skipped only one line I would have understood, but it skips every second line! – Sebastien Apr 16 '18 at 15:02