
In the snippet below I try to read a CSV file (exported from Excel) using the CSVParser from the Apache Commons CSV library. The question is why the second call to records.getRecords() returns an empty list. How should I be aware of this behavior?

import org.apache.commons.csv.CSVFormat;
import org.apache.commons.csv.CSVParser;
import org.apache.commons.csv.CSVRecord;

import java.io.FileReader;
import java.io.IOException;
import java.io.Reader;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.List;

public class ReadCSV {

    public ReadCSV() {
    }

    /* Define headers as enum */
    enum HEADER {
        ID, NAME, AGE
    }

    public List<List<String>> ReadCSVToList(String csvPath) throws IOException {
        List<List<String>> csvList = new ArrayList<>();
        // try-with-resources so the reader is always closed
        try (Reader reader = new FileReader(csvPath)) {
            CSVParser records = CSVFormat.DEFAULT.withHeader(HEADER.class).parse(reader);
            List<CSVRecord> records1 = records.getRecords();
            System.out.println(records1.size()); // 2
            List<CSVRecord> records2 = records.getRecords();
            System.out.println(records2.size()); // 0
            // the rest of the method was cut off in the original post; minimal completion so the snippet compiles
            return csvList;
        }
    }
}

– Mohammad

  • *how should I be aware of this behavior* - read the documentation for all classes you use in your program. – passer-by Jan 29 '22 at 17:36

2 Answers


It helps to read the documentation of CSVParser:

Parses CSV files according to the specified format. [...] The parser works record wise. It is not possible to go back, once a record has been parsed from the input stream.

And a few paragraphs later, under the heading "Parsing into memory":

If parsing record wise is not desired, the contents of the input can be read completely into memory.

Reader in = new StringReader("a;b\nc;d");
CSVParser parser = new CSVParser(in, CSVFormat.EXCEL);
List<CSVRecord> list = parser.getRecords();

There are two constraints that have to be kept in mind:

  1. Parsing into memory starts at the current position of the parser. If you have already parsed records from the input, those records will not end up in the in-memory representation of your CSV data.
  2. Parsing into memory may consume a lot of system resources depending on the input. For example, if you're parsing a 150 MB file of CSV data, the contents will be read completely into memory.
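
To see the first constraint in action, here is a minimal sketch (the class name PartialParseDemo and the sample rows are made up for the illustration): records consumed through the parser's iterator before getRecords() is called do not appear in the returned list.

import org.apache.commons.csv.CSVFormat;
import org.apache.commons.csv.CSVParser;
import org.apache.commons.csv.CSVRecord;

import java.io.Reader;
import java.io.StringReader;
import java.util.Iterator;
import java.util.List;

public class PartialParseDemo {
    public static void main(String[] args) throws Exception {
        // three records held in memory; no file needed for the illustration
        Reader in = new StringReader("1,Alice\n2,Bob\n3,Carol\n");
        try (CSVParser parser = CSVFormat.DEFAULT.parse(in)) {
            Iterator<CSVRecord> it = parser.iterator();
            it.next(); // consume the first record ("1,Alice") record-wise

            // getRecords() starts at the current parse position,
            // so only the two remaining records are returned
            List<CSVRecord> rest = parser.getRecords();
            System.out.println(rest.size()); // prints 2
        }
    }
}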

When you call records.getRecords() the first time, you read the CSV file completely into memory. That, together with the fact that "parsing into memory starts at the current position of the parser", means that for the second call there are no more records to parse (because the parser has already read the file completely).
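
Applied to the question's code, one way to fix it (a sketch, not the answer's code; it assumes the same imports, HEADER enum, and ReadCSV class as in the question) is to call getRecords() exactly once, keep the returned list, and reuse it:

    public List<List<String>> ReadCSVToList(String csvPath) throws IOException {
        List<List<String>> csvList = new ArrayList<>();
        try (Reader reader = new FileReader(csvPath);
             CSVParser parser = CSVFormat.DEFAULT.withHeader(HEADER.class).parse(reader)) {
            // read the whole file into memory exactly once
            List<CSVRecord> allRecords = parser.getRecords();
            // reuse the same in-memory list instead of calling getRecords() again
            for (CSVRecord record : allRecords) {
                List<String> row = new ArrayList<>();
                for (String value : record) { // CSVRecord is Iterable<String>
                    row.add(value);
                }
                csvList.add(row);
            }
        }
        return csvList;
    }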

– Thomas Kläger

As you can read in the official docs: CSVParser#getRecords

The returned content starts at the current parse-position in the stream.

In your first call of getRecords, the parsing position is at the beginning of the stream. When you call it the second time, the end of the stream has already been reached.
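
If a single pass over the data is enough, you can also avoid getRecords() entirely and iterate record-wise; CSVParser implements Iterable<CSVRecord> and Closeable. A minimal, self-contained sketch (the file name people.csv is only a placeholder):

import org.apache.commons.csv.CSVFormat;
import org.apache.commons.csv.CSVParser;
import org.apache.commons.csv.CSVRecord;

import java.io.FileReader;
import java.io.IOException;
import java.io.Reader;

public class RecordWiseRead {
    public static void main(String[] args) throws IOException {
        // "people.csv" is a placeholder path for this example
        try (Reader reader = new FileReader("people.csv");
             CSVParser parser = CSVFormat.DEFAULT.parse(reader)) {
            // the cursor advances through the stream as records are handed out,
            // so the whole file is never held in memory at once
            for (CSVRecord record : parser) {
                System.out.println(record.get(0));
            }
        }
    }
}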

In general, I would always recommend starting with the docs. Often, such questions can be answered easily with just a little reading. If there are still confusing aspects, the community is of course happy to help you further.

– gru
  • Sorry if I am asking a bad question, I'm new to Java. How can I know that "CSVFormat.DEFAULT.withHeader(HEADER.class).parse(reader);" works on streams? – Mohammad Jan 30 '22 at 13:02
  • At Stack Overflow, the community often responds with "read the docs", because that in fact answers most such questions. :) – gru Jan 30 '22 at 19:36
  • In this context, a "stream" means an input stream, i.e. the stream of data from your CSV file. As explained in the docs, there is a cursor which moves when you access data. – gru Jan 30 '22 at 19:39
  • When you repeatedly access records, the cursor moves ahead in the stream. At some point, it will reach the end and you cannot read further data. – gru Jan 30 '22 at 19:39