
I have a fairly large CSV file of approximately 80K to 120K rows (depending on the day). I'm successfully running code that parses the entire CSV file into Java objects using the @CsvBindByName annotation. Sample code:

    Reader reader = Files.newBufferedReader(Paths.get(file));
    // Typed CsvToBean so parse() returns List<MyCustomClass> rather than a raw List
    CsvToBean<MyCustomClass> csvToBean = new CsvToBeanBuilder<MyCustomClass>(reader)
            .withType(MyCustomClass.class)
            .withIgnoreLeadingWhiteSpace(true)
            .build();
    List<MyCustomClass> myCustomClass = csvToBean.parse();

I want to change this code to parse the CSV file line by line instead of all at once, but retain the neatness of mapping to a Java bean object. Essentially something like this:

    CSVReader csvReader = new CSVReader(Files.newBufferedReader(Paths.get(csvFileLoc)));
    String[] headerRow = csvReader.readNext(); // save the header row
    String[] nextLine;
    while ((nextLine = csvReader.readNext()) != null) {
        MyCustomClass myCustomClass = new MyCustomClass(); // a fresh bean for each row
        myCustomClass.setField1(nextLine[0]);
        myCustomClass.setField2(nextLine[1]);
        // ... and so on
    }

But the above solution ties me to knowing the column position of each field. What I would like is to map the string array I get from the CSV based on the header row, similar to what opencsv does when parsing the entire CSV file. As far as I can tell, however, opencsv doesn't let me do that. I had assumed this would be a pretty common practice, but I am unable to find any references to it online. It could be that I am not understanding the CsvToBean usage in opencsv correctly. I could use csvToBean.iterator() to iterate over the beans, but I think the entire CSV file is loaded into memory by the build method, which kind of defeats the purpose of reading line by line. Any suggestions are welcome.
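The header-based mapping described above can also be sketched by hand, without opencsv's annotations: read the header row once, remember the position of each column name, then look fields up by name for every row. This is a minimal sketch, not opencsv's API; `HeaderMappingSketch`, `forEachBean`, and the column names `field1`/`field2` are hypothetical, and the naive comma split assumes a simple CSV with no quoted fields or embedded commas.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Consumer;

public class HeaderMappingSketch {

    // Hypothetical stand-in for MyCustomClass; the column names "field1" and
    // "field2" are assumptions for illustration.
    static class MyCustomClass {
        String field1;
        String field2;
        void setField1(String value) { field1 = value; }
        void setField2(String value) { field2 = value; }
    }

    // Remember the position of every column name found in the header row.
    static Map<String, Integer> indexHeader(String[] headerRow) {
        Map<String, Integer> index = new HashMap<>();
        for (int i = 0; i < headerRow.length; i++) {
            index.put(headerRow[i].trim(), i);
        }
        return index;
    }

    // Fill one bean by column name, so the column order in the file does not matter.
    static MyCustomClass toBean(String[] row, Map<String, Integer> index) {
        MyCustomClass bean = new MyCustomClass();
        bean.setField1(row[index.get("field1")]);
        bean.setField2(row[index.get("field2")]);
        return bean;
    }

    // Read the header once, then hand each row to the caller as soon as it is parsed.
    // Splitting on "," is naive: it assumes no quoted fields or embedded commas.
    static void forEachBean(BufferedReader in, Consumer<MyCustomClass> action) {
        try {
            String header = in.readLine();
            if (header == null) return; // empty input
            Map<String, Integer> index = indexHeader(header.split(","));
            String line;
            while ((line = in.readLine()) != null) {
                action.accept(toBean(line.split(","), index));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // The columns are deliberately in the opposite order to the setters.
        String csv = "field2,field1\nb1,a1\nb2,a2\n";
        forEachBean(new BufferedReader(new StringReader(csv)),
                bean -> System.out.println(bean.field1 + " " + bean.field2));
    }
}
```

Running `main` prints `a1 b1` and then `a2 b2`, even though the file lists `field2` first. If a header is missing, `index.get` returns `null` and the unboxing throws a `NullPointerException`, so a real version would validate the header before reading rows.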

Core7s
  • Possible duplicate of [Read streaming data from csv using OpenCSV](https://stackoverflow.com/questions/39673372/read-streaming-data-from-csv-using-opencsv) – jkinkead Apr 20 '18 at 16:52
  • @jkinkead not really, that question is more about how to iterate the reader. Mine is essentially about how to map a string array to a Java object without depending on the position of elements in the array – Core7s Apr 20 '18 at 16:57
  • Looking at the API docs (I've never used opencsv before) it says it uses a `HeaderColumnNameMappingStrategy` behind the scenes when you use `@CsvBindByName` (if I'm reading it right). Maybe you can use one of those manually? – David Conrad Apr 20 '18 at 17:10
  • Oh, sorry, I didn't read the last part of your question. Are you sure it's all read into memory in the `build()` method, and not the `parse()` method? That seems really unlikely; that's not how the builder pattern normally works, and it would definitely violate the principle of least astonishment. – David Conrad Apr 20 '18 at 17:18
  • Sorry about not fully reading your question at first and implying you hadn't investigated the possibility of using the iterator. On further investigation it looks like it is safe to use it. – David Conrad Apr 20 '18 at 17:28

1 Answer


Looking at the API docs further, I see that `CsvToBean<T>` implements `Iterable<T>` and has an `iterator()` method that returns an `Iterator<T>`, documented as follows:

The iterator returned by this method takes one line of input at a time and returns one bean at a time.

So it looks like you could just write your loop as:

    for (MyCustomClass myCustomClass : csvToBean) {
        // . . . do something with the bean . . .
    }

Just to clear up some potential confusion: the source code shows that the `build()` method of `CsvToBeanBuilder` only creates the `CsvToBean` object and doesn't read any input. Input is read only by the `parse()` method and by the iterator of the `CsvToBean` object.
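Separating construction from reading is the usual lazy-iterator pattern. The following is a minimal, hypothetical stand-in, not opencsv's source: `LazyBeans` and its `linesRead` counter are invented for illustration, and a plain `Function<String, T>` plays the role of the bean mapping. It shows that constructing the object reads nothing and that each step of a for-each loop reads exactly one more line.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.function.Function;

// Hypothetical lazy reader: constructing it reads nothing;
// each call to next() consumes exactly one more line.
public class LazyBeans<T> implements Iterable<T> {
    private final BufferedReader in;
    private final Function<String, T> mapper;
    int linesRead = 0; // exposed only to demonstrate laziness

    LazyBeans(BufferedReader in, Function<String, T> mapper) {
        this.in = in;
        this.mapper = mapper;
    }

    @Override
    public Iterator<T> iterator() {
        return new Iterator<T>() {
            private String pending; // one line of lookahead, read only on demand

            @Override
            public boolean hasNext() {
                if (pending == null) {
                    try {
                        pending = in.readLine();
                    } catch (IOException e) {
                        throw new UncheckedIOException(e);
                    }
                    if (pending != null) linesRead++;
                }
                return pending != null;
            }

            @Override
            public T next() {
                if (!hasNext()) throw new NoSuchElementException();
                String line = pending;
                pending = null;
                return mapper.apply(line);
            }
        };
    }

    public static void main(String[] args) {
        LazyBeans<String> beans = new LazyBeans<>(
                new BufferedReader(new StringReader("a\nb\nc\n")), String::toUpperCase);
        System.out.println("after construction: " + beans.linesRead); // prints 0
        for (String bean : beans) {
            System.out.println(bean + " (lines read so far: " + beans.linesRead + ")");
        }
    }
}
```

Running `main` prints `after construction: 0` and then one line per row, with the counter growing by one each time. That is the behaviour the `CsvToBean` iterator documentation describes; memory stays bounded only as long as the loop body does not itself collect every bean into a list.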

David Conrad
  • I suspected my understanding of the `build()` method was wrong. Looks like I will be fine using `iterator()`. Thanks for your feedback! – Core7s Apr 20 '18 at 18:13