How can OpenCSV be used to parse multiline records?

Question

I am trying to parse a file similar to this one using OpenCSV:

CUST,Warren,Q,Darrow,8272 4th Street,New York,IL,76091
TRANS,1165965,2011-01-22 00:13:29,51.43
CUST,Erica,I,Jobs,8875 Farnam Street,Aurora,IL,36314
TRANS,8116369,2011-01-21 20:40:52,-14.83
TRANS,8116369,2011-01-21 15:50:17,-45.45
TRANS,8116369,2011-01-21 16:52:46,-74.6
TRANS,8116369,2011-01-22 13:51:05,48.55
TRANS,8116369,2011-01-21 16:51:59,98.53

I will use a Customer object to hold the records starting with 'CUST'. Each Customer object will contain a List of transactions.

public class Customer {
      private String firstName;
      private String middleInitial;
      private String lastName;
      private String address;
      private String city;
      private String state;
      private String zipCode;
      List<Transaction> transactions;
      ...
}

I will use a Transaction object to hold the records starting with 'TRANS'.

public class Transaction {
    private String accountNumber;
    private Date transactionDate;
    private Double amount;
    ...
}

One Customer can have one or more Transactions. I am able to achieve this using CSVReader, but can I achieve the same using annotations?

Yes, there is a way to achieve this functionality. Yes, OpenCSV can be used to parse multiline records. --- *FYI:* [Asking “Is there any way to…” is a poorly worded question](https://softwareengineering.meta.stackexchange.com/q/7273/202153), because, as you can see, I just answered it fully, and it was very helpful, was it? — Andreas, Dec 29 '20 at 14:41
*"How?"* You look at the **documentation**, e.g. the javadoc of the [`CSVReader`](http://opencsv.sourceforge.net/apidocs/com/opencsv/CSVReader.html), to see if there is a method that would seem useful. Seems `for (String[] record : csvReader) { /*process record here*/ }` might work. — Andreas, Dec 29 '20 at 14:46
*"Can anyone help me"* is basically asking us to write your code for you. What have you tried? What is stopping you from getting started? Did you do your due diligence, i.e. did you do any **research**, such as follow an [OpenCSV tutorial](https://www.google.com/search?q=opencsv+tutorial)? — Andreas, Dec 29 '20 at 14:49
Please see: [Why is “Can someone help me?” not an actual question?](http://meta.stackoverflow.com/q/284236) — Stephen C, Dec 29 '20 at 14:53
Thank you for guiding me in writing an effective question. Earlier I was trying to read into beans using annotations, http://opencsv.sourceforge.net/#reading_into_beans Probably OpenCSV does not have this feature for a file like this. As suggested by @Andreas, I can use CSVReader and use setters to populate the objects. — Gourav Dey, Dec 29 '20 at 15:27

score 0 · Answer 1 · answered Dec 29 '20 at 16:08

From the documentation:

> CSV files are lists, right? Well, some people like lists within lists.

It seems that OpenCSV can only handle sublists within a single "physical" CSV record, and there seems to be nothing that handles your case. However, if you parse the input CSV document record by record, you can collect consecutive records into groups and, once a group is complete, deserialize it yourself.

For example,

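// Declared in a utility class named Csv (see the call in the test below).
// Needs com.opencsv.CSVReader plus java.io, java.util, java.util.function and java.util.stream;
// @WillClose and @Nullable are assumed to be the JSR-305 annotations from javax.annotation.
public static Stream<List<String[]>> readGroups(@WillClose final CSVReader csvReader, final Predicate<? super String[]> isGroupStart,
        final Predicate<? super String[]> isGroupSpan) {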
    final Spliterator<List<String[]>> spliterator = new Spliterators.AbstractSpliterator<List<String[]>>(Long.MAX_VALUE, Spliterator.IMMUTABLE | Spliterator.NONNULL | Spliterator.ORDERED) {
        @Override
        public boolean tryAdvance(final Consumer<? super List<String[]>> action) {
            try {
                final String[] head = csvReader.readNextSilently();
                if ( head == null ) {
                    return false; // end of input: no more groups (also covers an empty file)
                }
                if ( !isGroupStart.test(head) ) {
                    throw new IOException("First record must delimit a group start");
                }
                final List<String[]> buffer = new ArrayList<>();
                buffer.add(head);
                @Nullable
                String[] peeked;
                // peek() does not consume the record, so the head of the next group stays in the reader
                while ( (peeked = csvReader.peek()) != null && !isGroupStart.test(peeked) ) {
                    if ( !isGroupSpan.test(peeked) ) {
                        throw new IOException("Not a group span");
                    }
                    csvReader.readNextSilently(); // discard the "peeked" state
                    buffer.add(peeked);
                }
                action.accept(buffer);
                return true; // a group was emitted; the next call detects the end of input
            } catch ( final IOException ex ) {
                throw new UncheckedIOException(ex);
            }
        }
    };
    return StreamSupport.stream(spliterator, false)
            .onClose(() -> {
                try {
                    csvReader.close();
                } catch ( final IOException ex ) {
                    throw new UncheckedIOException(ex);
                }
            });
}

For your sample CSV, the method above produces two groups, each a list of string arrays:

CUST,Warren,Q,Darrow,8272 4th Street,New York,IL,76091
TRANS,1165965,2011-01-22 00:13:29,51.43

CUST,Erica,I,Jobs,8875 Farnam Street,Aurora,IL,36314
TRANS,8116369,2011-01-21 20:40:52,-14.83
TRANS,8116369,2011-01-21 15:50:17,-45.45
TRANS,8116369,2011-01-21 16:52:46,-74.6
TRANS,8116369,2011-01-22 13:51:05,48.55
TRANS,8116369,2011-01-21 16:51:59,98.53
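
If the lazy stream is not needed, the same grouping can be sketched with a plain loop. This is only a minimal illustration: `RowGrouping` and `group` are hypothetical names, not part of OpenCSV, and the record types `CUST` and `TRANS` are hard-coded. It accepts any `Iterable<String[]>`, so a `CSVReader` (which is iterable, as noted in the comments on the question) could be passed in directly.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of OpenCSV.
final class RowGrouping {

    // Splits rows into groups: each group starts at a "CUST" row and
    // collects the "TRANS" rows that follow it.
    static List<List<String[]>> group(final Iterable<String[]> rows) {
        final List<List<String[]>> groups = new ArrayList<>();
        List<String[]> current = null;
        for (final String[] row : rows) {
            final String type = row.length > 0 ? row[0] : "";
            if (type.equals("CUST")) {
                // A new customer opens a new group.
                current = new ArrayList<>();
                current.add(row);
                groups.add(current);
            } else if (type.equals("TRANS") && current != null) {
                // A transaction belongs to the most recent customer.
                current.add(row);
            } else {
                // Unknown record type, or a transaction before any customer.
                throw new IllegalArgumentException("Unexpected record: " + Arrays.toString(row));
            }
        }
        return groups;
    }

    public static void main(final String[] args) {
        // split(",") stands in for real CSV parsing in this demo only.
        final List<String[]> rows = Arrays.asList(
                "CUST,Warren,Q,Darrow,8272 4th Street,New York,IL,76091".split(","),
                "TRANS,1165965,2011-01-22 00:13:29,51.43".split(","),
                "CUST,Erica,I,Jobs,8875 Farnam Street,Aurora,IL,36314".split(","),
                "TRANS,8116369,2011-01-21 20:40:52,-14.83".split(","));
        for (final List<String[]> g : group(rows)) {
            System.out.println(g.get(0)[1] + ": " + (g.size() - 1) + " transaction(s)");
        }
    }
}
```

The trade-off is that this version reads the whole input into memory before returning, whereas the spliterator hands out one group at a time.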

Having just these two groups, you can deserialize each group to an instance of Customer:

// @AllArgsConstructor, @EqualsAndHashCode and @ToString are Lombok annotations.
@AllArgsConstructor
@EqualsAndHashCode
@ToString
final class Customer {

    final String firstName;
    final String middleInitial;
    final String lastName;
    final String address;
    final String city;
    final String state;
    final String zipCode;
    final List<Transaction> transactions;

}

@AllArgsConstructor
@EqualsAndHashCode
@ToString
final class Transaction {

    final String accountNumber;
    final LocalDateTime transactionDate;
    final BigDecimal amount;

}

public final class CsvTest {

    private static final DateTimeFormatter dateTimeFormatter = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");

    @Test
    public void testRead() {
        try ( final Stream<List<String[]>> rawStream = Csv.readGroups(new CSVReader(new InputStreamReader(CsvTest.class.getResourceAsStream("customers.csv"))), CsvTest::isGroupStart, CsvTest::isGroupSpan) ) {
            rawStream
                    .map(CsvTest::parseCustomer)
                    .forEachOrdered(System.out::println);
        }
    }

    private static boolean isGroupStart(final String[] row) {
        return row.length > 0 && row[0].equals("CUST");
    }

    private static boolean isGroupSpan(final String[] row) {
        return row.length > 0 && row[0].equals("TRANS");
    }

    private static Customer parseCustomer(final List<String[]> group) {
        final List<Transaction> transactions = group.subList(1, group.size())
                .stream()
                .map(rawTransaction -> {
                    final String accountNumber = rawTransaction[1];
                    final LocalDateTime transactionDate = LocalDateTime.parse(rawTransaction[2], dateTimeFormatter);
                    final BigDecimal amount = new BigDecimal(rawTransaction[3]);
                    return new Transaction(accountNumber, transactionDate, amount);
                })
                .collect(Collectors.collectingAndThen(Collectors.toList(), Collections::unmodifiableList));
        final String[] rawCustomer = group.get(0);
        final String firstName = rawCustomer[1];
        final String middleInitial = rawCustomer[2];
        final String lastName = rawCustomer[3];
        final String address = rawCustomer[4];
        final String city = rawCustomer[5];
        final String state = rawCustomer[6];
        final String zipCode = rawCustomer[7];
        return new Customer(firstName, middleInitial, lastName, address, city, state, zipCode, transactions);
    }

}

Running the test prints the following output to the terminal:

Customer(firstName=Warren, middleInitial=Q, lastName=Darrow, address=8272 4th Street, city=New York, state=IL, zipCode=76091, transactions=[Transaction(accountNumber=1165965, transactionDate=2011-01-22T00:13:29, amount=51.43)])
Customer(firstName=Erica, middleInitial=I, lastName=Jobs, address=8875 Farnam Street, city=Aurora, state=IL, zipCode=36314, transactions=[Transaction(accountNumber=8116369, transactionDate=2011-01-21T20:40:52, amount=-14.83), Transaction(accountNumber=8116369, transactionDate=2011-01-21T15:50:17, amount=-45.45), Transaction(accountNumber=8116369, transactionDate=2011-01-21T16:52:46, amount=-74.6), Transaction(accountNumber=8116369, transactionDate=2011-01-22T13:51:05, amount=48.55), Transaction(accountNumber=8116369, transactionDate=2011-01-21T16:51:59, amount=98.53)])

I would guess this runs at least as fast as the built-in deserialization in OpenCSV, though I have not measured it, and it is more flexible, if more tedious to write. I am not yet sure how to improve the code above to support CSV headers instead of hard-coded column positions.
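
One possible direction, sketched here under assumptions: since the sample file has no header row, the column names could be declared per record type and each raw row turned into a name-to-value map before it is parsed. `NamedColumns`, `COLUMNS` and `toNamedRow` are hypothetical names, and the column names are simply taken from the fields of `Customer` and `Transaction`.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of OpenCSV.
final class NamedColumns {

    // Assumed column names per record type; the sample file carries no
    // header row, so these are declared by hand.
    static final Map<String, List<String>> COLUMNS = new LinkedHashMap<>();

    static {
        COLUMNS.put("CUST", Arrays.asList("type", "firstName", "middleInitial",
                "lastName", "address", "city", "state", "zipCode"));
        COLUMNS.put("TRANS", Arrays.asList("type", "accountNumber",
                "transactionDate", "amount"));
    }

    // Maps each value of a raw row to the column name declared for its record type.
    static Map<String, String> toNamedRow(final String[] row) {
        if (row.length == 0) {
            throw new IllegalArgumentException("Empty record");
        }
        final List<String> names = COLUMNS.get(row[0]);
        if (names == null || names.size() != row.length) {
            throw new IllegalArgumentException("Unexpected record: " + Arrays.toString(row));
        }
        final Map<String, String> named = new LinkedHashMap<>();
        for (int i = 0; i < row.length; i++) {
            named.put(names.get(i), row[i]);
        }
        return named;
    }
}
```

With this in place, `parseCustomer` could read `NamedColumns.toNamedRow(rawCustomer).get("lastName")` instead of `rawCustomer[3]`, at the cost of one extra map per record.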
