From the OpenCSV documentation: "CSV files are lists, right? Well, some people like lists within lists."
It seems that OpenCSV can only handle sublists within a single "physical" CSV record, and nothing in it seems to handle your case. However, if you can parse the input CSV document record by record, you can collect the records into groups yourself and deserialize each group as soon as it is complete.
For example:
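    import com.opencsv.CSVReader;

    import java.io.IOException;
    import java.io.UncheckedIOException;
    import java.util.ArrayList;
    import java.util.List;
    import java.util.Spliterator;
    import java.util.Spliterators;
    import java.util.function.Consumer;
    import java.util.function.Predicate;
    import java.util.stream.Stream;
    import java.util.stream.StreamSupport;
    // @Nullable and @WillClose are assumed to be the JSR-305 annotations.
    import javax.annotation.Nullable;
    import javax.annotation.WillClose;

    final class Csv {

        public static Stream<List<String[]>> readGroups(@WillClose final CSVReader csvReader, final Predicate<? super String[]> isGroupStart,
                final Predicate<? super String[]> isGroupSpan) {
            final Spliterator<List<String[]>> spliterator = new Spliterators.AbstractSpliterator<List<String[]>>(Long.MAX_VALUE, Spliterator.IMMUTABLE | Spliterator.NONNULL | Spliterator.ORDERED) {
                @Override
                public boolean tryAdvance(final Consumer<? super List<String[]>> action) {
                    try {
                        @Nullable
                        final String[] head = csvReader.readNextSilently();
                        if ( head == null ) {
                            return false; // end of input (or empty input): no more groups
                        }
                        if ( !isGroupStart.test(head) ) {
                            throw new IOException("First record must delimit a group start");
                        }
                        final List<String[]> buffer = new ArrayList<>();
                        buffer.add(head);
                        @Nullable
                        String[] peeked;
                        // Collect records until the next group start or the end of input
                        while ( (peeked = csvReader.peek()) != null && !isGroupStart.test(peeked) ) {
                            if ( !isGroupSpan.test(peeked) ) {
                                throw new IOException("Not a group span");
                            }
                            csvReader.readNextSilently(); // discard the "peeked" state
                            buffer.add(peeked);
                        }
                        action.accept(buffer);
                        return true; // a group was emitted; the next call detects the end of input
                    } catch ( final IOException ex ) {
                        throw new UncheckedIOException(ex);
                    }
                }
            };
            return StreamSupport.stream(spliterator, false)
                    .onClose(() -> {
                        try {
                            csvReader.close();
                        } catch ( final IOException ex ) {
                            throw new UncheckedIOException(ex);
                        }
                    });
        }

    }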
The method above can produce two lists of string arrays from your CSV:
CUST,Warren,Q,Darrow,8272 4th Street,New York,IL,76091
TRANS,1165965,2011-01-22 00:13:29,51.43
CUST,Erica,I,Jobs,8875 Farnam Street,Aurora,IL,36314
TRANS,8116369,2011-01-21 20:40:52,-14.83
TRANS,8116369,2011-01-21 15:50:17,-45.45
TRANS,8116369,2011-01-21 16:52:46,-74.6
TRANS,8116369,2011-01-22 13:51:05,48.55
TRANS,8116369,2011-01-21 16:51:59,98.53
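To see the grouping rule in isolation, here is a minimal sketch that applies the same idea to records already held in memory, without OpenCSV. `GroupingSketch` and `group` are hypothetical names invented for this illustration, and the rows are an abbreviated version of the sample data:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical helper, not part of OpenCSV: groups already-parsed records in memory.
final class GroupingSketch {

    static List<List<String[]>> group(final List<String[]> records, final Predicate<? super String[]> isGroupStart) {
        final List<List<String[]>> groups = new ArrayList<>();
        List<String[]> current = null;
        for ( final String[] record : records ) {
            if ( isGroupStart.test(record) ) {
                // A start record opens a new group
                current = new ArrayList<>();
                groups.add(current);
            } else if ( current == null ) {
                // A span record appeared before any start record
                throw new IllegalArgumentException("First record must delimit a group start");
            }
            current.add(record);
        }
        return groups;
    }

    public static void main(final String[] args) {
        // Abbreviated rows: only the record type and the first value are kept
        final List<String[]> records = Arrays.asList(
                new String[] { "CUST", "Warren" },
                new String[] { "TRANS", "1165965" },
                new String[] { "CUST", "Erica" },
                new String[] { "TRANS", "8116369" },
                new String[] { "TRANS", "8116369" }
        );
        final List<List<String[]>> groups = group(records, row -> row.length > 0 && row[0].equals("CUST"));
        for ( final List<String[]> g : groups ) {
            System.out.println(g.get(0)[1] + ": " + (g.size() - 1) + " transaction(s)");
        }
    }
}
```

Unlike the streaming version, this sketch needs every record in memory first, so it is only meant to show how a start record splits the input into groups.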
With just these two groups, you can deserialize each group into an instance of Customer:
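    import lombok.AllArgsConstructor;
    import lombok.EqualsAndHashCode;
    import lombok.ToString;

    import java.math.BigDecimal;
    import java.time.LocalDateTime;
    import java.util.List;

    @AllArgsConstructor
    @EqualsAndHashCode
    @ToString
    final class Customer {
        final String firstName;
        final String middleInitial;
        final String lastName;
        final String address;
        final String city;
        final String state;
        final String zipCode;
        final List<Transaction> transactions;
    }

    // Three fields only, matching the three-argument constructor call in parseCustomer
    @AllArgsConstructor
    @EqualsAndHashCode
    @ToString
    final class Transaction {
        final String accountNumber;
        final LocalDateTime transactionDate;
        final BigDecimal amount;
    }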
    import com.opencsv.CSVReader;
    // Assuming JUnit 4 here; adjust the import for your test framework.
    import org.junit.Test;

    import java.io.InputStreamReader;
    import java.math.BigDecimal;
    import java.nio.charset.StandardCharsets;
    import java.time.LocalDateTime;
    import java.time.format.DateTimeFormatter;
    import java.util.Collections;
    import java.util.List;
    import java.util.stream.Collectors;
    import java.util.stream.Stream;

    public final class CsvTest {

        private static final DateTimeFormatter dateTimeFormatter = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");

        @Test
        public void testRead() {
            // Closing the stream also closes the underlying CSVReader
            try ( final Stream<List<String[]>> rawStream = Csv.readGroups(new CSVReader(new InputStreamReader(CsvTest.class.getResourceAsStream("customers.csv"), StandardCharsets.UTF_8)), CsvTest::isGroupStart, CsvTest::isGroupSpan) ) {
                rawStream
                        .map(CsvTest::parseCustomer)
                        .forEachOrdered(System.out::println);
            }
        }

        private static boolean isGroupStart(final String[] row) {
            return row.length > 0 && row[0].equals("CUST");
        }

        private static boolean isGroupSpan(final String[] row) {
            return row.length > 0 && row[0].equals("TRANS");
        }

        private static Customer parseCustomer(final List<String[]> group) {
            // Every record after the first one is a TRANS record
            final List<Transaction> transactions = group.subList(1, group.size())
                    .stream()
                    .map(rawTransaction -> {
                        final String accountNumber = rawTransaction[1];
                        final LocalDateTime transactionDate = LocalDateTime.parse(rawTransaction[2], dateTimeFormatter);
                        final BigDecimal amount = new BigDecimal(rawTransaction[3]);
                        return new Transaction(accountNumber, transactionDate, amount);
                    })
                    .collect(Collectors.collectingAndThen(Collectors.toList(), Collections::unmodifiableList));
            // The first record of a group is the CUST record
            final String[] rawCustomer = group.get(0);
            final String firstName = rawCustomer[1];
            final String middleInitial = rawCustomer[2];
            final String lastName = rawCustomer[3];
            final String address = rawCustomer[4];
            final String city = rawCustomer[5];
            final String state = rawCustomer[6];
            final String zipCode = rawCustomer[7];
            return new Customer(firstName, middleInitial, lastName, address, city, state, zipCode, transactions);
        }

    }
This prints the following output to the terminal:
Customer(firstName=Warren, middleInitial=Q, lastName=Darrow, address=8272 4th Street, city=New York, state=IL, zipCode=76091, transactions=[Transaction(accountNumber=1165965, transactionDate=2011-01-22T00:13:29, amount=51.43)])
Customer(firstName=Erica, middleInitial=I, lastName=Jobs, address=8875 Farnam Street, city=Aurora, state=IL, zipCode=36314, transactions=[Transaction(accountNumber=8116369, transactionDate=2011-01-21T20:40:52, amount=-14.83), Transaction(accountNumber=8116369, transactionDate=2011-01-21T15:50:17, amount=-45.45), Transaction(accountNumber=8116369, transactionDate=2011-01-21T16:52:46, amount=-74.6), Transaction(accountNumber=8116369, transactionDate=2011-01-22T13:51:05, amount=48.55), Transaction(accountNumber=8116369, transactionDate=2011-01-21T16:51:59, amount=98.53)])
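Note that the dates in the output contain a `T`, while the input uses a space. That is because the values are parsed into LocalDateTime and printed through its toString(), which uses the ISO-8601 form. A small sketch of the two field conversions on their own (`TransactionFieldsSketch` is a hypothetical name used only for this illustration):

```java
import java.math.BigDecimal;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

// Hypothetical helper class showing the two conversions applied to each TRANS record.
final class TransactionFieldsSketch {

    // Same pattern as in CsvTest
    static final DateTimeFormatter FORMATTER = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");

    static LocalDateTime parseDate(final String raw) {
        return LocalDateTime.parse(raw, FORMATTER);
    }

    static BigDecimal parseAmount(final String raw) {
        // BigDecimal keeps the scale of the input text, so "-74.6" stays "-74.6"
        return new BigDecimal(raw);
    }

    public static void main(final String[] args) {
        System.out.println(parseDate("2011-01-22 00:13:29")); // prints 2011-01-22T00:13:29
        System.out.println(parseAmount("-74.6"));             // prints -74.6
    }
}
```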
My guess is that this runs a bit faster than the built-in deserialization in OpenCSV, and it is more flexible, if more tedious to write. I'm not yet sure how to improve the code above to support CSV headers instead of the hard-coded column positions.
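One possible direction, sketched here under assumptions rather than as a tested solution: keep a name-to-position lookup per record type and read fields by name. `ColumnIndex` is a hypothetical helper, not an OpenCSV class. Since the sample file has no header row, the column names below are declared in code, but a header record read from a file could be passed to the constructor in the same way.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: resolves column names to positions for one record type.
final class ColumnIndex {

    private final Map<String, Integer> positions = new HashMap<>();

    ColumnIndex(final String... columnNames) {
        for ( int i = 0; i < columnNames.length; i++ ) {
            positions.put(columnNames[i], i);
        }
    }

    String get(final String[] row, final String columnName) {
        final Integer position = positions.get(columnName);
        if ( position == null ) {
            throw new IllegalArgumentException("Unknown column: " + columnName);
        }
        return row[position];
    }

    public static void main(final String[] args) {
        // Assumed column names for TRANS records; the sample CSV does not define them
        final ColumnIndex transColumns = new ColumnIndex("type", "accountNumber", "transactionDate", "amount");
        final String[] row = { "TRANS", "1165965", "2011-01-22 00:13:29", "51.43" };
        System.out.println(transColumns.get(row, "amount")); // prints 51.43
    }
}
```

With a lookup like this, parseCustomer could ask for `"amount"` instead of `rawTransaction[3]`, at the cost of one map lookup per field.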