I don't think you can do much about it. CSVParser is a final class and does not let you control how it parses the underlying stream. However, it is sort of possible to work around it with a custom stream (or iterator) that parses each line separately and skips the lines that fail to parse.
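import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.Objects;
import java.util.stream.Stream;
// assumed: JSR-305 annotations; the original snippet does not show its imports
import javax.annotation.Nonnull;
import org.apache.commons.csv.CSVParser;
import org.apache.commons.csv.CSVRecord;

public final class Csv {

    private Csv() {
    }

    public interface ICsvParserFactory {
        @Nonnull
        CSVParser createCsvParser(@Nonnull Reader lineReader);
    }

    public static Stream<CSVRecord> tryParseLinesLeniently(final BufferedReader bufferedReader, final ICsvParserFactory csvParserFactory) {
        return bufferedReader.lines()
                // parse every physical line with its own single-use parser
                .map(line -> {
                    try {
                        return csvParserFactory.createCsvParser(new StringReader(line))
                                .iterator()
                                .next();
                    } catch ( final IllegalStateException ex ) {
                        // the line could not be parsed: mark it to be skipped
                        return null;
                    }
                })
                .filter(Objects::nonNull)
                // closing the stream also closes the underlying reader
                .onClose(() -> {
                    try {
                        bufferedReader.close();
                    } catch ( final IOException ex ) {
                        throw new RuntimeException(ex);
                    }
                });
    }
}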
However, I don't think it's a good idea in any case:
- It cannot return a CSVParser instance.
- It could return an Iterator<CSVRecord> instead of a Stream<CSVRecord> (and save the filter operation), but I just find streams simpler to implement.
- It creates a new CSV parser for each line, so it allocates many short-lived objects for a CSV document that contains "too many" lines. The string reader could probably be made reusable.
- The whole idea of the method is that it, not being a CSV parser itself, assumes that each line holds exactly one record. CSV allows line breaks inside quoted fields (see RFC 4180), so the method violates CSV parsing rules by design. It does not support headers yet (but that could easily be added).
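To illustrate the last point, here is a minimal JDK-only sketch of what splitting by lines does to a record with a line break inside a quoted field. The class name MultilineRecordDemo and the sample data are made up for the illustration; no CSV library is involved.

```java
import java.io.BufferedReader;
import java.io.StringReader;
import java.util.List;
import java.util.stream.Collectors;

final class MultilineRecordDemo {

    // Splits the document the same way tryParseLinesLeniently does: by physical line.
    static List<String> physicalLines(final String document) {
        return new BufferedReader(new StringReader(document))
                .lines()
                .collect(Collectors.toList());
    }

    public static void main(final String[] args) {
        // One logical CSV record whose second field contains a line break.
        final String document = "1,\"first\nsecond\",3\n";
        // The line-based approach sees two fragments, each with an unbalanced quote.
        physicalLines(document).forEach(line -> System.out.println("[" + line + "]"));
    }
}
```

Each fragment is then handed to its own parser, so the record is at best dropped and at worst misparsed as two unrelated rows.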
The method above can be used like this:

// reader is any java.io.Reader over the CSV document
final Csv.ICsvParserFactory csvParserFactory = lineReader -> {
    try {
        return new CSVParser(lineReader, CSVFormat.EXCEL);
    } catch ( final IOException ex ) {
        throw new RuntimeException(ex);
    }
};
try ( final Stream<CSVRecord> csvRecords = Csv.tryParseLinesLeniently(new BufferedReader(reader), csvParserFactory) ) {
    csvRecords.forEachOrdered(System.out::println);
}
If possible, please feed your CSV parser valid CSV documents instead of relying on workarounds like this one.
Edit 1
There is an implementation flaw in the code above: because every line is parsed by a fresh parser, ALL records returned from the stream now have the recordNumber set to 1.
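If only a usable position per record is needed, the line number could be carried next to the parsed value instead of being read from the record. This does not fix the record's own recordNumber; it is a minimal sketch, assuming a hypothetical Numbered<T> holder and a parseOrNull function that stands in for the per-line parsing (with Commons CSV, T would be CSVRecord).

```java
import java.util.Objects;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Function;
import java.util.stream.Stream;

final class NumberedLines {

    // Hypothetical holder pairing a parsed value with the 1-based line it came from.
    static final class Numbered<T> {
        final long lineNumber;
        final T value;

        Numbered(final long lineNumber, final T value) {
            this.lineNumber = lineNumber;
            this.value = value;
        }
    }

    // parseOrNull returns null for a line it rejects.
    // The counter relies on the stream being sequential and ordered, as BufferedReader.lines() is.
    static <T> Stream<Numbered<T>> number(final Stream<String> lines, final Function<String, T> parseOrNull) {
        final AtomicLong counter = new AtomicLong();
        return lines
                .map(line -> {
                    final long lineNumber = counter.incrementAndGet();
                    final T value = parseOrNull.apply(line);
                    return value == null ? null : new Numbered<T>(lineNumber, value);
                })
                .filter(Objects::nonNull);
    }

    public static void main(final String[] args) {
        // Toy parser for the demo only: rejects lines with an odd number of quotes.
        final Function<String, String[]> toyParser = line ->
                line.chars().filter(c -> c == '"').count() % 2 == 0 ? line.split(",") : null;
        number(Stream.of("a,b", "c,\"d", "e,f"), toyParser)
                .forEach(n -> System.out.println(n.lineNumber + ": " + String.join("|", n.value)));
    }
}
```

Rejected lines still consume a number, so the surviving values keep the line numbers of the original document rather than being renumbered.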
I now believe this cannot be fixed with the Apache Commons CSV parser, since the only CSVRecord constructor is package-private, so a CSVRecord cannot be instantiated outside that package without either using reflection or placing your own code into its declaring package.
Sorry, you will have to either fix your CSV documents or use another parser that can parse "more leniently".