This is probably very simple, but I have not been able to find an option to do this. I'm trying to use Apache Commons CSV to read a file for later validations. The CSV in question is submitted as an InputStream, which seems to add an additional column containing the line numbers when the file is read. I would like to ignore that column, if possible, because the header row does not contain a number, which causes an error. Is there an option already in InputStream to do this, or will I have to set up some kind of post-processing?

The code I'm using is as follows:

public String validateFile(InputStream filePath) throws Exception{
        System.out.println("Sending file to reader");
        System.out.println(filePath);
        InputStreamReader in = new InputStreamReader(filePath);
        //CSVFormat parse needs a reader object
        System.out.println("sending reader to CSV parse");
        for (CSVRecord record : CSVFormat.DEFAULT.withHeader().parse(in)) {
            for (String field : record) {
                System.out.print("\"" + field + "\", ");
            }
            System.out.println();
        }
        return null;
    }

When using withHeader(), I end up with the following error:

java.lang.IllegalArgumentException: A header name is missing in [, Employee_ID, Department, Email]

and I can't simply skip it, as I will need to do some validations on the header row.

Also, here is an example CSV file:

"Employee_ID", "Department", "Email"
"0123456","Department of Hello World","John.Doe@gmail.com"

EDIT: Also, the end goal is to validate the following:

  1. That there are columns called "Employee_ID", "Department", and "Email". For this, I think I'll need to remove .withHeader().
  2. Each line is comma-delimited.
  3. There are no empty cell values.
SVill
  • *[XY problem](https://meta.stackexchange.com/q/66377/351454):* Why not just call [`withAllowMissingColumnNames()`](https://commons.apache.org/proper/commons-csv/apidocs/org/apache/commons/csv/CSVFormat.html#withAllowMissingColumnNames--) on the `CSVFormat`? – Andreas Jan 06 '21 at 20:22
  • @Andreas I hadn't seen that part of the Apache Commons CSV documentation. That fixed the error though, and once I replaced the withHeader() with that, it seems to read through the file. I'm assuming that I'll need to specify the columns to ignore the line numbers, though (ex: employee=row[1], dept=row[2], etc.). Is that correct, or am I missing a more efficient solution? – SVill Jan 06 '21 at 21:08
  • 1) If you wanted `withHeader()`, why did you remove it? --- 2) What is `validateFile()` supposed to validate? I assume that it's not supposed to print. That's just debug information, for now, right? So what exactly is the method supposed to validate? I would have assumed that it should validate that the file is CSV, and that it contains the required columns. For that, you'd need column headers, and values from those columns, but you don't care about other columns, right? So you don't care about the *unnamed* line number column, right? You really don't even care about column order. – Andreas Jan 07 '21 at 00:44

1 Answer

Newer versions of Commons CSV have trouble with empty header names. Maybe that's the case here as well? You mentioned "no empty cell values", but I'm not sure whether that includes the headers as well...

Also see: https://issues.apache.org/jira/browse/CSV-257

Setting .setAllowMissingColumnNames(true) did the trick for me.

final CSVFormat csvFormat = CSVFormat.Builder.create()
        .setHeader(HEADERS)
        .setAllowMissingColumnNames(true)
        .build();
final Iterable<CSVRecord> records = csvFormat.parse(reader);
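
Once the file parses, the checks listed in the question's edit (required columns present, no empty cells) could be done along these lines. This is only a minimal sketch using plain JDK collections: the class `CsvChecks` and its `validate` method are hypothetical names, not part of Commons CSV. It assumes you pass in the header names (for example from `CSVParser.getHeaderNames()`) and one map per row (for example from `CSVRecord.toMap()`). Columns that are not required, such as the unnamed one, are simply ignored.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Commons CSV.
class CsvChecks {
    // Required column names, taken from the question.
    static final List<String> REQUIRED = Arrays.asList("Employee_ID", "Department", "Email");

    /**
     * Returns a list of human-readable problems; an empty list means
     * every required column exists and none of its cells is blank.
     */
    static List<String> validate(List<String> headerNames, List<Map<String, String>> rows) {
        List<String> problems = new ArrayList<>();

        // Check 1: every required column must be present (order does not matter).
        for (String name : REQUIRED) {
            if (!headerNames.contains(name)) {
                problems.add("Missing column: " + name);
            }
        }
        if (!problems.isEmpty()) {
            return problems; // cell checks are meaningless without the columns
        }

        // Check 2: no required cell may be null or blank.
        int rowNumber = 1; // counts data rows, not including the header row
        for (Map<String, String> row : rows) {
            for (String name : REQUIRED) {
                String value = row.get(name);
                if (value == null || value.trim().isEmpty()) {
                    problems.add("Row " + rowNumber + ": empty value in " + name);
                }
            }
            rowNumber++;
        }
        return problems;
    }
}
```

With the parser from above, you would collect `record.toMap()` for each record into a list and call `CsvChecks.validate(parser.getHeaderNames(), rows)`. An empty result means the file passed.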
Flo