How to deal with CSV files having an unknown number of columns using Super CSV

Question

For a project I need to deal with CSV files whose columns I do not know before runtime. The CSV files are perfectly valid; I only need to perform a simple task on several different files over and over again. I do need to analyse the values of the columns, which is why I want to use a library for working with CSV files. For simplicity, let's assume that I need to do something simple like appending a date column to all files, regardless of how many columns they have. I want to do that with Super CSV, because I use the library for other tasks as well.

What I am struggling with is more of a conceptual issue. I am not sure how to deal with the files if I do not know in advance how many columns there are. I am not sure how I should define POJOs that map arbitrary CSV files, or how I should define the cell processors if I do not know which columns, and how many, will be in the file. How can I dynamically create cell processors that match the number of columns? How would I define POJOs, for instance, based on the header of the CSV file?

Consider the case where I have two CSV files: product.csv and address.csv. Let's assume I want to append a date column with today’s date to both files, without having to write two different methods (e.g. addDateColumnToProduct() and addDateColumnToAddress()) that do the same thing.

product.csv:

name, description, price
"Apple", "red apple from Italy","2.5€" 
"Orange", "orange from Spain","3€"

address.csv:

firstname, lastname
"John", "Doe"
"Coole", "Piet"

Based on the header information of the CSV files, how could I define a POJO that maps the product CSV? And the same question for cell processors: how could I define even a very simple cell processor array that just has the right number of entries (one per column)? For example, for product.csv:

CellProcessor[] processor = new CellProcessor[] { 
    null,
    null,
    null
};

and for the address.csv:

CellProcessor[] processor = new CellProcessor[] { 
    null,
    null
};

Is this even possible? Am I on the wrong track to achieve this?

Edit 1: I am not looking for a solution that can deal with a variable number of columns within one CSV file. I am trying to figure out whether it is possible to deal with arbitrary CSV files at runtime, i.e. whether I can create POJOs based only on the header information contained in the CSV file, without knowing in advance how many columns the file will have.

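Solution based on the answer and comments from @baba:

import java.io.FileReader;
import java.util.List;

import org.supercsv.cellprocessor.ift.CellProcessor;
import org.supercsv.io.CsvListReader;
import org.supercsv.io.ICsvListReader;
import org.supercsv.prefs.CsvPreference;

private static void readWithCsvListReader(String fileName) throws Exception {
        // CsvListReader returns each row as a List, so no POJO is needed;
        // use CsvPreference.TAB_PREFERENCE instead for tab-separated files
        try (ICsvListReader listReader = new CsvListReader(new FileReader(fileName), CsvPreference.STANDARD_PREFERENCE)) {

                // read the header row; its length is the number of columns in this file
                String[] header = listReader.getHeader(true);
                int amountOfColumns = header.length;

                // one slot per column; a null element means "no processing" for that column
                CellProcessor[] processor = new CellProcessor[amountOfColumns];
                List<Object> customerList;

                while( (customerList = listReader.read(processor)) != null ) {
                        System.out.println(String.format("lineNo=%s, rowNo=%s, customerList=%s", listReader.getLineNumber(),
                                listReader.getRowNumber(), customerList));
                }
        } // try-with-resources closes the reader (and the underlying FileReader)
}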
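The method above only prints the rows. For the actual goal, appending today's date to product.csv and address.csv with a single method, nothing depends on the number of columns once each row is a list. Below is a minimal plain-Java sketch of that row transformation (not Super CSV code). It assumes the header and rows have already been read (for example with `getHeader(true)` and `read()` on a `CsvListReader`) and that the result is written back with a list-based writer such as Super CSV's `CsvListWriter`. The names `DateColumn`, `appendHeader` and `appendColumn` are hypothetical, and `java.time.LocalDate` requires Java 8 or later.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class DateColumn {

    // Returns a copy of the header with one extra column name at the end.
    static List<String> appendHeader(List<String> header, String columnName) {
        List<String> result = new ArrayList<>(header);
        result.add(columnName);
        return result;
    }

    // Returns a copy of every row with the same value appended at the end.
    // Works for any number of columns because each row is just a List.
    static List<List<String>> appendColumn(List<List<String>> rows, String value) {
        List<List<String>> result = new ArrayList<>();
        for (List<String> row : rows) {
            List<String> copy = new ArrayList<>(row); // leave the input row untouched
            copy.add(value);
            result.add(copy);
        }
        return result;
    }

    public static void main(String[] args) {
        String today = LocalDate.now().toString(); // ISO-8601, e.g. 2014-02-09

        List<String> productHeader = Arrays.asList("name", "description", "price");
        List<List<String>> products = Arrays.asList(
                Arrays.asList("Apple", "red apple from Italy", "2.5\u20ac"),
                Arrays.asList("Orange", "orange from Spain", "3\u20ac"));

        List<String> addressHeader = Arrays.asList("firstname", "lastname");
        List<List<String>> addresses = Arrays.asList(
                Arrays.asList("John", "Doe"),
                Arrays.asList("Coole", "Piet"));

        // the same two calls handle both files, whatever their width
        System.out.println(appendHeader(productHeader, "date"));
        System.out.println(appendColumn(products, today));
        System.out.println(appendHeader(addressHeader, "date"));
        System.out.println(appendColumn(addresses, today));
    }
}
```

Copying each row instead of modifying it in place also avoids `UnsupportedOperationException` on fixed-size lists such as those returned by `Arrays.asList`.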

score 3 · Accepted Answer · answered Nov 27 '14 at 21:50


Maybe a little late, but this could be helpful:

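  // one Optional processor per column; properties is assumed to hold the column names
  CellProcessor[] processors = new CellProcessor[properties.size()];

  for (int i = 0; i < properties.size(); i++) {
      processors[i] = new Optional(); // org.supercsv.cellprocessor.Optional
  }
  return processors;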

Ulises Mancilla

There is a typo in your answer; it should be `for(int i=0; i< properties.size(); i++){`. Otherwise it seems to do what is required. – Sid May 02 '17 at 14:46
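The loop in the answer above leaves `properties` undefined; presumably it is the list of column names. Here is a minimal sketch, assuming the header array comes from `getHeader(true)`, that derives the processor count from the file itself. So that it compiles without the library, `CellProcessor` and `Optional` below are tiny hypothetical stand-ins for `org.supercsv.cellprocessor.ift.CellProcessor` and `org.supercsv.cellprocessor.Optional`. In real code, import the Super CSV types and delete the stand-ins. `ProcessorFactory` and `forHeader` are likewise hypothetical names.

```java
public class ProcessorFactory {

    // Hypothetical stand-in for org.supercsv.cellprocessor.ift.CellProcessor,
    // only so this sketch compiles without the Super CSV jar.
    interface CellProcessor { }

    // Hypothetical stand-in for org.supercsv.cellprocessor.Optional.
    static class Optional implements CellProcessor { }

    // One processor per header column, whatever the file's width.
    static CellProcessor[] forHeader(String[] header) {
        CellProcessor[] processors = new CellProcessor[header.length];
        for (int i = 0; i < header.length; i++) {
            processors[i] = new Optional();
        }
        return processors;
    }

    public static void main(String[] args) {
        String[] productHeader = {"name", "description", "price"};
        String[] addressHeader = {"firstname", "lastname"};
        System.out.println(forHeader(productHeader).length); // 3
        System.out.println(forHeader(addressHeader).length); // 2
    }
}
```

For plain pass-through, filling the slots with `Optional` or leaving them `null` behaves much the same. `Optional` becomes useful once you chain further processors behind it, e.g. `new Optional(new ParseDouble())`.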

score 1 · Answer 2 · answered Feb 09 '14 at 11:42


This is a very common issue, and there are multiple tutorials on the internet, including the Super CSV page:

http://supercsv.sourceforge.net/examples_reading_variable_cols.html

As that page says:

As shown below you can execute the cell processors after calling read() by calling the executeProcessors() method. Because it's done after reading the line of CSV, you have an opportunity to check how many columns there are (using listReader.length()) and supplying the correct number of processors.

Nikola Yovchev

It seems that instead of "variable columns" they actually mean _optional columns_. It looks like you have to provide a processor for every possible number of columns for this to work. – kapex Feb 09 '14 at 11:47
You can determine the number of columns in advance by parsing the header, and then make a smart decision about the number of processors to use. – Nikola Yovchev Feb 09 '14 at 11:54
Thanks @baba, I edited my question to be more precise. I know I can parse the header and count the columns, but how can I create a new CellProcessor Object with a dynamic number of parameters (for instance one "null" parameter for each column) during runtime? – Stefan Feb 09 '14 at 11:59
Additionally, you can use openCSV instead of Super CSV, abandon the whole "object oriented approach gone wrong" thing that Super CSV has going on, and start treating each line just as a list of Strings. – Nikola Yovchev Feb 09 '14 at 12:01
@MightyApe some parsing logic is needed indeed, but you can just parse the header, get its length and then do: CellProcessor[] processor = new CellProcessor[suchLength]; – Nikola Yovchev Feb 09 '14 at 12:03
@baba, I included a small code snippet based on your solution. Sometimes one does not see the obvious. Thanks for your help. – Stefan Feb 09 '14 at 12:11
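The comments above suggest reading the header first. Declaring a real POJO class from a header at runtime would need code generation, so a common substitute is a map from column name to value. Super CSV's `CsvMapReader` returns rows in this shape when its `read` method is given the header as the name mapping. The following is a plain-Java sketch of that idea only, assuming the header and row have already been parsed; `HeaderRows` and `toRecord` are hypothetical names.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class HeaderRows {

    // Pairs each header name with the value in the same position.
    // LinkedHashMap keeps the original column order.
    static Map<String, String> toRecord(List<String> header, List<String> row) {
        if (header.size() != row.size()) {
            throw new IllegalArgumentException(
                    "expected " + header.size() + " columns but got " + row.size());
        }
        Map<String, String> record = new LinkedHashMap<>();
        for (int i = 0; i < header.size(); i++) {
            record.put(header.get(i), row.get(i));
        }
        return record;
    }

    public static void main(String[] args) {
        List<String> header = Arrays.asList("firstname", "lastname");
        Map<String, String> john = toRecord(header, Arrays.asList("John", "Doe"));
        System.out.println(john);                 // {firstname=John, lastname=Doe}
        System.out.println(john.get("lastname")); // Doe
    }
}
```

Code that works on such maps only needs the column names it actually cares about, so the same method can process files of any width.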
