
I have added the following dependency:

    <dependency>
        <groupId>net.sf.supercsv</groupId>
        <artifactId>super-csv</artifactId>
        <version>2.4.0</version>
    </dependency>

    private final static String[] COLS = { "col1", "col2", "col3", "col4", "col5",
        "col6", "col7", "col8", "col9", "col10", "col11",
        "col12", "col13", "col14" };


    private final static String[] TEMP_COLS = {"col1", "col2", "col3", "col4", "col5",
        "col6", "col7", "col8", "col9", "col10", "col11",
        "col12", "col13"};

This is how I build my reader:

    protected CsvPreference csvPref = CsvPreference.STANDARD_PREFERENCE;
    protected String encoding = "US-ASCII";

    InputStream is = fs.open(path);
    BufferedReader br = new BufferedReader(new InputStreamReader(is, encoding));
    ICsvBeanReader csvReader = new CsvBeanReader(br, csvPref);

I read each bean with the following code:

    Selections bean = null;

    try {
        bean = reader.read(Selections.class, Selections.getCols());
    } catch (Exception e) {
        // bean = reader.read(Selections.class, Selections.getTempCols());
        // slf4j.error(bean.getEventCode() + bean.getProgramId());
        slf4j.error("Error Logged for bean because of COLUMNS MISMATCH");
    }

The code above throws this exception:

    java.lang.IllegalArgumentException: the nameMapping array and the number of columns read should be the same size (nameMapping length = 14, columns = 13)

I am not sure what is causing this exception. It is thrown for some records even though every record has 14 columns (I verified this with a script, and I also created a schema and uploaded the file with 14 columns). Of 7,000,000 records, 2,100,000 have this issue.

To find out which record is causing the problem, I changed the code as follows:

    Selections bean = null;

    try {
        bean = reader.read(Selections.class, Selections.getCols());
    } catch (Exception e) {
        bean = reader.read(Selections.class, Selections.getTempCols());
        slf4j.error(bean.getEventCode() + bean.getProgramId());
        slf4j.error("Error Logged for bean because of COLUMNS MISMATCH");
    }

With these changes, the code now throws:

    java.lang.IllegalArgumentException: the nameMapping array and the number of columns read should be the same size (nameMapping length = 13, columns = 14)

I have no idea why the CSV reader is behaving so strangely. When the column count is not 14 it throws an exception, yet when I read again in the catch block to print the record's details, it says the column count is 14.

Please help me debug this issue. I can add more details if needed.

Scott Conway
Atom
  • I do not have any headers on the file. – Atom Dec 22 '15 at 08:51
  • Can you show how you create that reader object of yours? – Jan Dec 22 '15 at 08:56
  • The exception message is clear enough. – Raedwald Dec 22 '15 at 08:59
  • @Raedwald OK, I have updated the question. It reads certain rows but throws an error on others, even though I have verified they have 14 columns. Please read my description as well. – Atom Dec 22 '15 at 18:53
  • @minusvoter - Would you care to comment, please? – Atom Dec 22 '15 at 18:59
  • @Jan I have added the details. – Atom Dec 22 '15 at 19:02
  • Seems like the reader finds some rows with 13 columns, which makes 14 invalid, and some with 14, which makes 13 invalid. Is there anything in the lines that would sometimes be considered another separator, or a char escaping a separator? – Jan Dec 22 '15 at 20:54
  • What about a https://stackoverflow.com/help/mcve instead of dumping a mix of maven and duplicated Java code in here? – Robert Dec 24 '15 at 22:21
  • @Jan If it throws an exception with 14, then the count should be 13 or something else, but it should not be 14. Please note that I am doing this in the catch block. – Atom Dec 30 '15 at 06:35
  • Some are 14, some are 13 and supercsv will break both times. – Jan Dec 30 '15 at 06:51
  • @Jan I don't understand why it would break both times. From the snippet above, only when it can't handle 14 should it go to the catch block and try again with 13, right? – Atom Dec 30 '15 at 06:58
  • When you catch that exception the line with 13 is already gone - the reader will be at the **next** line - which again has 14. You're really best off switching CSV readers, as I recommended. – Jan Dec 30 '15 at 07:07
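
The point made in the last comment can be reproduced with a small sketch. It does not use Super CSV: `readRow` is a hypothetical stand-in that, like a row-based reader, consumes one line per call and throws when the column count is not the expected one. The class name, the method names, and the three-column sample data are all assumptions made for illustration.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

public class RetryDemo {

    // Hypothetical stand-in for a row-based reader: every call consumes one
    // line, even when the call ends in an exception.
    public static String[] readRow(BufferedReader in, int expected) {
        String line;
        try {
            line = in.readLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        if (line == null) {
            return null; // end of input
        }
        String[] cols = line.split(",", -1);
        if (cols.length != expected) {
            throw new IllegalArgumentException(
                    "expected " + expected + " columns, read " + cols.length);
        }
        return cols;
    }

    // Mirrors the retry-in-catch pattern from the question, with 3 and 2
    // columns standing in for 14 and 13.
    public static List<String> retryInCatch(String data) {
        List<String> log = new ArrayList<>();
        BufferedReader in = new BufferedReader(new StringReader(data));
        while (true) {
            try {
                String[] row = readRow(in, 3);
                if (row == null) break;
                log.add("ok: " + String.join(",", row));
            } catch (IllegalArgumentException first) {
                try {
                    // The short row is already consumed; this reads the NEXT row.
                    String[] row = readRow(in, 2);
                    if (row == null) break;
                    log.add("retry ok: " + String.join(",", row));
                } catch (IllegalArgumentException second) {
                    log.add("retry failed: " + second.getMessage());
                }
            }
        }
        return log;
    }

    // Alternative: read each row once and inspect it before mapping it.
    public static List<String> findBadRows(String data, int expected) {
        List<String> bad = new ArrayList<>();
        BufferedReader in = new BufferedReader(new StringReader(data));
        String line;
        int lineNo = 0;
        try {
            while ((line = in.readLine()) != null) {
                lineNo++;
                int n = line.split(",", -1).length;
                if (n != expected) {
                    bad.add("line " + lineNo + " has " + n + " columns: " + line);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bad;
    }

    public static void main(String[] args) {
        String data = "a,b,c\nd,e\nf,g,h\n";
        System.out.println(retryInCatch(data));
        // [ok: a,b,c, retry failed: expected 2 columns, read 3]
        System.out.println(findBadRows(data, 3));
        // [line 2 has 2 columns: d,e]
    }
}
```

With Super CSV itself, a similar effect should be achievable without a second `read` call: the reader interface exposes `getLineNumber()` and `getUntokenizedRow()`, which can be logged in the catch block to see the raw row that failed.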

2 Answers


After a dive into the Super CSV source, and given your confirmation that you can upload the file with 14 columns correctly, I'd suggest you look for a replacement for Super CSV.

My recommendation: check out Apache Commons CSV.

This library also supports an iterative approach, so you wouldn't need to hold all 7,000,000 records in memory.

Jan

I finally resolved the problem. It was caused by the quote character that I had given in my CSV preferences:

new CsvPreference.Builder('"', '\u0001', "\r\n").build()

My incoming data contains " as part of the data. The issue was resolved when I replaced the quote character with a character that will never be part of the incoming data.
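
To illustrate why a quote character that also occurs in the data changes the column count, here is a minimal sketch. It is a deliberately simplified tokenizer, not Super CSV's implementation: `split` is a hypothetical helper, the sample row is made up, and `'\u0002'` is only an assumed example of a character that never appears in the data.

```java
import java.util.ArrayList;
import java.util.List;

public class QuoteCharDemo {

    // Simplified tokenizer (an assumption for illustration): the quote
    // character toggles "quoted" mode, and delimiters seen in quoted mode
    // are kept as data instead of ending the column.
    public static List<String> split(String line, char delimiter, char quote) {
        List<String> cols = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (c == quote) {
                inQuotes = !inQuotes;
            } else if (c == delimiter && !inQuotes) {
                cols.add(current.toString());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        cols.add(current.toString());
        return cols;
    }

    public static void main(String[] args) {
        char delimiter = '\u0001';
        // Four fields; the second and third each contain a literal double quote.
        String row = "1" + delimiter + "say \"a" + delimiter + "b\" now" + delimiter + "end";

        // With '"' as the quote character, the delimiter between the two
        // quotes is swallowed and one column goes missing.
        System.out.println(split(row, delimiter, '"').size());      // 3

        // With a quote character that never occurs in the data, all four
        // columns are found.
        System.out.println(split(row, delimiter, '\u0002').size()); // 4
    }
}
```

In the `CsvPreference.Builder` call shown above, the first argument is the quote character and the second is the delimiter, so the first argument is the one to change.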

I am not an expert in this area; the problem was due to my own ignorance, and super-csv is not at fault. I believe super-csv is a decent API to explore and use.

To learn more about column quote mode, please refer to the API documentation: https://super-csv.github.io/super-csv/apidocs/org/supercsv/quote/ColumnQuoteMode.html

Atom