
I have been trying to read a CSV file and add its fields to a data structure. One of the rows is not formed properly, and I am aware of that; I just want to skip that row and move on to the next one. But even though I am catching the exception, it's still breaking the loop. Any idea what I am missing here?

My CSV:

"id","name","email"
121212,"Steve","steve@example.com"
121212,"Steve","steve2@example.com",,
121212,"Steve","steve@example.com"

My code:

import java.io.IOException;
import java.nio.file.Path;
import java.nio.file.Paths;

import com.fasterxml.jackson.databind.MappingIterator;
import com.fasterxml.jackson.dataformat.csv.CsvMapper;
import com.fasterxml.jackson.dataformat.csv.CsvSchema;

public class CSVTest {

    public static void main(String[] args) throws Exception {
        Path path = Paths.get("list2.csv");
        CsvMapper mapper = new CsvMapper();
        // Take the column names from the first line of the file
        CsvSchema schema = CsvSchema.emptySchema().withHeader();
        MappingIterator<Object> it = mapper.reader(Object.class)
                .with(schema)
                .readValues(path.toFile());

        try {
            while (it.hasNext()) {
                Object row;
                try {
                    row = it.nextValue();
                } catch (IOException e) {
                    // Intended to skip the malformed row and carry on with the next one
                    e.printStackTrace();
                    continue;
                }
            }
        } catch (ArrayIndexOutOfBoundsException e) {
            e.printStackTrace();
        }
    }
}

Exception:

com.fasterxml.jackson.core.JsonParseException: Too many entries: expected at most 3 (value #3 (0 chars) "")
 at [Source: java.io.InputStreamReader@12b3519c; line: 3, column: 38]
    at com.fasterxml.jackson.core.JsonParser._constructError(JsonParser.java:1486)
    at com.fasterxml.jackson.core.base.ParserMinimalBase._reportError(ParserMinimalBase.java:518)
    at com.fasterxml.jackson.dataformat.csv.CsvParser._handleNextEntryExpectEOL(CsvParser.java:601)
    at com.fasterxml.jackson.dataformat.csv.CsvParser._handleNextEntry(CsvParser.java:587)
    at com.fasterxml.jackson.dataformat.csv.CsvParser.nextToken(CsvParser.java:474)
    at com.fasterxml.jackson.databind.deser.std.UntypedObjectDeserializer$Vanilla.mapObject(UntypedObjectDeserializer.java:592)
    at com.fasterxml.jackson.databind.deser.std.UntypedObjectDeserializer$Vanilla.deserialize(UntypedObjectDeserializer.java:440)
    at com.fasterxml.jackson.databind.MappingIterator.nextValue(MappingIterator.java:188)
    at CSVTest.main(CSVTest.java:24)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at com.intellij.rt.execution.application.AppMain.main(AppMain.java:140)
java.lang.ArrayIndexOutOfBoundsException: 3
    at com.fasterxml.jackson.dataformat.csv.CsvSchema.column(CsvSchema.java:941)
    at com.fasterxml.jackson.dataformat.csv.CsvParser._handleNamedValue(CsvParser.java:614)
    at com.fasterxml.jackson.dataformat.csv.CsvParser.nextToken(CsvParser.java:476)
    at com.fasterxml.jackson.databind.MappingIterator.hasNextValue(MappingIterator.java:158)
    at CSVTest.main(CSVTest.java:21)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at com.intellij.rt.execution.application.AppMain.main(AppMain.java:140)
StaxMan
notacyborg
  • What exception is thrown? Edit your question to include the stack trace. – dsh Sep 24 '15 at 12:42
  • For what it's worth, during my research I found some open issues that may be related to the problem I am running into. https://github.com/FasterXML/jackson-dataformat-csv/issues/52 – notacyborg Sep 24 '15 at 16:17

4 Answers


Your CSV is not necessarily malformed; in fact, it's very common to have rows with varying numbers of columns.

univocity-parsers handles this without any trouble.

The easiest way would be:

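import java.io.FileReader;
import java.nio.file.Paths;
import java.util.List;

import com.univocity.parsers.common.processor.BeanListProcessor;
import com.univocity.parsers.csv.CsvParser;
import com.univocity.parsers.csv.CsvParserSettings;

// TestBean stands for your own bean class, with its fields mapped using univocity's @Parsed annotation
BeanListProcessor<TestBean> rowProcessor = new BeanListProcessor<TestBean>(TestBean.class);

CsvParserSettings parserSettings = new CsvParserSettings();
parserSettings.setRowProcessor(rowProcessor);
// Treat the first row as the header rather than as data
parserSettings.setHeaderExtractionEnabled(true);

CsvParser parser = new CsvParser(parserSettings);
parser.parse(new FileReader(Paths.get("list2.csv").toFile()));

// The BeanListProcessor provides a list of objects extracted from the input.
List<TestBean> beans = rowProcessor.getBeans();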

If you want to discard the beans built from a row with an inconsistent number of columns, override the beanProcessed method and use the ParsingContext object to analyse your data and decide whether to keep or drop the row.
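The keep-or-drop decision itself boils down to comparing each row's column count with the header's. Here is a library-free sketch of just that rule; it is not univocity's API, and ColumnCountFilter, naiveSplit and keepWellFormed are hypothetical names. It assumes a naive comma split that ignores quoting, so it is only safe for input like the sample above, where no quoted field contains a comma.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not univocity code: keeps a data row only if it has
// exactly as many columns as the header. ASSUMPTION: fields never contain
// quoted commas, because the split below ignores CSV quoting entirely.
class ColumnCountFilter {

    /** Naive split; the -1 limit keeps trailing empty fields such as ",,". */
    static String[] naiveSplit(String line) {
        return line.split(",", -1);
    }

    /** Returns the data rows (header excluded) whose width matches the header. */
    static List<String[]> keepWellFormed(List<String> lines) {
        List<String[]> kept = new ArrayList<>();
        if (lines.isEmpty()) {
            return kept;
        }
        int expected = naiveSplit(lines.get(0)).length; // header defines the width
        for (String line : lines.subList(1, lines.size())) {
            String[] row = naiveSplit(line);
            if (row.length == expected) {
                kept.add(row);
            } else {
                System.err.println("Dropping row with " + row.length
                        + " columns (expected " + expected + "): " + line);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "\"id\",\"name\",\"email\"",
                "121212,\"Steve\",\"steve@example.com\"",
                "121212,\"Steve\",\"steve2@example.com\",,",
                "121212,\"Steve\",\"steve@example.com\"");
        System.out.println(keepWellFormed(lines).size() + " rows kept"); // prints "2 rows kept"
    }
}
```

Inside beanProcessed you would apply the same comparison to whatever the ParsingContext tells you about the current record, instead of splitting lines yourself.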

Disclosure: I am the author of this library. It's open-source and free (Apache V2.0 license).

josliber
Jeronimo Backes

With Jackson 2.6, the handling of readValues() has been improved to try to recover from processing errors, so that in many cases you can simply try again and read the following valid rows. So make sure to use at least version 2.6.2.

Earlier versions did not recover as well, usually rendering the rest of the content unprocessable; this may be what happened in your case.

Another possibility, given that your problem is not invalid CSV but rather CSV that is not mappable as POJOs (at least the way the POJO is defined), is to read the content as a sequence of String[] and handle the mapping manually. Jackson's CSV parser itself does not mind any number of columns; it is the higher-level databinding that does not like finding "extra" content that it does not recognize.
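To illustrate the String[] route: as far as I recall, the jackson-dataformat-csv README gets one String[] per line by enabling CsvParser.Feature.WRAP_AS_ARRAY and reading with readerFor(String[].class) (readerFor() exists from 2.6 on). The sketch below assumes you already have such arrays and only shows the manual mapping step. ManualRowMapper and mapRow are hypothetical names, and tolerating empty extra columns is just one possible policy, chosen because it matches the malformed row in the question.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical manual mapping step: turns one String[] row into a
// column-name -> value map. ASSUMPTION: extra trailing columns are acceptable
// only when they are empty (like ",," in the question's CSV).
class ManualRowMapper {

    static Optional<Map<String, String>> mapRow(String[] header, String[] row) {
        if (row.length < header.length) {
            return Optional.empty(); // too few columns to fill every field
        }
        for (int i = header.length; i < row.length; i++) {
            if (!row[i].isEmpty()) {
                return Optional.empty(); // real extra data: reject rather than guess
            }
        }
        Map<String, String> mapped = new LinkedHashMap<>(); // keeps header order
        for (int i = 0; i < header.length; i++) {
            mapped.put(header[i], row[i]);
        }
        return Optional.of(mapped);
    }

    public static void main(String[] args) {
        String[] header = {"id", "name", "email"};
        String[] extra = {"121212", "Steve", "steve2@example.com", "", ""};
        // prints Optional[{id=121212, name=Steve, email=steve2@example.com}]
        System.out.println(mapRow(header, extra));
    }
}
```

From the resulting map you can then build whatever POJO you need, with full control over which rows are kept.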

StaxMan
  • I tried updating Jackson to 2.6.2, but it didn't really solve my problem. Using String[] did work, however. Thanks. – notacyborg Sep 25 '15 at 21:47
  • @notacyborg Ok, I'll probably file an issue, if I understand your problem correctly. Definitely sounds like a problem that ought to be recovered from quite easily. – StaxMan Sep 25 '15 at 22:16
  • Thank you, @StaxMan. Please link me to the issue when/if you end up filing it. I'd love to participate in discussions. It will save me a lot of headache when it's fixed. :) – notacyborg Sep 26 '15 at 14:51
  • @notacyborg Sure, it's here: https://github.com/FasterXML/jackson-dataformat-csv/issues/91 -- feel free to add ideas or other edge cases. Exception handling can get a bit tricky, since it has to work from low-level handling all the way up to databinding. – StaxMan Sep 28 '15 at 20:02
  • The fix will be in 2.6.3, and it should work so that the initial exception (which signals the problem of extra columns) can be caught for the row in question, while processing continues for the remaining rows. Earlier versions led to an inconsistent state in which iteration could not be continued. – StaxMan Oct 07 '15 at 05:07

com.fasterxml.jackson.core.JsonParseException is an IOException, so that exception should be caught by the try-catch block. The fact that it is not being caught leads me to believe that it's being thrown from the hasNext() method. That's a common pattern: to know whether there is another element, you actually have to try to read the next one.
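A toy illustration of that read-ahead pattern, using only the JDK (LookAheadIntIterator is a hypothetical class, not Jackson code): to answer hasNext() it has to parse the next element, so a bad element blows up in hasNext() rather than in next().

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical read-ahead iterator (not Jackson code): hasNext() must parse the
// next line to know whether a valid element exists, so a bad line makes
// hasNext() throw before next() is ever called.
class LookAheadIntIterator implements Iterator<Integer> {
    private final Iterator<String> source;
    private Integer buffered; // element parsed ahead of time, or null if none

    LookAheadIntIterator(List<String> lines) {
        this.source = lines.iterator();
    }

    @Override
    public boolean hasNext() {
        if (buffered == null && source.hasNext()) {
            // The line is consumed before parsing, so a failed parse skips it
            buffered = Integer.parseInt(source.next()); // may throw NumberFormatException
        }
        return buffered != null;
    }

    @Override
    public Integer next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        Integer value = buffered;
        buffered = null;
        return value;
    }

    public static void main(String[] args) {
        Iterator<Integer> it = new LookAheadIntIterator(Arrays.asList("1", "oops", "3"));
        System.out.println(it.next()); // prints 1
        try {
            it.hasNext(); // parses "oops" ahead of time and throws here
        } catch (NumberFormatException e) {
            System.out.println("hasNext() threw: " + e.getMessage());
        }
        System.out.println(it.next()); // prints 3
    }
}
```

Note that this toy has already consumed the bad line when it throws, so calling hasNext() again moves on; whether a real parser recovers like that depends on its internal state.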

Chris Gerken

I can't tell for certain, since some of the stack trace was omitted. However:

  • If ArrayIndexOutOfBoundsException is the exception that is thrown (as opposed to being a "cause"), then the reason is that you catch it outside of your loop.
  • If the exception is (a subclass of) IOException, then, as Chris Gerken wrote, it may be thrown by it.hasNext(), in which case you don't catch it at all and so your program will exit.

The remainder of the stack trace would indicate which of these, or some other reason altogether, is the problem.



Update based on complete output and stack traces:

On line 24 of CSVTest.java, you call .nextValue(). Inside that method, a JsonParseException is thrown. Since that is a subclass of IOException, your catch block catches it, prints the stack trace and continues with your loop. So far so good.

com.fasterxml.jackson.core.JsonParseException: Too many entries: expected at most 3 (value #3 (0 chars) "")
 at [Source: java.io.InputStreamReader@12b3519c; line: 3, column: 38]
   at com.fasterxml.jackson.core.JsonParser._constructError(JsonParser.java:1486)
   at com.fasterxml.jackson.core.base.ParserMinimalBase._reportError(ParserMinimalBase.java:518)
   at com.fasterxml.jackson.dataformat.csv.CsvParser._handleNextEntryExpectEOL(CsvParser.java:601)
   at com.fasterxml.jackson.dataformat.csv.CsvParser._handleNextEntry(CsvParser.java:587)
   at com.fasterxml.jackson.dataformat.csv.CsvParser.nextToken(CsvParser.java:474)
   at com.fasterxml.jackson.databind.deser.std.UntypedObjectDeserializer$Vanilla.mapObject(UntypedObjectDeserializer.java:592)
   at com.fasterxml.jackson.databind.deser.std.UntypedObjectDeserializer$Vanilla.deserialize(UntypedObjectDeserializer.java:440)
   at com.fasterxml.jackson.databind.MappingIterator.nextValue(MappingIterator.java:188)
   at CSVTest.main(CSVTest.java:24)

After that, on line 21 of CSVTest.java, you call .hasNextValue(). Inside this method, an ArrayIndexOutOfBoundsException is thrown. You catch it and print the stack trace as well. However, your catch block is outside your loop, so by the time you catch the exception, the loop has already exited.

java.lang.ArrayIndexOutOfBoundsException: 3
    at com.fasterxml.jackson.dataformat.csv.CsvSchema.column(CsvSchema.java:941)
    at com.fasterxml.jackson.dataformat.csv.CsvParser._handleNamedValue(CsvParser.java:614)
    at com.fasterxml.jackson.dataformat.csv.CsvParser.nextToken(CsvParser.java:476)
    at com.fasterxml.jackson.databind.MappingIterator.hasNextValue(MappingIterator.java:158)
    at CSVTest.main(CSVTest.java:21)

If you really want to continue your loop here, then you will need to move that try-catch construct inside the loop. Perhaps like this:

while (true)
    {
    try
        {
        if (!it.hasNextValue())
            { break; }
        }
    catch (final ArrayIndexOutOfBoundsException err)
        {
        err.printStackTrace();
        continue;
        }

    Object row;
    try
        { row = it.nextValue(); }
    catch (final IOException err)
        {
        err.printStackTrace();
        continue;
        }
    }

However, this code is an infinite loop. When hasNextValue() throws an ArrayIndexOutOfBoundsException, the state has not changed, so the loop will never end. I include it to illustrate the principle of moving the catch block inside the loop, not as a workable solution.
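One way to keep the per-row catch inside the loop without risking that infinite loop is to cap the number of consecutive failures. This is a sketch written against plain java.util.Iterator; SkippingReader and readSkippingErrors are hypothetical names. MappingIterator implements Iterator, and as far as I know its hasNext()/next() rethrow checked IOExceptions wrapped in unchecked exceptions, so catching RuntimeException should cover both the JsonParseException and the ArrayIndexOutOfBoundsException seen above. Verify that against your Jackson version.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical helper (not part of Jackson): collects elements from any Iterator,
// skipping elements whose hasNext()/next() throw an unchecked exception, and
// stopping after maxConsecutiveFailures failures in a row so that an iterator
// whose state never advances cannot spin forever.
final class SkippingReader {

    private SkippingReader() {
    }

    static <T> List<T> readSkippingErrors(Iterator<T> it, int maxConsecutiveFailures) {
        List<T> values = new ArrayList<>();
        int failures = 0;
        while (true) {
            try {
                if (!it.hasNext()) {
                    break; // normal end of input
                }
                values.add(it.next());
                failures = 0; // progress was made, so reset the counter
            } catch (RuntimeException e) {
                // ASSUMPTION: the iterator reports bad rows as unchecked exceptions
                failures++;
                System.err.println("Skipping bad row: " + e);
                if (failures >= maxConsecutiveFailures) {
                    System.err.println("Giving up after " + failures + " consecutive failures");
                    break;
                }
            }
        }
        return values;
    }
}
```

With the question's iterator this would be called as something like `List<Object> rows = SkippingReader.readSkippingErrors(it, 10);`. On versions before the fix StaxMan mentions for 2.6.3, expect it to stop at the cap rather than recover the remaining rows.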

You added a comment to the question referencing a discussion of error handling in jackson-dataformat-csv. It appears that you have run into a limitation (or bug) in the library when it comes to skipping malformed rows.

dsh