
I am trying to parse a CSV file using Jackson's CSV data format module.

I tried the sample code given on the project homepage (https://github.com/FasterXML/jackson-dataformat-csv):

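    import java.io.File;
    import java.util.Arrays;
    import com.fasterxml.jackson.databind.MappingIterator;
    import com.fasterxml.jackson.dataformat.csv.CsvMapper;
    import com.fasterxml.jackson.dataformat.csv.CsvParser;

    CsvMapper mapper = new CsvMapper();
    mapper.enable(CsvParser.Feature.WRAP_AS_ARRAY); // treat each CSV row as a JSON-style array
    File csvFile = new File("input.csv");
    MappingIterator<String[]> it = mapper.readerFor(String[].class).readValues(csvFile);
    while (it.hasNext()) {
        String[] row = it.next();
        System.out.println(Arrays.toString(row)); // print the cell values, not the array reference
    }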

This small program fails with the following error:

Exception in thread "main" java.io.CharConversionException: Invalid UTF-8 start byte 0x92 (at char #269, byte #-1)
at com.fasterxml.jackson.dataformat.csv.impl.UTF8Reader.reportInvalidInitial(UTF8Reader.java:393)
at com.fasterxml.jackson.dataformat.csv.impl.UTF8Reader.read(UTF8Reader.java:245)
at com.fasterxml.jackson.dataformat.csv.impl.CsvReader.loadMore(CsvReader.java:438)
at com.fasterxml.jackson.dataformat.csv.impl.CsvReader.hasMoreInput(CsvReader.java:475)
at com.fasterxml.jackson.dataformat.csv.CsvParser._handleStartDoc(CsvParser.java:461)
at com.fasterxml.jackson.dataformat.csv.CsvParser.nextToken(CsvParser.java:414)
at com.fasterxml.jackson.databind.ObjectReader._bindAndReadValues(ObjectReader.java:1492)
at com.fasterxml.jackson.databind.ObjectReader.readValues(ObjectReader.java:1335)
at com.til.etwealth.etmoney.util.alok.main(alok.java:18)  

I am able to read the same file using OpenCSV. I searched the internet for this error but could not find anything useful. Can someone tell me what I am missing?

Alok

2 Answers


Most likely the content you are reading is not UTF-8 encoded but uses some other encoding, such as Latin-1 (ISO-8859-1). The error message you get is admittedly not very helpful; it could be improved to suggest this likely cause, since this is a relatively common problem.
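To see why the parser stops at that byte, here is a minimal JDK-only sketch (the class and method names are made up for illustration) that runs a strict UTF-8 decoder over a few bytes. Byte 0x92 has the bit pattern 10xxxxxx, which UTF-8 reserves for continuation bytes, so it can never start a character; that is what "Invalid UTF-8 start byte 0x92" reports. The sample bytes assume a single-byte encoding such as windows-1252, where 0x92 stands on its own.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

final class EncodingCheck {
    /** Returns true if the bytes are valid UTF-8, using a decoder that reports errors instead of replacing them. */
    static boolean isValidUtf8(byte[] bytes) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // "it's" as a single-byte encoding might store it: 0x92 is one byte, not a UTF-8 sequence
        byte[] singleByte = {'i', 't', (byte) 0x92, 's'};
        System.out.println(isValidUtf8(singleByte)); // false
        System.out.println(isValidUtf8("it's".getBytes(StandardCharsets.UTF_8))); // true
    }
}
```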

To read content in a non-Unicode encoding, you need to construct the Reader yourself, because the encoding cannot be reliably auto-detected (although there may be Java libraries that use heuristics to guess it):

    MappingIterator<String[]> it = mapper.readerFor(String[].class)
            .readValues(new InputStreamReader(new FileInputStream(csvFile), StandardCharsets.ISO_8859_1));
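One way to build that Reader with a fallback is sketched below. It is a heuristic of the kind mentioned above, not a reliable detector: it assumes that a file which fails strict UTF-8 decoding is windows-1252 (the Windows variant of Latin-1). `CsvReaders`, `guessCharset` and `openReader` are hypothetical names and only the JDK is used; the returned `Reader` is what you would pass to Jackson's `ObjectReader.readValues(Reader)`.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

final class CsvReaders {
    // Assumption: anything that is not valid UTF-8 is windows-1252 (a guess, not a detection)
    static final Charset FALLBACK = Charset.forName("windows-1252");

    /** Returns UTF-8 if the bytes decode strictly as UTF-8, otherwise FALLBACK. */
    static Charset guessCharset(byte[] bytes) {
        try {
            // newDecoder() reports malformed input by default instead of replacing it
            StandardCharsets.UTF_8.newDecoder().decode(ByteBuffer.wrap(bytes));
            return StandardCharsets.UTF_8;
        } catch (CharacterCodingException e) {
            return FALLBACK;
        }
    }

    /** Reads the whole file and wraps it in a Reader using the guessed charset; the caller closes it. */
    static Reader openReader(Path path) throws IOException {
        byte[] bytes = Files.readAllBytes(path);
        return new InputStreamReader(new ByteArrayInputStream(bytes), guessCharset(bytes));
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("sample", ".csv");
        try {
            // Write a small CSV as windows-1252, so the apostrophe becomes byte 0x92
            Files.write(tmp, "name,note\nann,it\u2019s fine\n".getBytes(FALLBACK));
            System.out.println("guessed: " + guessCharset(Files.readAllBytes(tmp)));
            try (BufferedReader in = new BufferedReader(openReader(tmp))) {
                in.lines().forEach(System.out::println);
            }
        } finally {
            Files.delete(tmp);
        }
    }
}
```

Reading the whole file into memory keeps the sketch short and is fine for moderately sized files; a file that happens to be valid UTF-8 by accident (for example pure ASCII) is decoded as UTF-8, which gives the same text anyway.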

Alternatively, whatever tool produces the file could be configured to write it in UTF-8.
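If the producing tool cannot be changed, a one-off conversion to UTF-8 has a similar effect. This is a minimal sketch using Java 11's `Files.readString`/`Files.writeString`; it assumes you already know the source encoding (windows-1252 in the demo), and `Transcode.toUtf8` is a hypothetical helper name.

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

final class Transcode {
    /** Rewrites a text file from the given source charset to UTF-8 (requires Java 11+). */
    static void toUtf8(Path source, Path target, Charset sourceCharset) throws IOException {
        // readString throws MalformedInputException if the bytes do not fit sourceCharset
        String text = Files.readString(source, sourceCharset);
        Files.writeString(target, text, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        Charset cp1252 = Charset.forName("windows-1252");
        Path in = Files.createTempFile("legacy", ".csv");
        Path out = Files.createTempFile("utf8", ".csv");
        try {
            Files.write(in, "a,b\nit\u2019s,ok\n".getBytes(cp1252));
            toUtf8(in, out, cp1252);
            // The converted file now holds U+2019 as the three UTF-8 bytes E2 80 99
            System.out.println(Files.readAllBytes(out).length);
        } finally {
            Files.delete(in);
            Files.delete(out);
        }
    }
}
```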

There are other possible causes (such as file truncation), but a mismatched character encoding is a common one. The main oddity here is that particular byte, 0x92, which is not a printable character in (most?) ISO-8859-x encodings.
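The oddity is easy to check. In ISO-8859-1, byte 0x92 maps to the C1 control character U+0092, but in windows-1252 it maps to U+2019, the right single quotation mark (a typographic apostrophe). The JDK-only sketch below (hypothetical names) decodes the byte both ways. If the file came from a Windows tool, such an apostrophe would be easy to overlook because it looks like an ordinary one; that is a guess, not something the stack trace confirms.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class Byte92 {
    /** Decodes a single byte with the given charset and returns the resulting code point. */
    static int decodeByte(int b, Charset charset) {
        return new String(new byte[] {(byte) b}, charset).codePointAt(0);
    }

    public static void main(String[] args) {
        Charset cp1252 = Charset.forName("windows-1252");
        // ISO-8859-1 maps 0x92 to the C1 control U+0092, which is not printable
        System.out.printf("ISO-8859-1:   U+%04X%n", decodeByte(0x92, StandardCharsets.ISO_8859_1)); // U+0092
        // windows-1252 maps 0x92 to U+2019 RIGHT SINGLE QUOTATION MARK
        System.out.printf("windows-1252: U+%04X%n", decodeByte(0x92, cp1252)); // U+2019
    }
}
```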

StaxMan
  • I am sure there is no non-printable or special character in my file, and I am able to read it using `OpenCSV`. – Alok Apr 10 '15 at 10:42
  • If you have a sample file that triggers this, it would be good to file a bug report at (https://github.com/FasterXML/jackson-dataformat-csv/issues/). – StaxMan Apr 10 '15 at 19:58

A workaround that works in most cases is to add Apache Tika as a dependency and use its AutoDetectReader (see https://tika.apache.org/1.2/api/org/apache/tika/detect/AutoDetectReader.html) to detect the file's charset before parsing.

Try this:

    // Detect the file's charset, since CSV files are often not UTF-8
    Charset charset;
    try (AutoDetectReader detector = new AutoDetectReader(new FileInputStream(file))) {
        charset = detector.getCharset();
    }
    // Read the whole file as text using the detected charset (Apache Commons IO)
    String f = FileUtils.readFileToString(file, charset);
    CsvMapper mapper = new CsvMapper();
    CsvSchema schema = CsvSchema.emptySchema().withHeader();
    // Pass the String itself: f.getBytes() would re-encode it with the platform default charset
    MappingIterator<Map<String, String>> it = mapper.readerFor(Map.class).with(schema).readValues(f);

Here I used Apache Commons IO (`FileUtils`) to read the file into a String; the same can be done with the Java standard library alone.
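For reference, a sketch of that step using only the JDK (the class name is hypothetical). `new String(bytes, charset)` replaces malformed input with U+FFFD rather than throwing, whereas `Files.readString` (Java 11+) would throw a `MalformedInputException` instead.

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Path;

final class ReadWithCharset {
    /** Reads an entire file into a String using the given charset (works on Java 7+). */
    static String readFileToString(Path path, Charset charset) throws IOException {
        // Malformed or unmappable bytes become U+FFFD instead of causing an exception
        return new String(Files.readAllBytes(path), charset);
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("sample", ".csv");
        try {
            Files.write(tmp, new byte[] {'i', 't', (byte) 0x92, 's'}); // windows-1252 bytes for "it’s"
            System.out.println(readFileToString(tmp, Charset.forName("windows-1252")));
        } finally {
            Files.delete(tmp);
        }
    }
}
```

The resulting String can then be handed to Jackson's `ObjectReader.readValues(String)`.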

Nolf