
Sample data:

    Header1, full_name, header3, header4
    20, "bob, XXX", "test", 30
    20, "evan"s,YYY ", "test", 30
    20, "Tom, ZZZ", "test", 30

    CSVReader csvReader = new CSVReader(reader, ',', '"');
    

The second row isn't read as expected, because there is an unescaped double quote inside the full_name column value.

I want to ignore such cases. Any suggestions would be appreciated.

I am using the openCSV Java API for parsing.

Edit:

I am getting the data from a database. One of the database columns has a single double quote in its value. Because of that, the CSV data looks malformed.
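If the step that exports the database rows to CSV is under your control, one option is to fix the data at the source: RFC 4180 says a double quote inside a quoted field is escaped by doubling it. Below is a minimal sketch of that rule in plain Java. The class and method names (`CsvEscape`, `escapeField`, `toCsvLine`) are made up for illustration and are not part of openCSV or any other library.

```java
import java.util.StringJoiner;

class CsvEscape {

    // Hypothetical helper: wraps a value in quotes and doubles any embedded
    // double quote, as RFC 4180 describes. A null value becomes an empty field.
    static String escapeField(String value) {
        if (value == null) {
            return "";
        }
        return "\"" + value.replace("\"", "\"\"") + "\"";
    }

    // Hypothetical helper: joins already-escaped fields with commas.
    static String toCsvLine(String... fields) {
        StringJoiner joiner = new StringJoiner(",");
        for (String field : fields) {
            joiner.add(escapeField(field));
        }
        return joiner.toString();
    }

    public static void main(String[] args) {
        // The problematic value from the second sample row.
        System.out.println(toCsvLine("20", "evan\"s,YYY ", "test", "30"));
    }
}
```

Written this way, the second row becomes `"20","evan""s,YYY ","test","30"`, which a standards-compliant CSV reader should parse back into four fields.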

arun
  • Possible duplicate of [CSV parser in JAVA, double quotes in string (SuperCSV, OpenCSV)](http://stackoverflow.com/questions/23000676/csv-parser-in-java-double-quotes-in-string-supercsv-opencsv) – Etienne Sep 15 '16 at 18:16
  • The CSV is malformed. See https://tools.ietf.org/html/rfc4180, Rule 7. If double-quotes are used to enclose fields, then a double-quote appearing inside a field must be escaped by preceding it with another double quote. – Guenther Sep 15 '16 at 18:54
  • It is not malformed; the original data has a double quote inside it. @Guenther – arun Sep 15 '16 at 22:12
  • Maybe a Java parser with regex can help you – stuck Sep 15 '16 at 22:25
  • I would need to rewrite the logic to parse the CSV file. I intend to use an existing CSV reader framework for now. If nothing works out I might write custom parsing for it. Thanks for the suggestion @pilkington – arun Sep 16 '16 at 15:48
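For reference, the custom parsing mentioned in the last comment could look roughly like the sketch below. It is a hand-rolled heuristic in plain Java, not the algorithm of openCSV or any other library: inside a quoted field, a double quote is treated as the closing quote only when the next non-space character is a comma or the end of the line; any other quote is kept as data. The class name `LenientCsvLine` is hypothetical, and the sketch assumes one record per line with a comma delimiter (no embedded newlines).

```java
import java.util.ArrayList;
import java.util.List;

class LenientCsvLine {

    // Splits one line on commas. Inside a quoted field, a quote only closes the
    // field when the next non-space character is a comma or the end of the line.
    static List<String> parse(String line) {
        List<String> fields = new ArrayList<>();
        int n = line.length();
        int i = 0;
        while (i <= n) {
            while (i < n && line.charAt(i) == ' ') {
                i++; // skip spaces before the field
            }
            StringBuilder field = new StringBuilder();
            if (i < n && line.charAt(i) == '"') {
                i++; // consume the opening quote
                while (i < n) {
                    char c = line.charAt(i);
                    if (c == '"') {
                        if (i + 1 < n && line.charAt(i + 1) == '"') {
                            field.append('"'); // properly escaped quote ("")
                            i += 2;
                            continue;
                        }
                        int j = i + 1;
                        while (j < n && line.charAt(j) == ' ') {
                            j++;
                        }
                        if (j == n || line.charAt(j) == ',') {
                            i = j; // closing quote: jump to the delimiter or end
                            break;
                        }
                        // otherwise fall through and keep the stray quote as data
                    }
                    field.append(c);
                    i++;
                }
                fields.add(field.toString());
            } else {
                while (i < n && line.charAt(i) != ',') {
                    field.append(line.charAt(i));
                    i++;
                }
                fields.add(field.toString().trim());
            }
            i++; // step over the delimiter, or past the end to stop the loop
        }
        return fields;
    }

    public static void main(String[] args) {
        // Second sample row from the question.
        for (String value : parse("20, \"evan\"s,YYY \", \"test\", 30")) {
            System.out.print("[" + value + "]");
        }
        System.out.println();
    }
}
```

For the second sample row this keeps the name together as a single field, `evan"s,YYY ` (including its trailing space). The heuristic still guesses wrong when a stray quote happens to be followed directly by a comma, so it is a workaround rather than a substitute for properly escaped data.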

1 Answer


univocity-parsers can handle unescaped quotes and is also 4x faster than opencsv. Try this code:

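import com.univocity.parsers.csv.CsvParser;
import com.univocity.parsers.csv.CsvParserSettings;

import java.io.StringReader;
import java.util.List;

public class Example {
    public static void main(String... args) {
        // the three sample rows from the question, including the unescaped quote
        String input = "" +
                "20, \"bob, XXX\", \"test\", 30\n" +
                "20, \"evan\"s,YYY \", \"test\", 30\n" +
                "20, \"Tom, ZZZ\", \"test\", 30 ";

        // default settings
        CsvParserSettings settings = new CsvParserSettings();

        CsvParser parser = new CsvParser(settings);
        List<String[]> rows = parser.parseAll(new StringReader(input));

        // printing values enclosed in [ ] to make sure you are getting the expected result
        for (String[] row : rows) {
            for (String value : row) {
                System.out.print("[" + value + "],");
            }
            System.out.println();
        }
    }
}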

This will produce:

[20],[bob, XXX],[test],[30],
[20],["evan"s],[YYY "],[test],[30],
[20],[Tom, ZZZ],[test],[30],

Additionally, you can control how to handle unescaped quotes with one of:

settings.setUnescapedQuoteHandling(UnescapedQuoteHandling.STOP_AT_DELIMITER);
settings.setUnescapedQuoteHandling(UnescapedQuoteHandling.STOP_AT_CLOSING_QUOTE);
settings.setUnescapedQuoteHandling(UnescapedQuoteHandling.RAISE_ERROR);
settings.setUnescapedQuoteHandling(UnescapedQuoteHandling.SKIP_VALUE);

When reading large files, you can use a RowProcessor or iterate over each row like this:

parser.beginParsing(new File("/path/to/your.csv"));

String[] row;
while ((row = parser.parseNext()) != null) {
    // process row
}

Disclaimer: I'm the author of this library. It's open source and free (Apache 2.0 license).

Jeronimo Backes
  • Your solution works well for small data. I am dealing with huge files: thousands of rows and hundreds of columns. Doing this might add more time. Thanks for the suggestion. – arun Sep 16 '16 at 15:46
  • There are many ways to read the data. I just posted an example. You can read files with trillions of rows and hundreds of gigabytes with it. Read the tutorial to learn more. – Jeronimo Backes Sep 16 '16 at 15:48
  • I've updated my answer to show how you can use the library to process large files. A 100 MB file with 3 million rows takes about 700 ms to be fully parsed on my MacBook Pro. Hope this helps – Jeronimo Backes Sep 17 '16 at 04:49