
I have an issue parsing a CSV file that uses ';' as the separator and '"' as the quote character.

The problem is that I have to deal with quotes and semicolons inside quoted strings.

I cannot change the source file / separators etc.

I am using the com.opencsv library, version 5.6, to parse the CSV.

CSV file:

```
"Col1";"Col2";"Co"l3""";"Col; Col"
```

Expected parsing result:

  1. Position 1: Col1
  2. Position 2: Col2
  3. Position 3: Co"l3"
  4. Position 4: Col; Col

This is the parser:

```java
import com.opencsv.CSVParser;
import com.opencsv.CSVParserBuilder;

CSVParser parser = new CSVParserBuilder()
        .withSeparator(';')
        .withQuoteChar('"')
        .withIgnoreQuotations(true)
        .build();
```

This is the result:

  1. (ok) Position 1: Col1
  2. (ok) Position 2: Col2
  3. (ok) Position 3: Co"l3"
  4. (error) Position 4: Col
  5. (error) Position 5: Col

The parser handles the quotes inside the quoted string correctly, but wrongly splits the last element at the `;` inside its quotes.
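The split at position 4 is what you get whenever every `;` is treated as a field boundary, even inside quotes. As a rough illustration only (plain `String.split`, not OpenCSV's actual implementation; `NaiveSplitDemo` and `naiveSplit` are hypothetical names), splitting the sample line on every `;` yields five pieces, with `"Col; Col"` broken in two:

```java
import java.util.Arrays;

public class NaiveSplitDemo {
    // The sample line from the question (Java escaping for the embedded quotes)
    static final String SAMPLE = "\"Col1\";\"Col2\";\"Co\"l3\"\"\";\"Col; Col\"";

    // Quote-unaware split: every ';' is a field boundary,
    // including the one inside "Col; Col"
    static String[] naiveSplit(String line) {
        return line.split(";");
    }

    public static void main(String[] args) {
        String[] pieces = naiveSplit(SAMPLE);
        System.out.println(pieces.length); // prints 5
        System.out.println(Arrays.toString(pieces));
    }
}
```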

P.S.: If I don't use `withIgnoreQuotations(true)`, an exception is thrown.
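One way to see why a quote-aware parser gives up on this line: in RFC 4180-style CSV, every `"` either opens or closes a quoted section, and an escaped `""` inside quotes counts as two toggles, so a well-formed line contains an even number of quote characters. The sample line contains 11, so a simple toggle-based parser is still "inside quotes" when the line ends, which is consistent with the `Unterminated quoted field` exception mentioned in the comments. This is only a parity check under that simplified model, not OpenCSV's algorithm; `QuoteParityCheck` is a hypothetical name.

```java
public class QuoteParityCheck {
    // The sample line from the question (Java escaping for the embedded quotes)
    static final String SAMPLE = "\"Col1\";\"Col2\";\"Co\"l3\"\"\";\"Col; Col\"";

    // Counts every '"' on the line; in a simple toggle model each one flips
    // the "inside quotes" state (a doubled "" inside quotes flips it twice)
    static long countQuotes(String line) {
        return line.chars().filter(c -> c == '"').count();
    }

    // True if a toggle-based parser would reach the end of the line inside quotes
    static boolean endsInsideQuotes(String line) {
        return countQuotes(line) % 2 != 0;
    }

    public static void main(String[] args) {
        System.out.println(countQuotes(SAMPLE));      // prints 11
        System.out.println(endsInsideQuotes(SAMPLE)); // prints true
    }
}
```

For comparison, the two-quote variant `"Col1";"Col2";"Co"l3"";"Col; Col"` has 10 quotes (even) and the single-quote variant has 9 (odd). An even count is necessary but not sufficient for well-formed CSV.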

Could you please help me solve this issue?

Thanks

AndrewP
  • @XtremeBaumer, what I have in Position 3 is the actual result using that code. So the input is `Co"l3""` and the output is `Co"l3"`. The parser unescapes the doubled double quote. – AndrewP May 10 '22 at 15:05
  • I have checked with the code you posted. Once I removed `.withIgnoreQuotations(true)`, the result matches what you expect – XtremeBaumer May 10 '22 at 15:06
  • @XtremeBaumer, thanks for your reply. If I remove `.withIgnoreQuotations(true)`, I get the following exception: `CsvMalformedLineException: Unterminated quoted field at end of CSV line. Beginning of lost text: [ Col` – AndrewP May 10 '22 at 15:23
  • @AndrewP - Can you edit your question to show the CSV data in between backticks - and also without any Markdown formatting such as bold font? This is just to ensure we are seeing the source data accurately. – andrewJames May 10 '22 at 18:56
  • Hi @andrewJames, question edited, thanks. – AndrewP May 10 '22 at 19:47
  • Thank you. In that case, I think you have 2 incompatible flavors of CSV data. The final field requires you to remove `.withIgnoreQuotations(true)` from your parser; but the other fields require it. If you were to change the data to `"Col1";"Col2";"Co"l3"";"Col; Col"` (i.e. only 2 double-quotes instead of 3 in a row for col 3), then the parser which does _not_ ignore quotations would work. – andrewJames May 10 '22 at 20:01
  • Thanks @andrewJames, the problem also occurs with only one double-quote (i.e. `"Col1";"Col2";"Co"l3";"Col; Col"`). It's strange that a CSV library is unable to handle quoted strings containing quotes and semicolons :) – AndrewP May 10 '22 at 20:19
  • The library _can_ manage what you mention. Just not that specific combination - and rightfully so, as it's effectively invalid CSV. If you need `...;"Col; Col";...` to be parsed as one field containing `Col; Col`, then you need to use `.withSeparator(';').withQuoteChar('"')`. But if you need those, then you cannot also use `.withIgnoreQuotations(true)` and also expect to handle those three double-quotes-in-a-row. – andrewJames May 10 '22 at 21:03
  • You would have to ask the creators of the file to adjust how it is created - but I understand that is not always practical/realistic. – andrewJames May 10 '22 at 21:03
  • @andrewJames, the only way I see right now is to remove the ignore-quotations option: `CSVParser parser = new CSVParserBuilder().withSeparator(';').withQuoteChar('"').build();` and escape the double-quotes in the source file. I think the CSV library is unable to handle, inside a quoted string, a semicolon (here also the separator) and an odd number of quotes at the same time. – AndrewP May 11 '22 at 07:48
  • This question covers a very similar scenario to yours: [Escaping quotes and delimiters in CSV files with Excel](https://stackoverflow.com/q/43273976/12567365). Yes, you could escape the double-quotes inside the surrounding quotes. Or (my preference) you could fix the CSV data by doubling-up the double-quotes inside the surrounding quotes. Ask whoever created your CSV file how _they_ would parse it back to the original fields... – andrewJames May 11 '22 at 12:46
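To make the options from the comments concrete, here is a stdlib-only sketch (no OpenCSV involved; `SemicolonCsvSketch`, `parseLenient` and `encodeRfc4180` are hypothetical names). `encodeRfc4180` writes the expected fields by doubling every embedded quote, giving `"Col1";"Col2";"Co""l3""";"Col; Col"`; data in that form should be readable with the separator/quote-char parser without `withIgnoreQuotations(true)`. `parseLenient` is a heuristic for the file as it is, assuming a quoted field only ends at a `"` immediately followed by `;` or the end of the line. That assumption is mine, not something the file format guarantees, and it will mis-parse any field that itself contains `";`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

public class SemicolonCsvSketch {
    static final char SEP = ';';
    static final char QUOTE = '"';

    // Heuristic reader (assumption): a quoted field closes only at a quote that is
    // followed by the separator or by the end of the line; "" inside is unescaped to "
    static List<String> parseLenient(String line) {
        List<String> fields = new ArrayList<>();
        int n = line.length();
        int i = 0;
        while (i <= n) {
            if (i < n && line.charAt(i) == QUOTE) {
                int j = i + 1;
                while (j < n && !(line.charAt(j) == QUOTE && (j + 1 == n || line.charAt(j + 1) == SEP))) {
                    j++;
                }
                if (j >= n) {
                    throw new IllegalArgumentException("No closing quote for field at index " + i);
                }
                fields.add(line.substring(i + 1, j).replace("\"\"", "\""));
                i = j + 2; // skip the closing quote and the separator
            } else {
                // Unquoted field: runs up to the next separator or end of line
                int j = line.indexOf(SEP, i);
                if (j < 0) {
                    j = n;
                }
                fields.add(line.substring(i, j));
                i = j + 1;
            }
        }
        return fields;
    }

    // RFC 4180-style writer: quote every field and double each embedded quote
    static String encodeRfc4180(List<String> fields) {
        return fields.stream()
                .map(f -> "\"" + f.replace("\"", "\"\"") + "\"")
                .collect(Collectors.joining(String.valueOf(SEP)));
    }

    public static void main(String[] args) {
        String asIs = "\"Col1\";\"Col2\";\"Co\"l3\"\"\";\"Col; Col\"";
        List<String> fields = parseLenient(asIs);
        for (int k = 0; k < fields.size(); k++) {
            System.out.println("Position " + (k + 1) + ": " + fields.get(k));
        }
        System.out.println(encodeRfc4180(fields));
    }
}
```

Running `main` prints the four expected positions from the question, followed by the re-encoded line `"Col1";"Col2";"Co""l3""";"Col; Col"`. Repairing the file once with this kind of heuristic and then using a standard parser keeps the fragile assumption in one place.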

0 Answers