
I have a scenario where one of the rows in the data has the delimiting character in the content.

5 |0St"|"ring |field[1]

Should always pass - quoted field separator

Here the delimiting character is |, and it also appears inside one of the column values, as shown above.

My configuration is as below:

quoteChar = "
quoteEscapeChar = \\

But when I try to parse the row, the parser splits the value into two separate columns ("0St" and "ring"), so the row fails.

If I put quotes around the entire column value, as shown below, it works fine.

5 |"0St|ring" |field[1]

Should always pass - quoted field separator

Is there any setting to specify an escape character for the delimiter?

I'm using univocity 2.5.9

Any help is appreciated

Markus
Sunil Kumar B M

1 Answer


Author of the library here. I believe I already explained the problem in the ticket you opened, but let me try again:

Basically that's NOT how the CSV format works.

If you have a field delimiter in your value (i.e. you have a | between 0St and ring), your entire value MUST be quoted, i.e. you MUST have your value written as "0St|ring" instead of 0St"|"ring.
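To illustrate the writing side of that rule, here is a minimal Java sketch. It is not univocity's own writer, and the names `QuoteOnWrite`, `quoteIfNeeded` and `writeRow` are hypothetical. It wraps a value in quotes whenever the value contains the delimiter, the quote character or a line break, and prefixes any embedded quote with the quote escape character. With the question's settings (delimiter |, quote ", quote escape \) the value 0St|ring is written as "0St|ring". The sketch does not escape the escape character itself.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class QuoteOnWrite {

    // Returns the value unchanged unless it contains the delimiter, the quote
    // character or a line break; in that case the whole value is wrapped in
    // quotes and each embedded quote is prefixed with the quote escape character.
    static String quoteIfNeeded(String value, char delimiter, char quote, char quoteEscape) {
        boolean needsQuotes = value.indexOf(delimiter) >= 0
                || value.indexOf(quote) >= 0
                || value.indexOf('\n') >= 0
                || value.indexOf('\r') >= 0;
        if (!needsQuotes) {
            return value;
        }
        String q = String.valueOf(quote);
        String escaped = value.replace(q, quoteEscape + q);
        return q + escaped + q;
    }

    // Joins the values of one row, quoting only the values that need it.
    static String writeRow(List<String> values, char delimiter, char quote, char quoteEscape) {
        return values.stream()
                .map(v -> quoteIfNeeded(v, delimiter, quote, quoteEscape))
                .collect(Collectors.joining(String.valueOf(delimiter)));
    }

    public static void main(String[] args) {
        // Settings taken from the question: delimiter |, quote ", quote escape \
        System.out.println(writeRow(Arrays.asList("5", "0St|ring", "field[1]"), '|', '"', '\\'));
        // prints 5|"0St|ring"|field[1]
    }
}
```

Passing `'"'` as the quote escape gives the usual RFC 4180 style, where an embedded quote is doubled instead of prefixed with a backslash.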

Any CSV parser will read 0St"|"ring as 0St" and then try to process whatever comes after the | as another value. There is simply nothing else you can do other than writing the entire value within quotes.
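The reading side can be sketched the same way. The hypothetical `QuoteAwareSplit.split` below assumes the common rule that a quote is only special at the very start of a value. It keeps "0St|ring" as one value, but reads 0St"|"ring as 0St" followed by a new value. Because that new value starts with a quote that is never closed, this simplified sketch swallows the rest of the line into it; real parsers apply their own (often configurable) rules to such malformed input, so don't read this as univocity's exact output.

```java
import java.util.ArrayList;
import java.util.List;

class QuoteAwareSplit {

    // Splits one line into values. A quote only opens a quoted value when it is
    // the first character of that value; "" inside a quoted value is a literal
    // quote. A quote in the middle of an unquoted value is kept as plain text.
    static List<String> split(String line, char delimiter, char quote) {
        List<String> values = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        boolean atFieldStart = true;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (inQuotes) {
                if (c == quote) {
                    if (i + 1 < line.length() && line.charAt(i + 1) == quote) {
                        current.append(quote); // escaped quote written as ""
                        i++;
                    } else {
                        inQuotes = false; // closing quote
                    }
                } else {
                    current.append(c); // delimiters are literal inside quotes
                }
            } else if (c == delimiter) {
                values.add(current.toString()); // end of the current value
                current.setLength(0);
                atFieldStart = true;
                continue;
            } else if (c == quote && atFieldStart) {
                inQuotes = true; // opening quote of a quoted value
            } else {
                current.append(c);
            }
            atFieldStart = false;
        }
        values.add(current.toString());
        return values;
    }

    public static void main(String[] args) {
        System.out.println(split("5|\"0St|ring\"|field[1]", '|', '"'));
        // prints [5, 0St|ring, field[1]]
        System.out.println(split("5|0St\"|\"ring|field[1]", '|', '"'));
        // prints [5, 0St", ring|field[1]]
    }
}
```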

The ONLY way to get 0St"|"ring to be parsed into a single value (I assume you want to get 0St|ring as a result) is to write your own parsing code to process your data this way.
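If you really had to read such data, a custom split along these lines is one option. It is a minimal sketch under a hypothetical convention, not a univocity feature: the exact three-character sequence "|" stands for a literal | inside a value, any other | is a delimiter, quotes have no other meaning, and values are trimmed. The trade-off is that ordinary quoted values next to each other, such as "a"|"b", would be glued into a single value, which is one more reason to fix the data instead.

```java
import java.util.ArrayList;
import java.util.List;

class QuotedDelimiterSplit {

    // Hypothetical convention: the sequence "|" is an escaped delimiter that
    // belongs to the value; any other | separates values. Values are trimmed.
    static List<String> split(String line) {
        List<String> values = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        int i = 0;
        while (i < line.length()) {
            if (line.startsWith("\"|\"", i)) {
                current.append('|'); // escaped delimiter: keep a literal |
                i += 3;
            } else if (line.charAt(i) == '|') {
                values.add(current.toString().trim()); // real delimiter
                current.setLength(0);
                i++;
            } else {
                current.append(line.charAt(i));
                i++;
            }
        }
        values.add(current.toString().trim());
        return values;
    }

    public static void main(String[] args) {
        System.out.println(split("5 |0St\"|\"ring |field[1]"));
        // prints [5, 0St|ring, field[1]]
    }
}
```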

Hope this helps.

Jeronimo Backes
Thanks @Jeronimo. I had posted the Stack Overflow question and the GitHub issue at the same time. I now understand the use case, and we have invalidated the 0St"|"ring case. – Sunil Kumar B M Jun 21 '18 at 06:55