I am trying to parse a CSV file using Lumenworks CsvReader. Each data point is wrapped in double quotes, however, some values contain unescaped double quotes within the data, and other values contain commas within the data. The issue I am facing is that when I parse this with CsvReader, extra columns are ending up in my file due to Lumenworks seeing these characters as delimiters.
As you will read below, I've handled the issue with unescaped double quotes using a known solution, but this then results in issue of extra columns being generated for the data with commas inside.
Example: 2 columns (each wrapped in quotes), with unescaped double quotes in one of the data points
"Name","Description"
"Bob","I am a "cool" guy"
When attempting to perform csvReader.ReadNextRecord(), instead of splitting this up into 2 columns, it splits it up into 4 columns:
- Bob
- I am a
- cool
- guy
I've used the solution provided in Reading csv having double quotes with lumenwork csv reader and it works quite well!
This is how I've implemented it:
Char quotingCharacter = '\0' ;
Char escapeCharacter = quotingCharacter;
Char delimiter = ',';
using (CsvReader csvReader = new CsvReader(reader, false, delimiter, quotingCharacter, escapeCharacter, quotingCharacter, ValueTrimmingOptions.All))
{....
csvReader.ReadNextRecord();
...}
HOWEVER, when I implement this fix for my CSV file, it then creates the same issue with columns that have commas inside:
Example: 2 columns (each wrapped in quotes), with commas in one of the data points, after implementing the double quote workaround
"Name","Description"
"Bob","I am related to Suzie, Betty, and Tommy"
With the aforementioned solution implemented, the csvReader now does not know to read the commas as part of the data. Instead of 2 columns, I am left with 4 columns:
- Bob
- I am related to Suzie
- Betty
- and Tommy
So the question is: how do I allow Lumenworks CsvReader to work around this bad data and have it interpret unescaped double quotes as the data itself? How can this be done in a way that doesn't then cause the commas within the data to be interpreted as the delimitation?