I am trying to find a regular expression that does not match a delimiter when it is wrapped in double quotes, but that can also handle values containing a single, unmatched double quote. I have the first part working with the expression below, where DELIMITER
could be just about anything but is mainly a comma, a pipe, or a double pipe:
DELIMITER(?=(?:[^\"]*\"[^\"]*\")*[^\"]*$)
This handles a properly formed CSV row like apple, "banana, and orange", grape. I can split on the delimiter and get the values:
['apple', 'banana, and orange', 'grape']
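To make this behavior concrete outside Qt, here is a minimal Python sketch of the same split. It assumes Python's re module behaves like QRegExp for this particular lookahead. split_outside_quotes and unquote are hypothetical helper names, and unquote only reproduces the quote stripping shown in the output above. re.escape is used so that a pipe or double pipe is treated as a literal delimiter rather than as regex alternation.

```python
import re
from typing import List


def unquote(field: str) -> str:
    """Trim whitespace and remove one pair of surrounding double quotes, if present."""
    field = field.strip()
    if len(field) >= 2 and field[0] == '"' and field[-1] == '"':
        return field[1:-1]
    return field


def split_outside_quotes(line: str, delimiter: str) -> List[str]:
    # re.escape makes "|" and "||" literal instead of regex alternation.
    # The lookahead accepts a delimiter only if an even number of double quotes
    # follows it up to the end of the line, i.e. it is not inside a quoted section.
    pattern = re.escape(delimiter) + r'(?=(?:[^"]*"[^"]*")*[^"]*$)'
    return [unquote(f) for f in re.split(pattern, line)]


print(split_outside_quotes('apple, "banana, and orange", grape', ","))
# ['apple', 'banana, and orange', 'grape']
print(split_outside_quotes('apple||"banana||orange"||grape', "||"))
# ['apple', 'banana||orange', 'grape']
```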
My problem is that I may encounter a line like apple, "banana, and orange, grape. In this case I want to get the values:
['apple', '"banana', 'and orange', 'grape']
However, I get:
['apple, "banana', 'and orange', 'grape']
It ignores every comma that comes before the unmatched double quote.
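The reason is that the lookahead (?:[^"]*"[^"]*")*[^"]*$ succeeds exactly when an even number of double quotes remains between the delimiter and the end of the line. A single stray quote makes that count odd for every comma to its left. The Python sketch below, with a hypothetical delimiter_decisions helper (not part of Qt or Python), reports that count for each comma so the decision is visible:

```python
def delimiter_decisions(line, delimiter=","):
    """Return (position, quotes_after, accepted) for each delimiter occurrence.

    'accepted' mirrors the lookahead (?=(?:[^"]*"[^"]*")*[^"]*$), which
    succeeds exactly when an even number of double quotes follows the delimiter.
    """
    decisions = []
    start = 0
    while True:
        pos = line.find(delimiter, start)
        if pos == -1:
            return decisions
        quotes_after = line.count('"', pos + len(delimiter))
        accepted = quotes_after % 2 == 0
        decisions.append((pos, quotes_after, accepted))
        # Like a regex scan: continue after an accepted match,
        # or one character later after a rejected position.
        start = pos + len(delimiter) if accepted else pos + 1


for pos, quotes, accepted in delimiter_decisions('apple, "banana, and orange, grape'):
    print(pos, quotes, accepted)
# 5 1 False   <- the comma after "apple" sees one quote ahead, so it is not a split point
# 14 0 True
# 26 0 True
```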
The logic I have in my head is that I want to ignore a comma only if there is a double quote before it and a matching double quote after it. My first thought was to use a look-behind, but I can't get that to work because look-behinds can't handle quantifiers (correct me if this is wrong).
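One way to avoid look-behinds entirely is to match fields instead of delimiters. The Python sketch below uses a hypothetical split_fields helper, with Python's re standing in for QRegExp. It assumes a double quote only opens a quoted field when it is the first non-space character of a field and is closed by another double quote that is followed (allowing spaces) by the delimiter or the end of the line. Otherwise the quote is kept as literal text. The pattern uses only lookahead and alternation, not lazy quantifiers, but it has not been checked against QRegExp itself.

```python
import re


def split_fields(line, delimiter=","):
    d = re.escape(delimiter)
    # Branch 1: a complete quoted field ("...") followed by the delimiter or end of line.
    # Branch 2: everything up to the next delimiter. A lone opening quote has no
    # closing partner, so branch 1 fails and the quote is kept as ordinary text here.
    field_re = re.compile(
        r'\s*"([^"]*)"\s*(?=' + d + r'|$)'
        + r'|((?:(?!' + d + r').)*)'
    )
    fields = []
    pos = 0
    while True:
        m = field_re.match(line, pos)  # always matches; branch 2 may match empty
        if m.group(1) is not None:
            fields.append(m.group(1))          # quoted field, quotes removed
        else:
            fields.append(m.group(2).strip())  # unquoted field, whitespace trimmed
        pos = m.end()
        if pos >= len(line):
            return fields
        pos += len(delimiter)  # the match ends right before a delimiter; skip it


print(split_fields('apple, "banana, and orange, grape'))
# ['apple', '"banana', 'and orange', 'grape']
print(split_fields('apple, "banana, and orange", grape'))
# ['apple', 'banana, and orange', 'grape']
```

Because each quote is judged locally at the start of its own field, rather than by counting quotes to the end of the line, a stray quote no longer changes how earlier delimiters are treated.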
I am using Qt's QRegExp, which I understand is broadly similar to the Perl regex engine. Please let me know if I can provide more information. I know regular expressions can behave differently from one engine to another, and I hope I have explained what I'm looking for well enough!