I am having trouble writing a regex in C# that captures everything between two double quotes. If the quoted text contains escaped double quotes, they should be captured as well. After reading the regex wiki, I still haven't been able to write one that does the job completely.

The matches are separated by commas.

The following string:

 "first \"value\\\\", "second, value", "third value"

needs to give the following matches:

  • first \"value\\\\
  • second, value
  • third value

Thanks for your help!

icykof
  • This looks like CSV data; there are a million libraries and packages that handle this. Is there a reason you cannot use one of those libraries? – maccettura May 10 '18 at 14:40
  • It seems that you are working with the CSV (Comma Separated Values) format; if that's the case, have a look at `Microsoft.VisualBasic.FileIO.TextFieldParser` – Dmitry Bychenko May 10 '18 at 14:41
  • Possible duplicate of https://stackoverflow.com/questions/13024073/regex-c-sharp-extract-text-within-double-quotes – Arpit Gupta May 10 '18 at 14:48
  • See [this demo](https://ideone.com/bw9Vo2) - is that what you need? – Wiktor Stribiżew May 10 '18 at 15:16
  • The original string is a bit more complicated than that, but I have reduced it to this example. Also, I am targeting multiple frameworks and thought it might be easier to go through regex. – icykof May 10 '18 at 15:17
  • @WiktorStribiżew Yes it seems to be working just like Arpit's answer. I will analyze what they are doing to get a bit better with regex. Many thanks! – icykof May 10 '18 at 15:27
  • Is this an exercise in understanding regular expressions, or do you have a practical problem to solve? If the latter, just write a lexer. It's not hard. – Eric Lippert May 10 '18 at 15:32
  • Actually, Arpit's solution is not working if the first `"` is an escaped quote. – Wiktor Stribiżew May 10 '18 at 16:14
  • @WiktorStribiżew the first " can never be escaped in my scenario so it should be alright. thanks! – icykof May 10 '18 at 17:31
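
The lexer suggested in the comments could look roughly like the sketch below. It is only an illustration under stated assumptions: the class and method names (`QuotedLexer`, `Scan`) are hypothetical, a backslash is assumed to escape exactly the next character, escapes are kept verbatim in the output (as in the expected matches in the question), and an unterminated quoted value is silently dropped.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical helper, not part of any library.
public static class QuotedLexer
{
    // Returns the text between each pair of unescaped double quotes.
    public static List<string> Scan(string input)
    {
        var values = new List<string>();
        int i = 0;
        while (i < input.Length)
        {
            // Skip everything outside quotes (commas, spaces, ...).
            if (input[i] != '"') { i++; continue; }

            i++; // consume the opening quote
            var sb = new StringBuilder();
            bool closed = false;
            while (i < input.Length)
            {
                char c = input[i];
                if (c == '\\' && i + 1 < input.Length)
                {
                    // Keep the backslash and the escaped character as-is.
                    sb.Append(c).Append(input[i + 1]);
                    i += 2;
                }
                else if (c == '"')
                {
                    i++; // consume the closing quote
                    closed = true;
                    break;
                }
                else
                {
                    sb.Append(c);
                    i++;
                }
            }
            // Assumption: values without a closing quote are ignored.
            if (closed) values.Add(sb.ToString());
        }
        return values;
    }
}
```

Calling `QuotedLexer.Scan` on the sample string from the question should yield the three values listed there, with the escape sequences left untouched.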

1 Answer

This regex should serve your purpose:

str = Regex.Replace(str, @"(""[^""\\]*(?:\\.[^""\\]*)*"")|", "$1");
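
To get the values themselves as separate matches, as the question asks, the same pattern can be used with `Regex.Matches`, moving the capture group inside the quotes. The following is a minimal sketch, not a definitive implementation: the class and method names (`QuotedFields`, `Extract`) are hypothetical, and it assumes the opening quote of each value is never escaped, as the asker states in the comments.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of any library.
public static class QuotedFields
{
    // "            opening quote
    // [^"\\]*      any run of characters that are neither a quote nor a backslash
    // (?:\\.[^"\\]*)*   a backslash plus the escaped character, then another plain run
    // "            closing quote
    // Group 1 holds the text between the quotes.
    private static readonly Regex Quoted =
        new Regex(@"""([^""\\]*(?:\\.[^""\\]*)*)""");

    public static List<string> Extract(string input)
    {
        var result = new List<string>();
        foreach (Match m in Quoted.Matches(input))
        {
            result.Add(m.Groups[1].Value); // content without the surrounding quotes
        }
        return result;
    }
}
```

For the sample string in the question, `QuotedFields.Extract` should return `first \"value\\\\`, `second, value` and `third value`. Note that `.` does not match a newline by default, so an escaped line break would need `RegexOptions.Singleline`.
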
Arpit Gupta
  • This does indeed solve my issue. Thanks for your help. I will analyze the regex to try to understand what it is doing. – icykof May 10 '18 at 15:28