I have the following regex:

@"{thing:(?:((\w)\2*)([^}]*?))+}"

I'm using it to find matches within a string:

MatchCollection matches = regex.Matches(input);
IEnumerable<string> formatTokens = matches[0].Groups[3].Captures
    .OfType<Capture>()
    .Where(i => i.Length > 0)
    .Select(i => i.Value)
    .Concat(matches[0].Groups[1].Captures.OfType<Capture>().Select(i => i.Value));

This used to yield the results I wanted; however, my goal has since changed. This is the desired behavior now:

Suppose the string entered is 'stuff/{thing:aa/bb/cccc}{thing:cccc}'

I want formatTokens to be:

formatTokens[0] == "aa/bb/cccc"
formatTokens[1] == "cccc"

Right now, this is what I get:

formatTokens[0] == "/"
formatTokens[1] == "/"
formatTokens[2] == "cccc"
formatTokens[3] == "bb"
formatTokens[4] == "aa"

Note especially that "cccc" does not appear twice even though it was entered twice.

I think the problems are 1) the recapture in the regex and 2) the concat configuration (which is from when I wanted everything separated), but so far I haven't been able to find a combination that yields what I want. Can someone shed some light on the proper regex/concat combination to yield the desired results above?

  • It sounds as if you just want `Regex.Matches(s, @"{thing:([^}]*)}").Cast<Match>().Select(x => x.Groups[1].Value).ToList()` ([regex demo](http://regexstorm.net/tester?p=%7bthing%3a%28%5b%5e%7d%5d*%29%7d&i=stuff%2f%7bthing%3aaa%2fbb%2fcccc%7d%7bthing%3acccc%7d)) – Wiktor Stribiżew Jun 19 '18 at 23:42
  • @WiktorStribiżew This worked, thanks so much. Do you want to post your comment as an Answer so I can accept it as the answer? Thanks again. – Courtney Thurston Jun 20 '18 at 02:30
  • How about `(?<={thing:).*?(?=})`? Isn't this a better answer? Should I post it so you can accept it? Or were you being a little wild with a regex specific to repeating letters? –  Jun 20 '18 at 12:31

2 Answers

You may use

Regex.Matches(s, @"{thing:([^}]*)}")
    .Cast<Match>()
    .Select(x => x.Groups[1].Value)
    .ToList()

See the regex demo

Details

  • {thing: - a literal {thing: substring
  • ([^}]*) - Capturing group #1 (when a match is obtained, its value can be accessed via match.Groups[1].Value): zero or more characters other than }
  • } - a } char.

This way, you find multiple matches and only collect Group 1 values in the resulting list/array.
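
As a quick check against the sample string from the question, here is a minimal, self-contained sketch of the same approach. It assumes a C# 9+ console project so that top-level statements are allowed; the variable names `s` and `formatTokens` are only illustrative.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Sample input taken from the question.
string s = "stuff/{thing:aa/bb/cccc}{thing:cccc}";

// One match per {thing:...} block; Group 1 holds the text between ':' and '}'.
List<string> formatTokens = Regex.Matches(s, @"{thing:([^}]*)}")
    .Cast<Match>()
    .Select(m => m.Groups[1].Value)
    .ToList();

// Prints "aa/bb/cccc" and then "cccc", one per line.
foreach (string token in formatTokens)
    Console.WriteLine(token);
```

Because every `{thing:...}` block is its own match, the second `cccc` is no longer lost the way it was when only `matches[0]` was inspected.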

Wiktor Stribiżew

Mod update

I'm not sure why you settled for Stringnuts regex because it matches
anything inside braces {}.

The meek on SO will not get the satisfaction of deep knowledge,
so that may be your real problem.

Let's analyze your regex.

 {thing:
 (?:
      (                             # (1 start)
           ( \w )                        # (2)
           \2* 
      )                             # (1 end)
      ( [^}]*? )                    # (3)
 )+
 }

This reduces to this

 {thing:
 (?: \w [^}]*? )+
 }

The only constraint is that a word character must come right after {thing:.
After that there can be anything else, because the clause [^}]*? accepts
any character other than }.
Also, because that clause is lazy, the surrounding cluster (?: )+ starts a new iteration at every run of identical word characters, which is why the Capture Collections end up holding the match in pieces.

So, as far as matching goes, it does almost nothing except enforce the single word character requirement.

Your regex can be used as is to get convoluted matches,
and because you've captured all the parts in Capture Collections,
you can piece each match back together using the code below.

I would try to understand regex a little better before you go on to other stuff, since it is likely much more important than the
language tricks used to extract data.

Here is how you would piece it all together using your unaltered regex.

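using System;
using System.Text.RegularExpressions;

Regex regex = new Regex(@"{thing:(?:((\w)\2*)([^}]*?))+}");
string str = "stuff/{thing:aa/bb/cccc}{thing:cccc}";
foreach (Match match in regex.Matches(str))
{
    // Groups 1 and 3 each record one capture per iteration of the (?: )+ cluster.
    CaptureCollection cc1 = match.Groups[1].Captures;
    CaptureCollection cc3 = match.Groups[3].Captures;
    string token = "";
    // Interleave the captures in order to rebuild the text between ':' and '}'.
    for (int i = 0; i < cc1.Count; i++)
        token += cc1[i].Value + cc3[i].Value;
    Console.WriteLine("{0}", token);
}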

Output

aa/bb/cccc
cccc
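
To see why concatenating the two Capture Collections rebuilds each token, it can help to print the individual captures. The following is only an illustrative sketch (again assuming C# 9+ top-level statements; the `lines` list and the message format are made up for this example):

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

var regex = new Regex(@"{thing:(?:((\w)\2*)([^}]*?))+}");
string str = "stuff/{thing:aa/bb/cccc}{thing:cccc}";

// Collect one line per capture pair so the pieces of each match are visible.
var lines = new List<string>();
foreach (Match match in regex.Matches(str))
{
    CaptureCollection cc1 = match.Groups[1].Captures; // runs of one repeated word character
    CaptureCollection cc3 = match.Groups[3].Captures; // whatever follows each run (may be empty)
    for (int i = 0; i < cc1.Count; i++)
        lines.Add($"iteration {i}: group 1 = \"{cc1[i].Value}\", group 3 = \"{cc3[i].Value}\"");
}

foreach (string line in lines)
    Console.WriteLine(line);
```

For the first match this lists the pairs ("aa", "/"), ("bb", "/") and ("cccc", ""), and for the second match the single pair ("cccc", ""), so joining each pair in order gives back the original text.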

Note that, for example, your regex will match almost anything inside
of the braces as long as the first character is a word character.

For example, it matches {thing:Z,,,*()(((asgassgasg,asgfasgafg\/\=99.239 }

You may want to think about the requirements of what actually is allowed
inside the braces.
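
As one possible illustration, suppose (and this is purely an assumption, since the question does not state the rules) that the content must be runs of a single repeated word character separated by `/`. A stricter pattern under that assumption might look like the sketch below; the names `strict` and `Extract` are hypothetical, and C# 9+ top-level statements are assumed.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical stricter pattern (an assumption, not a stated requirement):
// runs of one repeated word character, separated by '/'.
var strict = new Regex(@"{thing:((\w)\2*(?:/(\w)\3*)*)}");

// Returns the Group 1 value of every match in the input.
string[] Extract(string input) =>
    strict.Matches(input).Cast<Match>().Select(m => m.Groups[1].Value).ToArray();

// Well-formed blocks are still extracted whole.
Console.WriteLine(string.Join(", ", Extract("stuff/{thing:aa/bb/cccc}{thing:cccc}")));

// Content that breaks the assumed rules yields no matches, so this prints 0.
Console.WriteLine(Extract("{thing:Z,,,*()(((asgassgasg}").Length);
```

If the real requirements differ, the inner part of the pattern would need to change accordingly; the point is only that the allowed content can be spelled out instead of accepted wholesale.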

Good Luck!