Duplicate

Regex for variable declaration and initialization in c#

I was looking for a regular expression to parse CSV values, and I came across this one:

[^,]+

It does the job by splitting the words at every occurrence of a ",". What I want to know is this: say I have the string

value_name v1,v2,v3,v4,...

Now I want a regular expression that finds the words v1, v2, v3, v4, ...

I tried:

^value_name\s+([^,]+)*

But it didn't work for me. Can you tell me what I am doing wrong? I remember working on regular expressions and their state-machine implementation. Doesn't it work the same way?

If a string starts with value_name followed by one or more whitespace characters, go to the next state. In that state, read a word until a "," comes. Then do it again, and each word will be grouped!

Am I wrong in understanding it?

Anirudh Goel
  • Dupe. See http://stackoverflow.com/questions/585853/regex-for-variable-declaration-and-initialization-in-c where I provided you with a hint. Please use the other thread instead of creating a new one. – dirkgently Feb 26 '09 at 06:34

3 Answers


You could use a Regex similar to those proposed:

(?:^value_name\s+)?([^,]+)(?:\s*,\s*)?
  • The first group is non-capturing and would match the start of the line and the value_name.
    To ensure that the Regex is still valid over all matches, we make that group optional by using the '?' modifier (meaning match at most once).

  • The second group is capturing and would match your vXX data.

  • The third group is non-capturing and would match the ',' and any whitespace before and after it.
    Again, we make it optional by using the '?' modifier, otherwise the last 'vXX' group would not match unless we ended the string with a final ','.

In your attempts, the Regex wouldn't match multiple times. Remember that if you want a Regex to match multiple occurrences in a string, the whole Regex needs to match every single occurrence, so you have to build it to match not only the 'value_name' at the start of the string but also every occurrence of 'vXX' in it.

In C#, you could list all matches and groups using code like this:

Regex r = new Regex(@"(?:^value_name\s+)?([^,]+)(?:\s*,\s*)?");
Match m = r.Match(subjectString);
while (m.Success) {
    for (int i = 1; i < m.Groups.Count; i++) {
        Group g = m.Groups[i];
        if (g.Success) {
            // matched text: g.Value
            // match start: g.Index
            // match length: g.Length
        } 
    }
    m = m.NextMatch();
} 
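To make the loop above concrete, here is a minimal runnable sketch of the same pattern collecting group 1 from every match. The sample input is an assumption modelled on the question's example line, and the program uses C# top-level statements only to stay short.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical sample input, modelled on the question's example line.
string subjectString = "value_name v1,v2,v3,v4";

var r = new Regex(@"(?:^value_name\s+)?([^,]+)(?:\s*,\s*)?");
var values = new List<string>();

// Every successful match contributes exactly one field via capturing group 1.
foreach (Match match in r.Matches(subjectString))
{
    values.Add(match.Groups[1].Value);
}

Console.WriteLine(string.Join("|", values)); // prints v1|v2|v3|v4
```

Two caveats seem worth keeping in mind. Because [^,]+ is greedy, whitespace in front of a comma ends up inside the captured value, so a Trim() on each value may be needed. And since the value_name prefix is optional, the pattern also matches lines that do not start with it.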
Renaud Bompuis

I would expect it to capture only v1 in the group, because [^,]+ cannot match past the first comma and nothing in the pattern consumes the comma itself, so the rest of the fields are never reached. How you handle this depends on which methods you call on the regular expression, but it may make sense to make two passes: first grab all the fields separated by commas, then break things up on spaces. Perhaps try ^value_name\s+(?:([^,]+),?)* instead.
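If the target is .NET, a repeated capturing group keeps every repetition in its Captures collection, so the suggested pattern can return all fields from a single match. The sketch below assumes a sample input modelled on the question; note that Groups[1].Value on its own holds only the last repetition.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical sample input, modelled on the question's example line.
string input = "value_name v1,v2,v3,v4";

Match m = Regex.Match(input, @"^value_name\s+(?:([^,]+),?)*");

// Captures (a .NET-specific feature) records every repetition of group 1.
string[] fields = m.Groups[1].Captures
    .Cast<Capture>()
    .Select(c => c.Value)
    .ToArray();

Console.WriteLine(m.Groups[1].Value);        // prints v4 (last repetition only)
Console.WriteLine(string.Join("|", fields)); // prints v1|v2|v3|v4
```

Most other regex engines keep only the last repetition of a repeated group, which is why the two-pass approach is the more portable choice.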

Logan Capaldo

Oh yeah, lists....

/(?:^value_name\s+|,\s*)([^,]+)/g will theoretically grab them, but you will have to use RegExp.exec() in a loop to get the capture, rather than the whole match.

I wish pre-matches worked in JS :(.

Otherwise, go with Logan's idea: /^value_name\s+([^,]+(?:,\s*[^,]+)*)$/ followed by .split(/,\s*/);
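Since the linked duplicate is about C#, here is a rough C# equivalent of the two-pass approach: validate the whole line and capture the list, then split the captured list on commas. The sample input is an assumption, and Regex.Split stands in for JavaScript's .split().

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical sample input with inconsistent spacing after the commas.
string line = "value_name v1, v2,v3";

// Pass 1: check the overall shape and capture the comma-separated list.
Match m = Regex.Match(line, @"^value_name\s+([^,]+(?:,\s*[^,]+)*)$");

// Pass 2: split the captured list on commas plus any following whitespace.
string[] parts = m.Success
    ? Regex.Split(m.Groups[1].Value, @",\s*")
    : Array.Empty<string>();

Console.WriteLine(string.Join("|", parts)); // prints v1|v2|v3
```

Returning an empty array for a non-matching line is a choice made for this sketch; a caller might prefer to report the failure instead.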

Simon Buchan