I'm coding an Edifact Reader. An Edifact file consists of string lines like this:

string row = @"ABC+1+E522017332:101111757+MAX:MUSTERMANN:16890224+9'";

A set of rules describes what a valid line looks like. Translated into a regex, the rules for this particular case look like this:

Regex regex = new Regex(@"ABC\+\d{1}([A-Z0-9])?(\:\d{1})?\+[A-Z0-9]{1,12}\:[A-Z0-9]{9}\+[A-Z0-9]{0,45}\:[A-Z0-9]{0,45}\:\d{8}\+\d{1}(\d{4})?(\d{1})?([A-Z0-9]{1,7})?([A-Z0-9]{3})?([A-Z0-9]{15})?\'");

This works fine for validation. But I also want to split the string into the parts matched by the non-constant portions of the regex, with null for every optional part that is absent. The result should look like this:

ABC 
1
null
null
E522017332
101111757
MAX
MUSTERMANN
16890224
9
null
null
null
null
null

How can I do it?

Andrej Z.
  • Your regex shows exactly why regexes are evil. It is completely impossible to discern what was the intent of the person that wrote it. What are the `null`? – xanatos Jun 04 '18 at 12:22
  • [Capture the parts into capturing groups.](http://regexstorm.net/tester?p=%28ABC%29%5c%2b%28%5cd%29%28%5bA-Z0-9%5d%29%3f%28%3a%5cd%29%3f%5c%2b%28%5bA-Z0-9%5d%7b1%2c12%7d%29%3a%28%5bA-Z0-9%5d%7b9%7d%29%5c%2b%28%5bA-Z0-9%5d%7b0%2c45%7d%29%3a%28%5bA-Z0-9%5d%7b0%2c45%7d%29%3a%28%5cd%7b8%7d%29%5c%2b%28%5cd%29%28%5cd%7b4%7d%29%3f%28%5cd%29%3f%28%5bA-Z0-9%5d%7b1%2c7%7d%29%3f%28%5bA-Z0-9%5d%7b3%7d%29%3f%28%5bA-Z0-9%5d%7b15%7d%29%3f%27&i=ABC%2b1%2bE522017332%3a101111757%2bMAX%3aMUSTERMANN%3a16890224%2b9%27) – Wiktor Stribiżew Jun 04 '18 at 12:24
  • for example the first ([A-Z0-9])? => null – Andrej Z. Jun 04 '18 at 12:24

1 Answer

You only have to use capture groups (...) for all the pieces you need:

Regex regex = new Regex(@"^(ABC)\+(\d{1})([A-Z0-9])?(\:\d{1})?\+([A-Z0-9]{1,12})\:([A-Z0-9]{9})\+([A-Z0-9]{0,45})\:([A-Z0-9]{0,45})\:(\d{8})\+(\d{1})(\d{4})?(\d{1})?([A-Z0-9]{1,7})?([A-Z0-9]{3})?([A-Z0-9]{15})?\'$");

string row = @"ABC+1+E522017332:101111757+MAX:MUSTERMANN:16890224+9'";

var match = regex.Match(row);

if (match.Success)
{
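    // Groups[0] is the whole match, so the captured pieces start at index 1.
    for (int i = 1; i < match.Groups.Count; i++)
    {
        Group group = match.Groups[i];
        // Success is false for an optional group that took no part in the match;
        // an empty string can also be a real match for the {0,45} fields.
        string value = group.Success ? group.Value : "(null)";
        Console.WriteLine(value);
    }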
}
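
If real null values are wanted rather than a printed placeholder, the same loop can fill an array. The following is only a minimal sketch: the class name `EdifactLine` and the method `Split` are hypothetical, and it assumes that an optional group which did not take part in the match should become null. Note that group 4 keeps its leading colon, because the capture in the pattern includes it.

```csharp
using System;
using System.Text.RegularExpressions;

public static class EdifactLine
{
    // Same pattern as above, with every piece wrapped in a capture group.
    private static readonly Regex Pattern = new Regex(
        @"^(ABC)\+(\d{1})([A-Z0-9])?(\:\d{1})?\+([A-Z0-9]{1,12})\:([A-Z0-9]{9})\+([A-Z0-9]{0,45})\:([A-Z0-9]{0,45})\:(\d{8})\+(\d{1})(\d{4})?(\d{1})?([A-Z0-9]{1,7})?([A-Z0-9]{3})?([A-Z0-9]{15})?\'$");

    // Returns one entry per capture group, or null if the row does not match.
    // Groups that did not participate in the match become null entries.
    public static string[] Split(string row)
    {
        Match match = Pattern.Match(row);
        if (!match.Success)
        {
            return null;
        }

        var parts = new string[match.Groups.Count - 1];
        for (int i = 1; i < match.Groups.Count; i++)
        {
            Group group = match.Groups[i];
            parts[i - 1] = group.Success ? group.Value : null;
        }
        return parts;
    }
}
```

Calling `EdifactLine.Split(row)` with the sample row should give an array of 15 entries, with null at the positions of the absent optional parts.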

The groups are now numbered from 1 upward, but numeric indexes are hard to read. You could give the groups explicit names instead:

Regex regex = new Regex(@"^(?<abc>ABC)\+(?<digit0>\d{1})(?<lettersdigits0>[A-Z0-9])?(\:\d{1})?\+([A-Z0-9]{1,12})\:([A-Z0-9]{9})\+([A-Z0-9]{0,45})\:([A-Z0-9]{0,45})\:(\d{8})\+(\d{1})(\d{4})?(\d{1})?([A-Z0-9]{1,7})?([A-Z0-9]{3})?([A-Z0-9]{15})?\'$");

string row = @"ABC+1+E522017332:101111757+MAX:MUSTERMANN:16890224+9'";

var match = regex.Match(row);

if (match.Success)
{
    // Look each group up by name instead of by index.
    foreach (string name in new[] { "abc", "digit0" })
    {
        Group group = match.Groups[name];
        // Success is false for an optional group that took no part in the match.
        string value = group.Success ? group.Value : "(null)";
        Console.WriteLine(value);
    }
}

Ideally pick better names than abc, digit0 and lettersdigits0 :-) Use names that explain what each digit or letter actually means!
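
To take the named-group idea to the whole pattern, one option is to name every group, split the pattern over several lines with `RegexOptions.IgnorePatternWhitespace`, and collect the values by name via `Regex.GetGroupNames()`. This is a rough sketch only: the class `NamedEdifactLine`, the method `Parse` and all group names (`segment`, `code`, `id1`, `name1`, `opt1`, ...) are placeholders invented for illustration, since the real meaning of each field has to come from the EDIFACT rules.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class NamedEdifactLine
{
    // Every group is named; the names are placeholders, not real EDIFACT field names.
    // IgnorePatternWhitespace lets the pattern span several lines.
    private static readonly Regex Pattern = new Regex(@"
        ^(?<segment>ABC)
        \+(?<code>\d)(?<codeSuffix>[A-Z0-9])?(?<codeQualifier>:\d)?
        \+(?<id1>[A-Z0-9]{1,12}):(?<id2>[A-Z0-9]{9})
        \+(?<name1>[A-Z0-9]{0,45}):(?<name2>[A-Z0-9]{0,45}):(?<date>\d{8})
        \+(?<flag>\d)(?<opt1>\d{4})?(?<opt2>\d)?(?<opt3>[A-Z0-9]{1,7})?(?<opt4>[A-Z0-9]{3})?(?<opt5>[A-Z0-9]{15})?
        '$",
        RegexOptions.IgnorePatternWhitespace);

    // Returns a name -> value map, or null if the row does not match.
    // Optional groups that did not participate in the match map to null.
    public static Dictionary<string, string> Parse(string row)
    {
        Match match = Pattern.Match(row);
        if (!match.Success)
        {
            return null;
        }

        var fields = new Dictionary<string, string>();
        foreach (string name in Pattern.GetGroupNames())
        {
            if (name == "0")
            {
                continue; // group "0" is the whole match
            }
            Group group = match.Groups[name];
            fields[name] = group.Success ? group.Value : null;
        }
        return fields;
    }
}
```

With the sample row, `NamedEdifactLine.Parse(row)["name2"]` should be `MUSTERMANN` and the `opt1` to `opt5` entries should be null. One thing to keep in mind when only some groups are named, as in the pattern above: .NET numbers the unnamed groups first and the named ones after them, so numeric indexes no longer follow the left-to-right order.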

xanatos