I am trying to use Regex.Split to split strings like this one:

string criteria = "NAME='Eduard O' Brian'   COURSE='Math II' TEACHER = 'Chris Young' SCHEDULE='3' CAMPUS='C-1' ";

We have the following 'reserved words': NAME, COURSE, TEACHER, SCHEDULE, CAMPUS. It is required to split the original string into:

NAME='Eduard O' Brian'
COURSE='Math II'
TEACHER = 'Chris Young'
SCHEDULE='3'
CAMPUS='C-1'

The split criterion is: a single quote, followed by one or more spaces, followed by a 'reserved word'.

The closest expression I achieved is:

var match = Regex.Split(criteria, @"'[\s+]([NAME]|[COURSE]|[TEACHER]|[SCHEDULE]|[CAMPUS])", RegexOptions.CultureInvariant);

This is the complete source code:

using System;
using System.Text.RegularExpressions;
namespace ConsoleApplication
{
    class Program
    {
        static void Main(string[] args)
        {
            string criteria = "NAME='Eduard O' Brian'  COURSE='Math II' TEACHER = 'Chris Young' SCHEDULE='3' CAMPUS='C-1' ";

            var match = Regex.Split(criteria, @"'[\s+]([NAME]|[COURSE]|[TEACHER]|[SCHEDULE]|[CAMPUS])", RegexOptions.CultureInvariant);

            foreach (var item in match)
                Console.WriteLine(item.ToString());

            Console.Read();
        }
    }
}

My code is doing this:

NAME='Eduard O' Brian'   COURSE='Math II
T
EACHER = 'Chris Young
S
CHEDULE='3
C
AMPUS='C-1

It is deleting the closing single quote and returning only the first letter of the reserved word as a separate item. Also, COURSE in this sample is preceded by more than one space, and the split does not happen there.

Thanks in advance!

Wiktor Stribiżew
Mercy
  • What does your code do now? What about it isn't working? – mjwills Apr 17 '18 at 21:48
  • Anything you put inside square brackets `[ ]` is going to match any one of the single characters in there. `[NAME]` is going to match "N", "A", "M", or "E" – Matti Price Apr 17 '18 at 21:50
  • Your `criteria` format is inconsistent; you need to distinguish an apostrophe in a NAME like O'Brian from an apostrophe used to separate fields. – Dour High Arch Apr 17 '18 at 22:26
  • Thank you all. @DourHighArch since this is a legacy application, I cannot change the use of apostrophes as separators. This is why I plan to use the next 'reserved word' as the way to distinguish between sentences – Mercy Apr 17 '18 at 22:32
  • First, change `[NAME]|[COURSE]|[TEACHER]|[SCHEDULE]|[CAMPUS]` to `NAME|COURSE|TEACHER|SCHEDULE|CAMPUS`. As @MattiPrice pointed out, what you have right now is a [character class](https://learn.microsoft.com/en-us/dotnet/standard/base-types/regular-expression-language-quick-reference#character_classes), which matches any of the characters within the `[ ]` – Jim Mischel Apr 17 '18 at 22:33

1 Answer

You can simply split on one or more whitespace characters that are followed by one of your reserved words and then =:

var results = Regex.Split(s, @"\s+(?=(?:NAME|COURSE|TEACHER|SCHEDULE|CAMPUS)\s*=)");

See the regex demo

Pattern details

  • \s+ - 1 or more whitespace chars
  • (?= - start of a positive lookahead that, immediately to the right of the current location, requires the following text:
    • (?:NAME|COURSE|TEACHER|SCHEDULE|CAMPUS) - any of the alternative literal texts
    • \s* - 0 or more whitespace chars (as there can be space(s) between reserved words and =)
    • = - an equal sign
  • ) - end of the lookahead.
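
To illustrate why the lookahead matters, here is a minimal sketch that contrasts a consuming pattern with the lookahead version. The class name, the shortened input, and the two-word alternation are my own choices for illustration, not part of the question.

```csharp
using System;
using System.Text.RegularExpressions;

class LookaheadDemo
{
    static void Main()
    {
        // Shortened, hypothetical input for illustration only.
        string input = "NAME='A' COURSE='B'";

        // Consuming version: the reserved word and "=" are part of the match,
        // so Split removes them from the result.
        string[] consumed = Regex.Split(input, @"\s+(?:NAME|COURSE)\s*=");

        // Lookahead version: only the whitespace is matched and removed;
        // the reserved word is checked but not consumed.
        string[] kept = Regex.Split(input, @"\s+(?=(?:NAME|COURSE)\s*=)");

        Console.WriteLine(string.Join(" | ", consumed)); // NAME='A' | 'B'
        Console.WriteLine(string.Join(" | ", kept));     // NAME='A' | COURSE='B'
    }
}
```

Because the lookahead is zero-width, no capturing group is needed to keep the reserved word, which also avoids Regex.Split returning captured text as extra array elements.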

C# demo:

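using System;
using System.Text.RegularExpressions;

var criteria = "NAME='Eduard O' Brian'  COURSE='Math II' TEACHER = 'Chris Young' SCHEDULE='3' CAMPUS='C-1' ";
// Split on whitespace only where a reserved word followed by "=" comes next.
var match = Regex.Split(criteria, @"\s+(?=(?:NAME|COURSE|TEACHER|SCHEDULE|CAMPUS)\s*=)");
// Note: the last item keeps the trailing space from the input; call criteria.Trim() first if that matters.
Console.WriteLine(string.Join("\n", match));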
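
If the list of reserved words may change, the pattern can be built from the list instead of being hard-coded. The following is only a sketch under that assumption: the `SplitCriteria` class, the `ReservedWords` array, and the `SplitFields` helper are hypothetical names introduced here, not part of the original code.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

class SplitCriteria
{
    // Reserved words from the question, kept in one place (hypothetical helper field).
    static readonly string[] ReservedWords = { "NAME", "COURSE", "TEACHER", "SCHEDULE", "CAMPUS" };

    // Hypothetical helper: splits the criteria string before each reserved word.
    static string[] SplitFields(string criteria)
    {
        // Regex.Escape guards against reserved words containing regex metacharacters.
        string alternation = string.Join("|", ReservedWords.Select(w => Regex.Escape(w)));
        string pattern = @"\s+(?=(?:" + alternation + @")\s*=)";

        // Trim first: the pattern never matches trailing whitespace,
        // so it would otherwise stay attached to the last field.
        return Regex.Split(criteria.Trim(), pattern);
    }

    static void Main()
    {
        string criteria = "NAME='Eduard O' Brian'  COURSE='Math II' TEACHER = 'Chris Young' SCHEDULE='3' CAMPUS='C-1' ";

        // Brackets make any leftover whitespace visible.
        foreach (string field in SplitFields(criteria))
            Console.WriteLine("[" + field + "]");
    }
}
```

With the sample string this should print the five fields, each wrapped in brackets and without trailing whitespace. One caveat of the approach in general: a value that itself contains whitespace followed by a reserved word and = (for example a name containing " COURSE=") would also be split at that point.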

Wiktor Stribiżew