I have a string that I want to split in two. It usually consists of a name, an operator and a value, and I'd like to split it into the name and the value. The name can be anything, and so can the value. What I have is an array of operators, and my idea is to use them as separators:

var input = "name>=2";
var separators = new string[]
{
    ">",
    ">=",
};
var result = input.Split(separators, StringSplitOptions.RemoveEmptyEntries);

The code above yields name and =2. But if I rearrange the separators so that >= comes first, like this:

var separators = new string[]
{
    ">=",
    ">",
};

That way I get name and 2, which is what I'm trying to achieve. Sadly, keeping the separators in a perfect order is a no-go for me. Also, my collection of separators is not immutable. So I'm wondering whether I could split the string so that longer separators take precedence over shorter ones?

Thanks for help!

Here is a related question explaining why the Split() method behaves this way.

Prolog
  • `var result = Regex.Split(input, "[><=]+");` - we split on any combination of `<`, `>` and `=`, e.g. `name>>4`, `name===other`, `input<>-456`, `name<=5`, `name>=7` – Dmitry Bychenko Jun 10 '19 at 13:37
  • `var result = input.Split(separators.OrderByDescending(s => s.Length).ToArray(), StringSplitOptions.RemoveEmptyEntries);` - we *sort* separators in the right way, and then split on some of them – Dmitry Bychenko Jun 10 '19 at 13:45

2 Answers

You may try doing a regex split on an alternation which lists the longer >= first:

// requires: using System; using System.Text.RegularExpressions;
var input = "name>=2";
string[] parts = Regex.Split(input, "(?:>=|>)"); // longer alternative first
foreach (var item in parts)
{
    Console.WriteLine(item);
}

This prints:

name
2

Note that had we split on (?:>|>=), the output would have been name and =2.
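
Since the separators live in a collection, the alternation can also be generated instead of typed by hand. The following is only a minimal sketch and not part of the original answer: the names `OperatorSplitter` and `SplitOnOperators` are made up for illustration. It sorts the separators longest first and passes each one through `Regex.Escape`, on the assumption that user-defined operators may contain regex metacharacters such as `|` or `+`. It also assumes the collection is non-empty.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper; the class and method names are illustrative only.
public static class OperatorSplitter
{
    public static string[] SplitOnOperators(string input, string[] separators)
    {
        // Longest separators first, so ">=" is tried before ">".
        // Regex.Escape keeps metacharacters in custom operators literal.
        var alternatives = separators
            .OrderByDescending(s => s.Length)
            .Select(s => Regex.Escape(s));

        var pattern = "(?:" + string.Join("|", alternatives) + ")";

        return Regex.Split(input, pattern);
    }
}
```

Called with "name>=2" and the separators ">" and ">=" in either order, this should return name and 2. Unlike the `String.Split` overload used in the question, `Regex.Split` does not drop empty entries, so those would need filtering separately if they matter.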

Tim Biegeleisen
  • Might be worth pointing out OP could generate this pattern using their `separators` array and a bit of `OrderByDescending` and `String.Join` action. – Jamiec Jun 10 '19 at 13:42
  • @Jamiec I wish I knew more C#, in which case I could have also offered those options `:-)` – Tim Biegeleisen Jun 10 '19 at 13:43
  • `var pattern = "(?:" + String.Join("|",separators.OrderByDescending(s => s.Length)) + ")"` (eg: https://rextester.com/INYGH99691) – Jamiec Jun 10 '19 at 13:49
  • Thank you for the answer @Tim, and jamiec for your valuable comment. I have access to the operators collection, but I can't be 100% sure what the operators will look like, because I allow adding operators in customization. Because of this I'd use Regex only as an absolute last resort. The operators collection can be extended, so I prefer a more future-proof and less risky solution. Maybe I should have written more about why the operators collection is not immutable, sorry about that. Anyway, thanks again for the answer, I'm certain someone with a similar problem will find a solution here thanks to you. – Prolog Jun 10 '19 at 14:46

You can try several options. If you have a collection of separators, you can sort them into the right order before splitting:

  using System.Linq;

  ...

  var result = input.Split(
    separators.OrderByDescending(item => item.Length).ToArray(), // longest first
    StringSplitOptions.RemoveEmptyEntries);

Alternatively, you can try organizing all separators (including possible future ones) into a single pattern, e.g.

 [><=]+

Here we split on the longest run of >, < and = characters:

 var result = Regex.Split(input, "[><=]+");

Demo:

  using System;
  using System.Linq;
  using System.Text.RegularExpressions;

  ...

  string[] tests = new string[] {
    "name>123",
    "name<4",
    "name=78",
    "name==other",
    "name===other",
    "name<>78",
    "name<<=4",
    "name=>name + 455",
    "name>=456",
    "a_b_c=d_e_f",
  };

  string report = string.Join(Environment.NewLine, tests
    .Select(test => string.Join("; ", Regex.Split(test, "[><=]+"))));

  Console.Write(report);

Outcome:

name; 123
name; 4
name; 78
name; other
name; other
name; 78
name; 4
name; name + 455
name; 456
a_b_c; d_e_f
Dmitry Bychenko
  • Thank you for your answer @Dmitry. Sorting the operators seems like a good idea. However, I notice a major flaw in your second solution. If a name contains a character like an underscore, e.g. `first_name`, and the separators collection contains a separator that also uses an underscore, e.g. `_=`, then `first_name` would be split into `first` and `name` as well, which is not quite the result I'm looking for. – Prolog Jun 10 '19 at 14:15
  • @Prolog: yes, in the general case we can't just mechanically glue all the separators together into a single pattern. To support a `_=` separator we can try, say, the `_?[><=]+` pattern: at most one `_` followed by at least one of the `>`, `<`, `=` symbols. So `_=` is a separator, while `_` on its own is not. – Dmitry Bychenko Jun 10 '19 at 14:22
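
For the case raised in the comments above, where a separator such as `_=` shares characters with a name like `first_name`, one option that avoids regex entirely is to look for the first position at which any whole separator occurs and cut the string there once. This is a rough sketch rather than anything proposed in the answers: `NameValueSplitter` and `TrySplit` are hypothetical names, and it assumes ordinal (case-sensitive) matching and that only the first operator in the input matters.

```csharp
using System;

// Hypothetical helper; the class and method names are illustrative only.
public static class NameValueSplitter
{
    // Returns true and sets name/value when some separator occurs in input.
    public static bool TrySplit(string input, string[] separators,
                                out string name, out string value)
    {
        int bestIndex = -1;
        string bestSeparator = string.Empty;

        foreach (var separator in separators)
        {
            if (string.IsNullOrEmpty(separator)) continue;

            int index = input.IndexOf(separator, StringComparison.Ordinal);
            if (index < 0) continue;

            // Prefer the earliest match; on a tie, prefer the longer separator.
            bool earlier = bestIndex < 0 || index < bestIndex;
            bool longerAtSamePosition =
                index == bestIndex && separator.Length > bestSeparator.Length;

            if (earlier || longerAtSamePosition)
            {
                bestIndex = index;
                bestSeparator = separator;
            }
        }

        if (bestIndex < 0)
        {
            // No separator found: report the whole input as the name.
            name = input;
            value = string.Empty;
            return false;
        }

        name = input.Substring(0, bestIndex);
        value = input.Substring(bestIndex + bestSeparator.Length);
        return true;
    }
}
```

With the separators "=" and "_=", the input "first_name_=5" should come back as first_name and 5, because only complete separators are matched and the array order does not matter. A separator consisting of a bare `_` would still cut `first_name`, so this does not remove the need to keep operators distinguishable from names.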