
I have this string:

IMD+F++:::PS4 SAINTS R IV R?+GA'

I would like to split it up in two steps. First, I want to split on +, except for escaped pluses ("?+"). Second, I want to split each result on :, except for escaped colons ("?:").

With the following Regex I can successfully split my string:

string[] Data = Regex.Split("IMD+F++:::PS4 SAINTS R IV R?+GA'", @"(?<![\?])[\+]+"); 

result:

[0] IMD
[1] F
[2] :::PS4 SAINTS R IV R?+GA'

The result is incorrect: the array should contain four elements. The split drops the empty result, but I need empty results to stay in the array. The result should be:

[0] IMD
[1] F
[2]
[3] :::PS4 SAINTS R IV R?+GA'

Does anyone know why it behaves this way? Any suggestions?

jjtilly
  • You should focus on one problem at a time - if you want to split on colons as well, I suggest you ask that as a separate question. (Given that your expected result includes colons, presumably you're not trying to do that just yet.) – Jon Skeet Jan 23 '15 at 07:33
  • Yes, I will be doing one split at a time. The main problem was keeping the empty entries in. But it's solved now. – jjtilly Jan 23 '15 at 07:45

2 Answers


You're explicitly saying that you want to split on "at least one plus" - that's what [\+]+ means. That's why it's treating ++ as a single separator. Just split on a single plus - and note that you don't need to put that into a set of characters:

string[] data = Regex.Split("IMD+F++:::PS4 SAINTS R IV R?+GA'", @"(?<!\?)\+");

If you do want to put it into a set of characters, you don't need to escape it. The only reason for escaping it above is to say "this isn't a quantifier, it's just a plus character". So this is equally good:

string[] data = Regex.Split("IMD+F++:::PS4 SAINTS R IV R?+GA'", @"(?<![?])[+]");
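
To see the difference concretely, here is a minimal sketch that runs both patterns against the string from the question. The class and method names (SplitDemo, SplitOnPlusRuns, SplitOnSinglePlus) are made up for illustration; only Regex.Split is the real API.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SplitDemo
{
    public const string Input = "IMD+F++:::PS4 SAINTS R IV R?+GA'";

    // Pattern from the question: a run of one or more pluses is ONE separator.
    public static string[] SplitOnPlusRuns(string s) =>
        Regex.Split(s, @"(?<![\?])[\+]+");

    // Corrected pattern: every unescaped plus is its own separator.
    public static string[] SplitOnSinglePlus(string s) =>
        Regex.Split(s, @"(?<!\?)\+");

    public static void Main()
    {
        Console.WriteLine(SplitOnPlusRuns(Input).Length);   // prints 3
        Console.WriteLine(SplitOnSinglePlus(Input).Length); // prints 4

        // The empty entry between the two adjacent pluses is kept at index 2.
        string[] data = SplitOnSinglePlus(Input);
        Console.WriteLine(data[2] == "");                   // prints True
    }
}
```

Regex.Split keeps empty strings between adjacent matches, so no extra option is needed once the trailing quantifier is gone.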
Jon Skeet

Just remove the + after the character class. That + is a quantifier: it greedily matches the previous token one or more times, so the regex [\+]+ matches not only a single plus but also any pluses that immediately follow it.

string[] Data = Regex.Split("IMD+F++:::PS4 SAINTS R IV R?+GA'", @"(?<![?])[+]");
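
The same idea should carry over to the second step from the question (splitting each piece on unescaped colons). The following is only a rough sketch: TwoStepSplit, SplitUnescaped and Split are hypothetical names, and it assumes that a separator counts as escaped only when the character directly before it is "?". It does not try to handle an escaped escape character such as "??" followed by a separator.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class TwoStepSplit
{
    // Splits on every occurrence of `separator` that is not directly preceded by '?'.
    // Regex.Escape makes the separator literal (e.g. '+' becomes \+).
    public static string[] SplitUnescaped(string input, char separator)
    {
        string pattern = @"(?<!\?)" + Regex.Escape(separator.ToString());
        return Regex.Split(input, pattern);
    }

    // Step 1: split on unescaped '+'. Step 2: split each piece on unescaped ':'.
    public static string[][] Split(string input) =>
        SplitUnescaped(input, '+')
            .Select(piece => SplitUnescaped(piece, ':'))
            .ToArray();

    public static void Main()
    {
        string[][] parts = Split("IMD+F++:::PS4 SAINTS R IV R?+GA'");
        for (int i = 0; i < parts.Length; i++)
        {
            // Quote each element so empty entries stay visible in the output.
            Console.WriteLine(i + ": " + string.Join(", ", parts[i].Select(p => "\"" + p + "\"")));
        }
    }
}
```

With the string from the question, the last piece starts with three colons, so its inner array has three empty entries followed by the text, and the escaped "?+" stays inside that text.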
Avinash Raj