
When parsing an FTX (free text) string, I need to split it using + as a delimiter, but only when the + is not preceded by an escape character (say, ?). So the string "nika ?+ marry = love+sandra ?+ alex = love" should be parsed into two strings: "nika + marry = love" and "sandra + alex = love". Using String.Split('+') is obviously not enough. How can I achieve this?

One way, it seems to me, is to replace occurrences of ?+ with some unique character (or sequence of characters), say @#@, split using "+" as a delimiter, and then replace @#@ back with +. But that's unreliable, since nothing guarantees the placeholder never occurs in the data, and it feels wrong in every way I can think of.
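A minimal sketch of that placeholder idea shows where it breaks. `PlaceholderSplit.Split` is a hypothetical helper written only for illustration: it works as long as "@#@" never appears in the input, and silently corrupts the data when it does.

```csharp
using System;

public static class PlaceholderSplit
{
    // Hypothetical helper: the placeholder trick described in the question.
    public static string[] Split(string s)
    {
        const string Placeholder = "@#@"; // assumed never to occur in the input
        string[] parts = s.Replace("?+", Placeholder).Split('+');
        for (int i = 0; i < parts.Length; i++)
            parts[i] = parts[i].Replace(Placeholder, "+"); // restore escaped '+'
        return parts;
    }

    public static void Main()
    {
        // Works when the placeholder never appears in the data:
        // prints "nika + marry = love | sandra + alex = love"
        Console.WriteLine(string.Join(" | ", Split("nika ?+ marry = love+sandra ?+ alex = love")));

        // A literal "@#@" in the input is turned into '+': prints "a + b | c"
        Console.WriteLine(string.Join(" | ", Split("a @#@ b+c")));
    }
}
```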

? is used as an escape character only in combination with : or +; in any other case it is treated as a regular character.
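One alternative that follows this rule exactly is a single left-to-right scan instead of a regex. This is only a sketch: `FtxSplitter.Split` is a hypothetical name, and it assumes `?:` should stay escaped (copied through unchanged) so that a later pass splitting on `:` can still recognize it.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class FtxSplitter
{
    // Sketch of the stated rule: '?' escapes only a following '+' or ':';
    // anywhere else it is a literal '?'. "?+" becomes a literal '+',
    // "?:" is copied unchanged (assumption: a later pass splits on ':').
    public static List<string> Split(string s)
    {
        var parts = new List<string>();
        var current = new StringBuilder();

        for (int i = 0; i < s.Length; i++)
        {
            char c = s[i];
            if (c == '?' && i + 1 < s.Length && (s[i + 1] == '+' || s[i + 1] == ':'))
            {
                if (s[i + 1] == '+')
                    current.Append('+');   // unescape "?+" to '+'
                else
                    current.Append("?:");  // keep "?:" escaped for a later ':' split
                i++;                       // skip the escaped character
            }
            else if (c == '+')
            {
                parts.Add(current.ToString()); // an unescaped '+' ends the current part
                current.Clear();
            }
            else
            {
                current.Append(c);         // ordinary character, including a lone '?'
            }
        }

        parts.Add(current.ToString());     // the last part (possibly empty)
        return parts;
    }

    public static void Main()
    {
        foreach (string part in Split("nika ?+ marry = love+sandra ?+ alex = love"))
            Console.WriteLine("'{0}'", part);
    }
}
```

Because the scan looks only one character ahead, `??+` is read as a literal `?` followed by an escaped `+`, which is what the rule above implies.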

Hamid Pourjam
nicks

2 Answers


A horrible regular expression to split it:

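using System.Text.RegularExpressions;
string str = "nika ?+ marry = love??+sandra ???+ alex = love";
string[] splitted = Regex.Split(str, @"(?<=(?:^|[^?])(?:\?\?)*)\+");
// splitted: { "nika ?+ marry = love??", "sandra ???+ alex = love" }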

It splits on a + (\+) that is preceded by the beginning of the string (^) or a non-? character ([^?]), followed by an even number of ? ((?:\?\?)*). There is liberal use of (?:) (non-capturing groups) because, if the pattern contains capturing groups, Regex.Split also inserts the captured text into the resulting array.
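To illustrate that Regex.Split behavior, a small sketch (the `SplitCaptureDemo` class exists only for this illustration) compares the same delimiter with and without a capturing group:

```csharp
using System;
using System.Text.RegularExpressions;

public static class SplitCaptureDemo
{
    public static void Main()
    {
        // Without a capturing group, only the text between delimiters is returned.
        string[] plain = Regex.Split("a+b", @"\+");
        Console.WriteLine(string.Join(", ", plain));        // a, b

        // With a capturing group, the captured text is inserted into the result.
        string[] captured = Regex.Split("a+b", @"(\+)");
        Console.WriteLine(string.Join(", ", captured));     // a, +, b

        // A non-capturing group (?:...) behaves like the plain pattern.
        string[] nonCapturing = Regex.Split("a+b", @"(?:\+)");
        Console.WriteLine(string.Join(", ", nonCapturing)); // a, b
    }
}
```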

Note that I'm not doing the unescape! So in the end ?+ remains ?+.
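If unescaping is needed, one possible follow-up step is sketched below. It assumes the same convention as the regex (`??` is a literal `?` and `?+` is a literal `+`); `EscapedSplit.SplitAndUnescape` is a hypothetical helper, not part of the answer.

```csharp
using System;
using System.Text.RegularExpressions;

public static class EscapedSplit
{
    // The splitting pattern from the answer: a '+' preceded by an even number
    // (possibly zero) of '?' characters is a delimiter.
    static readonly Regex Delimiter = new Regex(@"(?<=(?:^|[^?])(?:\?\?)*)\+");

    // Assumption: "??" means a literal '?' and "?+" a literal '+', so each
    // "?x" pair (x = '?' or '+') collapses to x, scanning left to right.
    static readonly Regex Escape = new Regex(@"\?([?+])");

    public static string[] SplitAndUnescape(string s)
    {
        string[] parts = Delimiter.Split(s);
        for (int i = 0; i < parts.Length; i++)
            parts[i] = Escape.Replace(parts[i], "$1");
        return parts;
    }

    public static void Main()
    {
        foreach (string part in SplitAndUnescape("nika ?+ marry = love??+sandra ???+ alex = love"))
            Console.WriteLine("'{0}'", part);
    }
}
```

For the sample string this should print `'nika + marry = love?'` and `'sandra ?+ alex = love'`.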

xanatos
  • What if the string ends with `?+`? I don't know whether we should return nothing or an empty string. Yours returns nothing; mine returns an empty string. Also, +1 for the horrible regex! – Hamid Pourjam Jun 30 '15 at 12:28
using System;
using System.Text.RegularExpressions;

public class Program
{
    public static void Main()
    {
        string s = "nika ?+ marry = love+sandra ?+ alex = love";

        // Split on every '+', escaped or not.
        string[] result = Regex.Split(s, "\\+");

        // Rejoin the pieces with '\n' as a temporary separator.
        s = String.Join("\n", result);

        // A piece ending in '?' was cut at an escaped "?+": turn "?\n" back into "+".
        s = Regex.Replace(s, "\\?\\n", "+");

        // The remaining '\n' separators mark the unescaped '+' delimiters.
        // This assumes the input itself contains no '\n'.
        result = s.Split('\n');

        foreach (string match in result)
        {
            Console.WriteLine("'{0}'", match);
        }
    }
}

Outputs

'nika + marry = love'
'sandra + alex = love'

See https://dotnetfiddle.net/HkcQUw

Vladimir