5

I have an input string that looks like this: var input = "AB-PQ-EF=CD-IJ=XY-JK". Is there a way, using the string.Split() method and LINQ in C#, to get an array of strings that looks like this: var output = ["AB-PQ", "PQ-EF", "EF=CD", "CD-IJ", "IJ=XY", "XY-JK"]? Currently I do this conversion manually by iterating over the input string.

pango89

7 Answers

7

Can you use a regex instead of split?

var input = "AB-PQ-EF=CD-IJ=XY-JK";
var pattern = new Regex(@"(?<![A-Z])(?=([A-Z]+[=-][A-Z]+))");
var output = pattern.Matches(input).Cast<Match>().Select(m => m.Groups[1].Value).ToArray();
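To make the pattern easier to read, here is a minimal sketch of the same regex written with RegexOptions.IgnorePatternWhitespace so each part can carry a comment. The class and method names (LookaheadPairs, Extract) are hypothetical and only there for illustration. Because the whole match is zero-width, the engine advances one character at a time and the same token can show up in two captured pairs; the lookbehind stops a match from starting in the middle of a token.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class LookaheadPairs
{
    // Same pattern as above, spread over several lines with inline comments.
    private static readonly Regex Pattern = new Regex(@"
        (?<![A-Z])              # start only where the previous character is not a letter
        (?=                     # look ahead without consuming any input
          ([A-Z]+[=-][A-Z]+)    # group 1: token, delimiter, next token
        )",
        RegexOptions.IgnorePatternWhitespace);

    // Hypothetical helper: returns every overlapping token-delimiter-token pair.
    public static string[] Extract(string input) =>
        Pattern.Matches(input)
               .Cast<Match>()
               .Select(m => m.Groups[1].Value)
               .ToArray();
}
```

Calling LookaheadPairs.Extract(input) on the string from the question should give the six pairs listed there.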
Rawling
0

Here is a working script. If you had a constant fixed delimiter, you'd only be looking at a single call to Regex.Split. Your original string doesn't have that, but we can duplicate the inner tokens of the input so that the string becomes splittable.

string input = "ABC-PQ-EF=CD-IJ=XYZ-JK";
string s = Regex.Replace(input, @"((?<=[=-])[A-Z]+(?=[=-]))", "$1~$1");
Console.WriteLine(s);
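// The "~" separators were inserted above, so a plain split on "~" is enough.
// A fixed-width lookbehind such as [A-Z]{2} would miss three-letter tokens like XYZ.
var items = s.Split('~');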
foreach (var item in items)
{
    Console.WriteLine(item);
}

ABC-PQ~PQ-EF~EF=CD~CD-IJ~IJ=XYZ~XYZ-JK
ABC-PQ
PQ-EF
EF=CD
CD-IJ
IJ=XYZ
XYZ-JK


If you look closely at the very first line of the output above, you'll see the trick I used. I just connected the pairs you want via a different delimiter (ideally ~ does not appear anywhere else in your string). Then, we just have to split by that delimiter.

Tim Biegeleisen
  • As the OP said, the number of chars is not fixed to 2. – Haytam Jul 13 '18 at 11:38
  • @Haytam I generalized my answer to cover a variable number of characters. – Tim Biegeleisen Jul 13 '18 at 11:40
  • It's not *not-having-a-constant-fixed-delimiter* that stops a single call to `Regex.Split` from working, it's having to duplicate the letters. – Rawling Jul 13 '18 at 11:41
  • @Rawling I don't understand your comment, or what you are trying to say here. If my answer has a flaw, then point it out. – Tim Biegeleisen Jul 13 '18 at 11:42
  • Test it with the string `ABC-PQ-EFZ=CD-IJ=XYZ-JK`, the last output is wrong `XYZ-J JK` – Haytam Jul 13 '18 at 11:42
  • @TimBiegeleisen In your example, I am expecting the last string in the array to be "J-JK". – pango89 Jul 13 '18 at 11:45
  • @pango89 I fixed it. This problem is not easy. – Tim Biegeleisen Jul 13 '18 at 11:47
  • @Tim where you say *If you had a constant fixed delimiter, you'd only be looking at a single call to Regex.split*, I believe you are incorrect. Even if you only had one delimiter, you still wouldn't be able to `Split` such that each letter in the input appeared in two elements in the output. Conversely, if `Split` did work, it would likely work however many possible delimiters there were. – Rawling Jul 13 '18 at 11:51
  • @Rawling I think I get what you are saying now. But if you read my answer, I rework the string such that each group _is_ neatly divided by a single fixed separator. That was the point of my (perhaps not optimal) answer. – Tim Biegeleisen Jul 13 '18 at 11:55
0

For a solution using string.Split and LINQ, we just need to track the length of each part as we go so that the separator can be pulled from the original string, like so:

var input = "ABC-PQ-EF=CDED-IJ=XY-JKLM";

var split = input.Split('-', '=');

int offset = 0;

var result = split
            .Take(split.Length - 1)
            .Select((part, index) => {
                offset += part.Length;
                return $"{part}{input[index + offset]}{split[index + 1]}";})
            .ToArray();
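As a variation on the same idea, here is a minimal sketch that avoids the mutable offset variable. It swaps string.Split for Regex.Split with a capturing group, which in .NET keeps the captured delimiters in the result array, so the array alternates token, delimiter, token. The names PairSplitter and SplitIntoPairs are hypothetical, and the sketch assumes the only delimiters are '-' and '='.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class PairSplitter
{
    // Hypothetical helper: builds overlapping "token + delimiter + token" pairs.
    public static string[] SplitIntoPairs(string input)
    {
        // The capturing group keeps each delimiter, so parts looks like:
        // [token0, delim0, token1, delim1, token2, ...]
        string[] parts = Regex.Split(input, "([-=])");

        // There is one pair per delimiter; pair i starts at token index 2 * i.
        return Enumerable.Range(0, parts.Length / 2)
            .Select(i => parts[2 * i] + parts[2 * i + 1] + parts[2 * i + 2])
            .ToArray();
    }
}
```

Since nothing is mutated inside the lambda, the query can be enumerated more than once without the results drifting.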
JonMac1374
0

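You can try the approach below. We split the string on the special characters, then loop over the elements and, for each one, take the substring that runs through the next character group. For example: find AB and take everything up to and including PQ. Note that this assumes unique two-letter groups.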

        string valentry = "AB-PQ-EF=CD-IJ=XY-JK";
        // Split on '-' or '='.
        List<string> filt = Regex.Split(valentry, @"[-=]").ToList();

        var listEle = new List<string>();
        filt.ForEach(x =>
            {
                // Skip the last group: no pair starts there.
                if (valentry.IndexOf(x) != valentry.Length - 2)
                {
                    // Two letters + delimiter + two letters = 5 characters.
                    string ele = valentry.Substring(valentry.IndexOf(x), 5);
                    if (!String.IsNullOrEmpty(ele))
                        listEle.Add(ele);
                }
            });


Lucifer
0

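Could you adapt something like this? As written it produces non-overlapping chunks (AB-PQ, EF=CD, IJ=XY, JK), so you would just need to change the factorization to get the overlapping pairs.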

        List<string> lsOut = new List<string>() { };

        string sInput = "AB-PQ-EF=CD-IJ=XY-JK";
        string sTemp = "";


        for (int i = 0; i < sInput.Length; i++)
        {

            if ( (i + 1) % 6 == 0)
            {
                continue;
            }

            // add to temp
            sTemp += sInput[i];

            // multiple of 5, add all the temp to list
            if ( (i + 1 - lsOut.Count) % 5 == 0)
            {
                lsOut.Add(sTemp);
                sTemp = "";
            }

            if(sInput.Length == i + 1)
            {
                lsOut.Add(sTemp);
            }

        }
atoms
0
        string input = "AB-PQ-EF=CD-IJ=XY-JK";
        var result = new Regex(@"(?<![A-Z])(?=([A-Z]+[=-][A-Z]+))").Matches(input)
            .Cast<Match>().Select(m => m.Groups[1].Value).ToArray();
        foreach (var item in result)
        {
            Console.WriteLine(item);
        }
0

I have been learning Haskell recently, so here is a recursive solution.

static IEnumerable<string> SplitByPair(string input, char[] delimiter)
{
    var sep1 = input.IndexOfAny(delimiter);
    if (sep1 == -1)
    {
        yield break;
    }
    var sep2 = input.IndexOfAny(delimiter, sep1 + 1);
    if (sep2 == -1)
    {
        yield return input;
    }
    else
    {
        yield return input.Substring(0, sep2);
        foreach (var other in SplitByPair(input.Substring(sep1 + 1), delimiter))
        {
            yield return other;
        }
    }
}

The advantages are:

  • It's lazy.
  • It's easy to extend to other conditions and other data types. However, that is a little harder in C# because C# lacks Haskell's List.span and pattern matching.
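To illustrate the point about laziness and other data types, here is a minimal sketch of a generic adjacent-pairs iterator. It is not part of the solution above: the names PairwiseExtensions and Pairwise are hypothetical, and it assumes C# 7 or later for the tuple syntax. It works on any IEnumerable<T> and only pulls items from the source as the caller asks for them.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class PairwiseExtensions
{
    // Hypothetical helper: lazily yields each element together with its successor.
    public static IEnumerable<(T First, T Second)> Pairwise<T>(this IEnumerable<T> source)
    {
        using (var e = source.GetEnumerator())
        {
            // An empty source has no pairs.
            if (!e.MoveNext())
                yield break;

            T previous = e.Current;
            while (e.MoveNext())
            {
                yield return (previous, e.Current);
                previous = e.Current;
            }
        }
    }
}
```

For the string case, the tokens could be fed through Pairwise once they are split out, but the delimiter between each pair would still have to be recovered separately, as the recursive version does with Substring.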
qxg