I have a string input which looks like this: `var input = "AB-PQ-EF=CD-IJ=XY-JK"`. I want to know if there is a way, using the `string.Split()` method in C# and LINQ, to get an array of strings which looks like this: `var output = ["AB-PQ", "PQ-EF", "EF=CD", "CD-IJ", "IJ=XY", "XY-JK"]`. Currently I am doing the same conversion manually by iterating over the input string.

- As there is no fixed delimiter in your string for splitting, you need to manually iterate and split the text – Ipsit Gaur Jul 13 '18 at 11:25
- I know that there will be only two delimiters, '-' and '=' – pango89 Jul 13 '18 at 11:28
- But those are also present where you do not need to split the string – Ipsit Gaur Jul 13 '18 at 11:28
- Yes, that is why I was wondering if there is a way to combine the capability of LINQ with split() to achieve this. – pango89 Jul 13 '18 at 11:31
- Are your strings always like that? Meaning 2 chars, delimiter, 2 chars, etc. – Haytam Jul 13 '18 at 11:35
- Have you considered using regular expressions? – mnieto Jul 13 '18 at 11:35
- @Haytam No, delimiters are fixed but the number of chars surrounding them is not fixed. It can be 3 characters, 4 characters, etc. – pango89 Jul 13 '18 at 11:37
7 Answers
Can you use a regex instead of split?
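using System.Linq;
using System.Text.RegularExpressions;
var input = "AB-PQ-EF=CD-IJ=XY-JK";
// At each token start, the lookahead captures "token, delimiter, next token" without consuming it, so matches can overlap.
var pattern = new Regex(@"(?<![A-Z])(?=([A-Z]+[=-][A-Z]+))");
var output = pattern.Matches(input).Cast<Match>().Select(m => m.Groups[1].Value).ToArray();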

- How can I use a regex if I also want to do the reverse conversion, output => input? – pango89 Sep 12 '18 at 10:53
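On the reverse conversion asked about in the comment: a minimal sketch, which does not need a regex. It assumes every pair contains exactly one delimiter and that consecutive pairs share a token, as in the question. `PairJoiner` and `JoinPairs` are hypothetical names, not part of the answer above.

```csharp
using System;
using System.Linq;

public static class PairJoiner
{
    // Hypothetical helper: rebuilds the original string from overlapping pairs.
    // Assumes each pair contains one of the given delimiters.
    public static string JoinPairs(string[] pairs, params char[] delimiters)
    {
        if (pairs.Length == 0)
        {
            return string.Empty;
        }

        // Keep the first pair whole. From every later pair keep only the
        // delimiter and the right-hand token, because its left-hand token
        // repeats the right-hand token of the previous pair.
        var tails = pairs
            .Skip(1)
            .Select(p => p.Substring(p.IndexOfAny(delimiters)));

        return pairs[0] + string.Concat(tails);
    }
}
```

For example, `PairJoiner.JoinPairs(output, '-', '=')` on the array from the question gives back `AB-PQ-EF=CD-IJ=XY-JK`.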
Here is a working script. If you had a constant fixed delimiter, you'd only be looking at a single call to Regex.Split. Your original string doesn't have that, but we can easily duplicate parts of the input so that the string becomes splittable.
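string input = "ABC-PQ-EF=CD-IJ=XYZ-JK";
// Duplicate every inner token, joining the two copies with '~'.
string s = Regex.Replace(input, @"((?<=[=-])[A-Z]+(?=[=-]))", "$1~$1");
Console.WriteLine(s);
// Split on '~'; [A-Z]+ (rather than a fixed {2}) lets tokens be any length.
var items = Regex.Split(s, @"(?<=[A-Z]+[=-][A-Z]+)~");
foreach (var item in items)
{
    Console.WriteLine(item);
}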
ABC-PQ~PQ-EF~EF=CD~CD-IJ~IJ=XYZ~XYZ-JK
ABC-PQ
PQ-EF
EF=CD
CD-IJ
IJ=XYZ
XYZ-JK
If you look closely at the very first line of the output above, you'll see the trick I used. I just connected the pairs you want via a different delimiter (ideally, ~ does not appear anywhere else in your string). Then, we just have to split by that delimiter.

- @Haytam I generalized my answer to cover a variable number of characters. – Tim Biegeleisen Jul 13 '18 at 11:40
- It's not *not-having-a-constant-fixed-delimiter* that stops a single call to `Regex.Split` from working, it's having to duplicate the letters. – Rawling Jul 13 '18 at 11:41
- @Rawling I don't understand your comment, or what you are trying to say here. If my answer has a flaw, then point it out. – Tim Biegeleisen Jul 13 '18 at 11:42
- Test it with the string `ABC-PQ-EFZ=CD-IJ=XYZ-JK`, the last output is wrong `XYZ-J JK` – Haytam Jul 13 '18 at 11:42
- @TimBiegeleisen In your example, I am expecting the last string in the array to be "J-JK". – pango89 Jul 13 '18 at 11:45
- @Tim where you say *If you had a constant fixed delimiter, you'd only be looking at a single call to Regex.split*, I believe you are incorrect. Even if you only had one delimiter, you still wouldn't be able to `Split` such that each letter in the input appeared in two elements in the output. Conversely, if `Split` did work, it would likely work however many possible delimiters there were. – Rawling Jul 13 '18 at 11:51
- @Rawling I think I get what you are saying now. But if you read my answer, I rework the string such that each group _is_ neatly divided by a single fixed separator. That was the point of my (perhaps not optimal) answer. – Tim Biegeleisen Jul 13 '18 at 11:55
For a solution using string.Split and LINQ, we just need to track the length of each part as we go so that the separator can be pulled from the original string, like so:
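var input = "ABC-PQ-EF=CDED-IJ=XY-JKLM";
var split = input.Split('-', '=');
int offset = 0; // total length of the parts seen so far
var result = split
    .Take(split.Length - 1)
    .Select((part, index) =>
    {
        offset += part.Length;
        // index counts the separators already passed, so index + offset
        // is the position in input of the separator that follows part.
        return $"{part}{input[index + offset]}{split[index + 1]}";
    })
    .ToArray();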
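The same idea can be sketched without the mutable `offset` captured by the lambda: collect the separators from the input in order, then stitch part `i`, separator `i` and part `i + 1` together. This is only a sketch under the question's assumptions (the delimiters occur only between tokens); `PairSplitter` and `SplitIntoPairs` are hypothetical names, not part of the answer above.

```csharp
using System;
using System.Linq;

public static class PairSplitter
{
    // Hypothetical helper: returns overlapping "token, delimiter, token" pairs.
    public static string[] SplitIntoPairs(string input, params char[] delimiters)
    {
        // The tokens, in order.
        var parts = input.Split(delimiters);

        // The separators, in order; there is one fewer than there are tokens.
        var separators = input.Where(c => delimiters.Contains(c)).ToArray();

        // Pair i is token i, then separator i, then token i + 1.
        return Enumerable.Range(0, separators.Length)
            .Select(i => parts[i] + separators[i] + parts[i + 1])
            .ToArray();
    }
}
```

Usage would look like `PairSplitter.SplitIntoPairs("AB-PQ-EF=CD-IJ=XY-JK", '-', '=')`. Because nothing is mutated inside `Select`, the query could also be left lazy and enumerated more than once.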

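You can try the approach below.
Here we split the string on the special characters, then loop over the resulting elements and, for each one, take the substring that runs up to and including the next group of characters.
For example: start at AB and read up to PQ.
string valentry = "AB-PQ-EF=CD-IJ=XY-JK";
List<string> filt = Regex.Split(valentry, @"[-=]").ToList();
var listEle = new List<string>();
// Assumes every token is exactly two characters long and occurs only once:
// IndexOf finds the first occurrence, and each pair is taken as 5 characters.
filt.ForEach(x =>
{
    // Skip the last token, since no pair starts there.
    if (valentry.IndexOf(x) != valentry.Length - 2)
    {
        string ele = valentry.Substring(valentry.IndexOf(x), 5);
        if (!String.IsNullOrEmpty(ele))
            listEle.Add(ele);
    }
});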

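Could you adapt something like this? You would just need to change the factors used in the modulo checks.
List<string> lsOut = new List<string>();
string sInput = "AB-PQ-EF=CD-IJ=XY-JK";
string sTemp = "";
// As written, this emits non-overlapping chunks (AB-PQ, EF=CD, IJ=XY, JK).
for (int i = 0; i < sInput.Length; i++)
{
    // Every 6th character is the delimiter between two chunks: skip it.
    if ((i + 1) % 6 == 0)
    {
        continue;
    }
    // Add to temp.
    sTemp += sInput[i];
    // Once 5 more characters have been collected, move temp to the list.
    if ((i + 1 - lsOut.Count) % 5 == 0)
    {
        lsOut.Add(sTemp);
        sTemp = "";
    }
    // At the end of the input, add whatever is left in temp.
    if (sInput.Length == i + 1)
    {
        lsOut.Add(sTemp);
    }
}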

string input = "AB-PQ-EF=CD-IJ=XY-JK";
var result = new Regex(@"(?<![A-Z])(?=([A-Z]+[=-][A-Z]+))").Matches(input)
    .Cast<Match>().Select(m => m.Groups[1].Value).ToArray();
foreach (var item in result)
{
    Console.WriteLine(item);
}

I have recently been learning Haskell, so here is a recursive solution.
static IEnumerable<string> SplitByPair(string input, char[] delimiter)
{
    // No delimiter left: there is no pair to return.
    var sep1 = input.IndexOfAny(delimiter);
    if (sep1 == -1)
    {
        yield break;
    }
    var sep2 = input.IndexOfAny(delimiter, sep1 + 1);
    if (sep2 == -1)
    {
        // Only one delimiter left: the rest of the input is the last pair.
        yield return input;
    }
    else
    {
        // Emit the first pair, then recurse from the token after sep1.
        yield return input.Substring(0, sep2);
        foreach (var other in SplitByPair(input.Substring(sep1 + 1), delimiter))
        {
            yield return other;
        }
    }
}
The good points are:
- It's lazy.
- It's easy to extend to other conditions and other data types. However, that is a little hard in C# because C# lacks an equivalent of Haskell's List.span and its pattern matching.
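The recursive iterator above creates one nested iterator and one substring of the remaining input per pair. A non-recursive sketch of the same lazy idea is shown here for comparison; `LazyPairs` and `SplitByPairIterative` are hypothetical names, and the behaviour is intended to match `SplitByPair`, not to replace it.

```csharp
using System;
using System.Collections.Generic;

public static class LazyPairs
{
    // Hypothetical iterative counterpart of SplitByPair: yields each
    // "token, delimiter, token" pair only when the caller asks for it.
    public static IEnumerable<string> SplitByPairIterative(string input, char[] delimiters)
    {
        int start = 0;                            // start of the current pair
        int sep1 = input.IndexOfAny(delimiters);  // delimiter inside the current pair

        while (sep1 != -1)
        {
            // The pair ends at the next delimiter, or at the end of the input.
            int sep2 = input.IndexOfAny(delimiters, sep1 + 1);
            int end = sep2 == -1 ? input.Length : sep2;

            yield return input.Substring(start, end - start);

            // The next pair starts at the token after the current delimiter.
            start = sep1 + 1;
            sep1 = sep2;
        }
    }
}
```

Because the method is an iterator, a caller that only needs the first few pairs (for example via LINQ's `Take`) does not scan the rest of the string.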
