Don't fixate on `Split()` being the solution! This is simple to parse without it. Regex answers are probably also OK, but in terms of raw efficiency I'd expect a hand-written parser to do the trick.
```csharp
using System.Collections.Generic;

IEnumerable<string> Parse(string input)
{
    var results = new List<string>();
    int startIndex = 0;
    int currentIndex = 0;
    while (currentIndex < input.Length)
    {
        var currentChar = input[currentIndex];
        if (currentChar == '{')
        {
            // The next character is the start of a result.
            startIndex = currentIndex + 1;
        }
        else if (currentChar == '}')
        {
            // Everything from startIndex up to the character before this '}' is a result.
            int endIndex = currentIndex - 1;
            int length = endIndex - startIndex + 1;
            results.Add(input.Substring(startIndex, length));
        }
        currentIndex++;
    }
    return results;
}
```
So it's not short on lines, but it iterates once and allocates only one string per result (plus the list itself). With a little tweaking, a C# 8 version using `Index`/`Range` types could probably cut those allocations further. Still, this is probably good enough.
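The C# 8 tweak might look something like the sketch below. It is not tested against the original question's data, and the names `RangeParser` and `ParseRanges` are hypothetical. It does the same single pass, but records a `System.Range` for each result instead of copying it out with `Substring()`, so the only allocations are the list and its backing array. It assumes C# 8 and .NET Core 3.0 or later, where `System.Range` exists.

```csharp
using System;
using System.Collections.Generic;

public static class RangeParser
{
    // Same single pass as Parse(), but records where each result lives
    // instead of allocating a new string for it.
    public static List<Range> ParseRanges(string input)
    {
        var results = new List<Range>();
        int startIndex = 0;

        for (int currentIndex = 0; currentIndex < input.Length; currentIndex++)
        {
            char currentChar = input[currentIndex];
            if (currentChar == '{')
            {
                // The next character starts a result.
                startIndex = currentIndex + 1;
            }
            else if (currentChar == '}')
            {
                // A Range end is exclusive, so this covers startIndex..currentIndex-1.
                results.Add(startIndex..currentIndex);
            }
        }

        return results;
    }
}
```

The caller decides when to pay for a string: `input.AsSpan()[range]` gives a `ReadOnlySpan<char>` view without copying, while `input[range]` is lowered to `Substring()` and allocates. So you only pay for the results you actually need as strings.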
You could spend a whole day trying to understand the regex, but this is as simple as it gets:
- Scan every character.
- If you find `{`, note that the next character is the start of a result.
- If you find `}`, consider everything from the last noted "start" up to the character before this one as "a result".
This won't catch mismatched brackets. For strings like "}}{" it doesn't throw; it quietly returns garbage (an empty string and "}"). You didn't ask for handling those cases, but it's not too hard to improve this logic to catch them and scream about it, or to recover.
For example, you could reset `startIndex` to something like -1 whenever `}` is found. From there, if you find `{` while `startIndex != -1`, you've found "{{". If you find `}` while `startIndex == -1`, you've found "}}". And if you exit the loop with `startIndex != -1`, that's an opening `{` with no closing `}`. That still leaves the string "}whoops" uncovered, because `startIndex` starts at 0, but initializing it to -1 instead makes a leading `}` trip the same check as "}}". Do that with a regex, and you'll have a headache.
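Here is a sketch of that stricter version. Throwing `FormatException` is my own choice, not something the question asked for; logging and recovering would work just as well. The names `StrictParser` and `ParseStrict` are hypothetical. `startIndex` starts at -1 and is reset to -1 after every `}`, so a leading `}` is caught by the same check as "}}".

```csharp
using System;
using System.Collections.Generic;

public static class StrictParser
{
    // -1 means "not currently inside a {...} group".
    private const int NotInGroup = -1;

    public static List<string> ParseStrict(string input)
    {
        var results = new List<string>();
        // Starting at -1 (not 0) means a leading '}' such as "}whoops"
        // is reported by the same check that catches "}}".
        int startIndex = NotInGroup;

        for (int currentIndex = 0; currentIndex < input.Length; currentIndex++)
        {
            char currentChar = input[currentIndex];
            if (currentChar == '{')
            {
                // Already inside a group: this is the "{{" case.
                if (startIndex != NotInGroup)
                    throw new FormatException($"Nested or repeated opening brace at index {currentIndex}.");
                startIndex = currentIndex + 1;
            }
            else if (currentChar == '}')
            {
                // Not inside a group: this is the "}}" (or leading '}') case.
                if (startIndex == NotInGroup)
                    throw new FormatException($"Closing brace without a matching opening brace at index {currentIndex}.");
                results.Add(input.Substring(startIndex, currentIndex - startIndex));
                // Back outside any group until the next '{'.
                startIndex = NotInGroup;
            }
        }

        // Still inside a group at the end: an opening brace was never closed.
        if (startIndex != NotInGroup)
            throw new FormatException("Opening brace with no closing brace.");

        return results;
    }
}
```

Characters between groups (the `x` in "{a}x{b}") are still ignored, the same as in the original `Parse()`.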
The main reason I suggest this is that you said "efficiently". icepickle's solution is nice, but `Split()` makes one allocation per token, and then each `TrimX()` call allocates again. That's not "efficient". That's "n + 2 allocations".