I have some "tokenized" templates, for example (I call tokens the part between double braces):
var template1 = "{{TOKEN1}} is a {{TOKEN2}} and it has some {{TOKEN3}}";
I want to extract an array from this sentence, in order to have something like:
Array("{{TOKEN1}}",
" is a ",
"{{TOKEN2}}",
" and it has some ",
"{{TOKEN3}}");
I've tried to achieve that with the following Regex code:
Regex r = new Regex(@"({{[^\}]*}})");
var n = r.Split(template1);
And the result is:
Array("",
"{{TOKEN1}}",
" is a ",
"{{TOKEN2}}",
" and it has some ",
"{{TOKEN3}}",
"");
The first issue was that I was not able to recover the tokens from the sentence. I solved this just by adding the parentheses to the Regex expression, even though I'm not sure why that solves it.
The issue I'm currently facing is the extra empty term at the beginning and/or the end of the array when the first and/or last terms of the template are "tokens". Why is this happening? Am I doing something wrong, or should I always check these two positions for emptiness?
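Here is a minimal, self-contained reproduction of the behaviour described above, together with the workaround I would rather avoid. As far as I understand the documentation, Regex.Split includes captured text in the result when the pattern contains a capturing group, and a match at the very start or end of the input leaves an empty string on that side. The class and method names (SplitDemo, SplitRaw, SplitWithoutEmpties) are only for this illustration.

```csharp
using System.Linq;
using System.Text.RegularExpressions;

public static class SplitDemo
{
    // Same pattern as in the question; the parentheses form a capturing group.
    private static readonly Regex TokenRegex = new Regex(@"({{[^\}]*}})");

    // Raw result: includes the captured tokens and any empty boundary entries.
    public static string[] SplitRaw(string template)
    {
        return TokenRegex.Split(template);
    }

    // Workaround: drop the empty strings produced by matches at the edges.
    public static string[] SplitWithoutEmpties(string template)
    {
        return TokenRegex.Split(template)
                         .Where(part => part.Length > 0)
                         .ToArray();
    }
}
```

With template1, SplitRaw returns the seven entries shown above, and SplitWithoutEmpties returns the five I actually want.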
In my code, I will need to know which term came from a token and which was fixed text in the template. With this solution, I will have to check every element of the array for a string starting with "{{" and ending with "}}", which I don't think is the best approach. So, if someone comes up with a better way to break these things apart, I'll be glad to know!
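One alternative I have considered, as a rough sketch only: instead of splitting, walk over Regex.Matches and use each match's Index and Length to pick up the fixed text in between, so that every part is classified at the moment it is produced. The TemplateParser class, the Parse method and the (IsToken, Text) tuple are hypothetical names for this example, and the tuple syntax assumes C# 7 or later.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class TemplateParser
{
    // No capturing group is needed here because we read the matches directly.
    private static readonly Regex TokenRegex = new Regex(@"\{\{[^}]*\}\}");

    public static List<(bool IsToken, string Text)> Parse(string template)
    {
        var parts = new List<(bool IsToken, string Text)>();
        int pos = 0; // index of the first character not yet emitted

        foreach (Match m in TokenRegex.Matches(template))
        {
            // Fixed text between the previous token (or the start) and this one.
            if (m.Index > pos)
                parts.Add((false, template.Substring(pos, m.Index - pos)));

            parts.Add((true, m.Value));
            pos = m.Index + m.Length;
        }

        // Trailing fixed text after the last token, if any.
        if (pos < template.Length)
            parts.Add((false, template.Substring(pos)));

        return parts;
    }
}
```

Because empty stretches are skipped by the Index and Length checks, no empty entries appear at the start or end, and no string needs to be inspected for "{{" afterwards.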
Thank you!
Edit: as requested, I'll post a simple example of why I need this distinction between tokens and text.
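public abstract class TextParts
{
    public string Text { get; }
    protected TextParts(string text) { Text = text; }
}
public class TextToken : TextParts { public TextToken(string text) : base(text) { } }
public class TextConstant : TextParts { public TextConstant(string text) : base(text) { } }

var list = new List<TextParts>();
list.Add(new TextToken("{{TOKEN1}}"));
list.Add(new TextConstant(" is a "));
list.Add(new TextToken("{{TOKEN2}}"));
/* and so on */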
This way, I'll have a list of the parts that compose my string, and I'll be able to store it in my database to allow future manipulation and substitution. In fact, each of these tokens will be replaced by a Regex pattern.
The objective is that users will be able to input messages like "{{SERVER}} is not listening on port {{PORT}}", and I'll be able to replace "{{SERVER}}" with [a-zA-Z0-9 ]+ and "{{PORT}}" with \d{1,5}. Does that make sense?
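To make the goal concrete, here is a hedged sketch of the substitution step. It assumes a dictionary from token to pattern (the two entries are the ones mentioned above), escapes the fixed text with Regex.Escape so that it is matched literally, and wraps each token's pattern in a capturing group. The PatternBuilder class, the BuildPattern method, and the ^ and $ anchors are my own assumptions for the example, not a settled design.

```csharp
using System.Collections.Generic;
using System.Text;
using System.Text.RegularExpressions;

public static class PatternBuilder
{
    private static readonly Regex TokenRegex = new Regex(@"\{\{[^}]*\}\}");

    // Throws KeyNotFoundException if the template uses a token with no pattern.
    public static string BuildPattern(string template,
                                      IDictionary<string, string> tokenPatterns)
    {
        var sb = new StringBuilder("^");
        int pos = 0;

        foreach (Match m in TokenRegex.Matches(template))
        {
            // Fixed text is escaped so characters like '.' are taken literally.
            sb.Append(Regex.Escape(template.Substring(pos, m.Index - pos)));
            // Each token becomes a capturing group holding its pattern.
            sb.Append('(').Append(tokenPatterns[m.Value]).Append(')');
            pos = m.Index + m.Length;
        }

        sb.Append(Regex.Escape(template.Substring(pos)));
        sb.Append('$');
        return sb.ToString();
    }
}
```

For example, with "{{SERVER}}" mapped to [a-zA-Z0-9 ]+ and "{{PORT}}" mapped to \d{1,5}, the pattern built from the template above should match a message such as "web 01 is not listening on port 8080", with the server and port available as groups 1 and 2.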
I hope this makes the post clearer.