What would a regex pattern look like that captures the substring between two delimiters, while excluding certain characters (if present) right after the first delimiter and right before the last delimiter? The input string looks, for instance, like this:

var input = @"Not relevant {

#AddInfoStart Comment:String:=""This is a comment"";

AdditionalInfo:String:=""This is some additional info"" ;

# } also not relevant";

The capture should contain the substring between "{" and "}", excluding any spaces, newlines and the "#AddInfoStart" string after the start delimiter "{" (if present), and also excluding any spaces, newlines, ";" and "#" characters before the end delimiter "}" (if present).

The captured string should look like this:

Comment:String:=""This is a comment"";

AdditionalInfo:String:=""This is some additional info""

There may be blanks before or after the ":" and ":=" internal delimiters, and the value after ":=" is not always a string. For instance:

{  Val1 : Real := 1.7  }

Arrays use the following syntax:

arr1 : ARRAY [1..5] OF INT := [2,5,44,555,11];
arr2 : ARRAY [1..3] OF REAL
RickyTad
  • I've seen your edit. Do all the numbers have the `.` as the decimal separator? Is there some white-space AFTER the number? Please edit your initial string and add more examples – Rui Jarimba Nov 01 '18 at 18:57
  • What about the types - String, Real, etc? Is there a fixed list of types? – Rui Jarimba Nov 01 '18 at 19:05
  • Basically the integer, floating point, string and bool data types as well as arrays of them, as described here [link](https://infosys.beckhoff.com/english.php?content=../content/1033/tc3_plc_intro/54043198057834891.html&id=1930258581040931468) – RickyTad Nov 01 '18 at 22:24

1 Answer

This is my solution:

  1. Remove the content outside the brackets
  2. Use a regular expression to get the values inside the brackets

Code:

using System.Linq;
using System.Text.RegularExpressions;

var input = @"Not relevant {

#AddInfoStart Comment:String:=""This is a comment"";

            Val1 : Real := 1.7

AdditionalInfo:String:=""This is some additional info"" ;

# } also not relevant";

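// remove content outside brackets; Singleline lets "." match newlines,
// so text on earlier or later lines is removed as well
input = Regex.Replace(input, @".*\{", string.Empty, RegexOptions.Singleline);
input = Regex.Replace(input, @"\}.*", string.Empty, RegexOptions.Singleline);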

string property = @"(\w+)";
string separator = @"\s*:\s*"; // ":" with or without whitespace
string type = @"(\w+)";
string equals = @"\s*:=\s*"; // ":=" with or without whitespace
string text = @"""(.*?)"""; // value between double quotes; both quotes are required so a number is never swallowed up to a later quote
string number = @"(\d+(\.\d+)?)"; // number like 123 or with a . separator such as 1.45
string value = $"({text}|{number})"; // value can be a string or number
string pattern = $"{property}{separator}{type}{equals}{value}";

var result = Regex.Matches(input, pattern)
                  .Cast<Match>()
                  .Select(match => new
                  {
                      FullMatch = match.Groups[0].Value, // group 0 is always the full match
                      Property = match.Groups[1].Value,
                      Type = match.Groups[2].Value,
                      Value = match.Groups[3].Value // for strings this includes the surrounding quotes
                  })
                  .ToList();
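
If only the trimmed block between the braces is needed, as the question originally asked, a single pattern might be enough. The following is a minimal sketch, not a tested production solution: the class name BraceCapture and the method Extract are hypothetical, and it assumes a single pair of braces and no "}" inside a quoted value.

```csharp
using System;
using System.Text.RegularExpressions;

public static class BraceCapture
{
    // Hypothetical helper: returns the text between the first "{" and the next "}",
    // without the optional leading whitespace / "#AddInfoStart" and without the
    // optional trailing whitespace, ";" and "#" characters.
    public static string Extract(string input)
    {
        // \{\s*(?:#AddInfoStart)?\s*   start delimiter plus the junk to skip after it
        // (?<body>.*?)                 lazy capture of the content itself
        // [\s;#]*\}                    junk to skip before the end delimiter
        var pattern = @"\{\s*(?:#AddInfoStart)?\s*(?<body>.*?)[\s;#]*\}";

        // Singleline makes "." match newlines, so the body may span several lines.
        var match = Regex.Match(input, pattern, RegexOptions.Singleline);
        return match.Success ? match.Groups["body"].Value : string.Empty;
    }

    public static void Main()
    {
        var input = @"Not relevant {

#AddInfoStart Comment:String:=""This is a comment"";

AdditionalInfo:String:=""This is some additional info"" ;

# } also not relevant";

        Console.WriteLine(Extract(input));
    }
}
```

Because the body is captured lazily and the trailing character class is greedy, the capture ends at the last character that is not whitespace, ";" or "#" before the closing brace. The ";" after the first value stays in the capture because it is followed by more content.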
Rui Jarimba
  • Thanks, I am trying to understand your code. How come the delimiters "{" and "}" do not have to be present in the regex pattern? – RickyTad Nov 01 '18 at 12:03
  • @RickyTad my mistake, I forgot that you wanted matches inside the brackets. I've edited my code, take a look. It's much easier to do some cleanup before than trying to use a single regular expression. – Rui Jarimba Nov 01 '18 at 12:23
  • Would it be possible to capture everything in the input string that is outside the {} brackets, even when there are several {} bracket pairs in the input string? – RickyTad Nov 01 '18 at 16:43
  • @RickyTad yes it is possible, but keep in mind that my code assumes there is only 1 pair of {} – Rui Jarimba Nov 01 '18 at 16:48
  • What would the pattern look like to parse something like "Val1 : Real := 1.7"? – RickyTad Nov 01 '18 at 18:36
  • @RickyTad based on your initial string, I was assuming there was no space in between. Please edit your question and add more details and text samples. A brief description would help too - if there _might_ be whitespace in between, etc – Rui Jarimba Nov 01 '18 at 18:40
  • Your sample does not seem to work if the number does not contain a . separator (e.g. 123) – RickyTad Nov 02 '18 at 08:42
  • @RickyTad fixed, try now – Rui Jarimba Nov 02 '18 at 09:43
  • Thanks, now it's better. Do you have a pattern that would also support arrays? – RickyTad Nov 02 '18 at 12:27
  • @RickyTad sorry man but I can't spend days with this. It should be easy enough to match an array if you know the basics of regular expressions. Also, what would an array look like? Array of what? Numeric values, strings? What are the delimiters - square brackets? .... If you're dealing with regular expressions you need to give a very precise/accurate explanation of what your patterns look like. – Rui Jarimba Nov 02 '18 at 13:58
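
On the array question raised in the comments, one possible extension of the pattern is sketched below. It rests on several assumptions that the thread does not confirm: array types are always written as "ARRAY [low..high] OF type" with plain integer bounds, array values are a flat list in square brackets, the ":= value" part is optional, and keywords are upper case. The class name DeclarationParser and the named groups are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public static class DeclarationParser
{
    // name : type [:= value]
    // type  = "ARRAY [a..b] OF word" or a single word
    // value = quoted string, [bracketed list], number, or a bare word such as TRUE
    public const string Pattern =
        @"(?<name>\w+)\s*:\s*" +
        @"(?<type>ARRAY\s*\[\s*\d+\s*\.\.\s*\d+\s*\]\s*OF\s+\w+|\w+)" +
        @"(?:\s*:=\s*(?<value>""[^""]*""|\[[^\]]*\]|-?\d+(?:\.\d+)?|\w+))?";

    public static void Main()
    {
        var input = @"arr1 : ARRAY [1..5] OF INT := [2,5,44,555,11];
arr2 : ARRAY [1..3] OF REAL
Val1 : Real := 1.7
Comment:String:=""This is a comment"";";

        foreach (Match m in Regex.Matches(input, Pattern))
        {
            var name = m.Groups["name"].Value;
            var type = m.Groups["type"].Value;
            // The value group is unsuccessful when a declaration has no initializer.
            var value = m.Groups["value"].Success ? m.Groups["value"].Value : "(none)";
            Console.WriteLine("{0} | {1} | {2}", name, type, value);
        }
    }
}
```

The ARRAY alternative is listed before the plain \w+ alternative so that the whole array type is captured rather than just the word ARRAY. The elements of an array value could then be split on "," after stripping the square brackets.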