I'm trying to write a regex that extracts every alphanumeric word between double curly brackets, as in the example below:

Hi {{ name }}, your full name is {{ concat first_name last_name }}

The conditions are:

  1. If the content contains a single alphanumeric token, it is a variable and should be extracted.
  2. If the content contains more than one alphanumeric token separated by spaces, the first token is a function name and should be ignored, but the remaining tokens (the function arguments) should be extracted.
  3. Function arguments are separated by spaces.
  4. A variable name can consist of more than one word, but the words must be joined by a separator, e.g. first_name.
  5. A function can take one or many arguments, and each argument should be matched.

So the result for the first example should be:

name, first_name, last_name.

This is what I tried: \{\s*\{\s*([^\}\/\s]+)\s*\}\s*\}

But it only covers the first scenario.

Another example:

"Key": "price_change_manager.msg2 {{ message }}"
"value": "{{  username }} plan to {{formatCurrence new_price currency old_price }}"

Should match: message, username, new_price, currency, old_price.

Eleandro Duzentos
  • What language or RegExp flavor are you working with? – esqew Nov 29 '22 at 15:07
  • I edited the question. Golang. – Eleandro Duzentos Nov 29 '22 at 15:11
  • Can you use *two* capturing groups? E.g. [`{\s*{\s*(?:\w+[^\w}](\w[^}]*?)|(\w+))\s*}\s*}`](https://regex101.com/r/pSv6dj/1) – bobble bubble Nov 29 '22 at 15:20
  • Yes, but it also matches the spaces between two variables. So the result for this regex is: name, first_name last_name (including the space) instead of name, first_name, last_name. – Eleandro Duzentos Nov 29 '22 at 15:26
  • You want to match these first_name, last_name... as separate matches? I doubt that can be done without lookarounds or `\G` anchor. You can't just [split those](https://stackoverflow.com/a/13737890/5527985) later on? – bobble bubble Nov 29 '22 at 15:29
  • Yes, I want to match them all as separate matches and I don't know exactly how to do it using look around. I need to do that using regex only. – Eleandro Duzentos Nov 29 '22 at 15:34
  • Afaik there are no lookarounds available there. How about [this updated demo?](https://regex101.com/r/pSv6dj/3) Can each first_name or last_name contain multiple words and if so, how are these separated? – bobble bubble Nov 29 '22 at 15:35
  • They are separated by one or many spaces and they can contain multiple words and numbers. – Eleandro Duzentos Nov 29 '22 at 15:38
  • How would you distinguish then what's part of the first and last name? Please update the question with some realistic sample input, this will help! – bobble bubble Nov 29 '22 at 15:38
  • I updated the question with a new sample. – Eleandro Duzentos Nov 29 '22 at 15:43
  • Each variable is separated by one or many spaces, but the variable itself can have one or multiple words that must be connected somehow, e.g: first_name, last_name, current_timestamp. – Eleandro Duzentos Nov 29 '22 at 15:46
  • @EleandroDuzentos Have you tried [this demo](https://regex101.com/r/UfQZSG/1) from earlier comment? (It has three capturing groups) – bobble bubble Nov 29 '22 at 15:46
  • Yes, for some reason it is returning many empty results, but yeah, that's it :D – Eleandro Duzentos Nov 29 '22 at 15:49
  • There is only one thing missing, there can be any amount of function arguments, not only 2. In this example it is only matching when the function has 2 arguments, if you add one more it fails. – Eleandro Duzentos Nov 29 '22 at 15:51
  • I guess you'd need more regex power to solve this arbitrary amount of arguments. Good luck with it and I hope some `go`-expert will get this solved with you. :) Please include all this information in the question (multiple arguments, samples, how separated). – bobble bubble Nov 29 '22 at 15:53
  • Thank you for that @bobblebubble. You almost did it :) – Eleandro Duzentos Nov 29 '22 at 15:55
  • You're welcome, [here is one more update](https://regex101.com/r/UfQZSG/2) but I guess you want these arguments separately. – bobble bubble Nov 29 '22 at 15:59
  • Yes, exactly. I need them separated. – Eleandro Duzentos Nov 29 '22 at 16:01

1 Answer

A regexp that is not particularly overengineered, combined with a couple of ifs, does the job:

re := regexp.MustCompile(`\{\{ *([a-zA-Z0-9_]+|[a-zA-Z0-9_]+(( +[a-zA-Z0-9_]+)+)) *\}\}`)
matches := re.FindAllStringSubmatch("{{  username }} plan to {{formatCurrence new_price currency old_price }}", -1)

This results in a slice like:

[["{{  username }}" "username" "" ""] ["{{formatCurrence new_price currency old_price }}" "formatCurrence new_price currency old_price" " new_price currency old_price" " old_price"]]

So you can process it like:

  findings := []string{}
  for _, m := range matches {
    if m[2] == "" {
      // The 3rd element (group 2) is empty, so this is a single variable, held in the 2nd element
      findings = append(findings, m[1])
    } else {
      // Otherwise the 3rd element holds all function arguments in one string.
      // strings.Fields splits on runs of whitespace, so repeated spaces yield no empty entries
      findings = append(findings, strings.Fields(m[2])...)
    }
  }
  // Your result is in findings
Adam Solymos