REGO: Is it possible to parse a regex group from the regex statement?

Question

I can't find any information about regex groups. What I want to do is find all ARNs in a string and extract the AWS service from each one. Is that even possible in Rego?

What I currently have:

output = matches {
    string := "\"Resource\": \"arn:aws:states:::lambda:invoke\",\r\n \"Resource\": \"arn:aws:states:::lambda:invoke\",\r\n \"FunctionName\": \"arn:aws:lambda:eu-central-1:521439441813:function:lkfp-test-hello:$LATEST\"\r\n  xecution\"\r\n  \"Resource\": \"arn:aws:states:::aws-sdk:s3:createBucket\"\r\n    }\r\n  }\r\n}\r\n"
    matches := regex.find_n(`arn:([^:\n]*):([^:\n]*):([^:\n]*):([^:\n]*):(([^:\/\n]*)[:\/])?(?:[^"]|"")*"`, string, -1)
}

What it gives as a result:

{
    "output": [
        "arn:aws:states:::lambda:invoke\"",
        "arn:aws:states:::lambda:invoke\"",
        "arn:aws:lambda:eu-central-1:521439441813:function:lkfp-test-hello:$LATEST\"",
        "arn:aws:states:::aws-sdk:s3:createBucket\""
    ]
}

What I actually expect:

{
    "output": [
        "states",
        "states",
        "lambda",
        "states"
    ]
}

What about `((?=arn:aws:(states|lambda)))`? – Apr 28 '22 at 15:40

score 1 · Answer 1 · answered Apr 28 '22 at 15:43

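Apparently, changing `regex.find_n` to `regex.find_all_string_submatch_n` separates the groups: each match is returned as an array whose first element is the full match and whose remaining elements are the capture groups.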
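
To make that concrete, here is a minimal sketch, assuming a shortened stand-in for the question's string and a trimmed pattern that only captures the partition and the service. The `text`, `submatches` and `services` names are mine, not from the question.

```rego
package example

# Shortened, hypothetical stand-in for the string in the question.
text := `"Resource": "arn:aws:states:::lambda:invoke", "FunctionName": "arn:aws:lambda:eu-central-1:521439441813:function:lkfp-test-hello:$LATEST"`

# Each element is [full_match, group_1, group_2]; -1 means "return all matches".
submatches := regex.find_all_string_submatch_n(`arn:([^:\n]*):([^:\n]*)`, text, -1)

# Group 2 (index 2) is the service segment of the ARN.
services := [m[2] | m := submatches[_]]
```

With this input, `services` should evaluate to `["states", "lambda"]`. The same indexing should work with the full pattern from the question, since the service is still its second capture group.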

A K


score 0 · Answer 2 · answered Apr 28 '22 at 15:46

The pattern `((?=arn:aws:(states|lambda)))` captures what you want in the innermost capturing group. This demonstration is in Python because I don't know Rego:

import re
string = '"\"Resource\": \"arn:aws:states:::lambda:invoke\",\r\n \"Resource\": \"arn:aws:states:::lambda:invoke\",\r\n \"FunctionName\": \"arn:aws:lambda:eu-central-1:521439441813:function:lkfp-test-hello:$LATEST\"\r\n  xecution\"\r\n  \"Resource\": \"arn:aws:states:::aws-sdk:s3:createBucket\"\r\n    }\r\n  }\r\n}\r\n"'
matches=re.findall(r'((?=arn:aws:(states|lambda)))',string)
for m in matches:
    print(m[1])

Output:

states
states
lambda
states
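
One caveat for anyone porting this to Rego: as far as I know, OPA's regex built-ins use Go's RE2 syntax, which has no lookahead, so `(?=...)` is not expected to compile there. A minimal sketch of the same idea without the lookahead, assuming a shortened stand-in for the question's string (the `text`, `submatches` and `services` names are mine):

```rego
package example

# Shortened, hypothetical stand-in for the string in the question.
text := `"Resource": "arn:aws:states:::lambda:invoke", "Resource": "arn:aws:states:::aws-sdk:s3:createBucket"`

# No lookahead: match the "arn:aws:" prefix directly and capture the service.
submatches := regex.find_all_string_submatch_n(`arn:aws:(states|lambda)`, text, -1)

# Index 0 is the whole match, index 1 is the captured service name.
services := [m[1] | m := submatches[_]]
```

Because the alternation lists the services explicitly, any service other than `states` or `lambda` is silently skipped. Swap in a character class such as `[^:\n]*` if every service should be captured.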

score 0 · Answer 3 · answered May 02 '22 at 22:27

An ARN always has the service in the same position (the third colon-separated field), so it may be easier to split each match on ":" and take index 2, then use an array comprehension to collect the results.

https://www.openpolicyagent.org/docs/latest/policy-language/#array-comprehensions

output = services {
    string := "\"Resource\": \"arn:aws:states:::lambda:invoke\",\r\n \"Resource\": \"arn:aws:states:::lambda:invoke\",\r\n \"FunctionName\": \"arn:aws:lambda:eu-central-1:521439441813:function:lkfp-test-hello:$LATEST\"\r\n  xecution\"\r\n  \"Resource\": \"arn:aws:states:::aws-sdk:s3:createBucket\"\r\n    }\r\n  }\r\n}\r\n"
    matches := regex.find_n(`arn:([^:\n]*):([^:\n]*):([^:\n]*):([^:\n]*):(([^:\/\n]*)[:\/])?(?:[^"]|"")*"`, string, -1)
    services := [service | a := matches[_]
                           b := split(a, ":")
                           service := b[2]
                           ]
}

I've also added it to a playground here: https://play.openpolicyagent.org/p/LB17WoRDWT
