
I'm looking to use CloudWatch Logs Insights to group logs by a request URL field. However, the URL can contain 0-2 numerical identifiers that I'd like to be ignored when grouping.

Some example URLs:

/dev/user
/dev/user/123
/dev/user/123/inventory/4
/dev/server/3/statistics

The groups would look something like:

/dev/user
/dev/user/
/dev/user//inventory/
/dev/server//statistics

I have something quite close to what I need. It extracts the part of the URL before the first optional identifier and the part between the first and second identifiers, then concatenates the two, but it isn't fully reliable. This is where I'm at currently; @message is valid JSON that contains an 'endpoint' field looking like one of the URLs above:

fields @message
| parse endpoint /(\bdev)\/(?<@prefix>[^0-9]+)(?:[0-9]+)(?<@suffix>[^0-9]+)/
| stats count(*) by @prefix

This query works for endpoints like '/dev/accounts/1', but it ignores endpoints like '/dev/accounts' because they don't contain every component the regex requires, so I'm missing a lot of results.
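A quick way to see why '/dev/accounts' drops out is to try the same pattern locally. The sketch below uses Python's `re` module as a stand-in for Logs Insights, so treat it as an approximation: the regex flavors differ, Python spells named groups `(?P<name>...)`, and the `@` is dropped from the group names because Python requires plain identifiers. The `groups_for` helper is hypothetical.

```python
import re

# The question's pattern, translated to Python syntax (assumption: the
# behaviour is close enough to Logs Insights for these simple URLs).
# Named groups use (?P<name>...) and lose the '@' prefix.
ORIGINAL = re.compile(r"(\bdev)/(?P<prefix>[^0-9]+)(?:[0-9]+)(?P<suffix>[^0-9]+)")


def groups_for(endpoint):
    """Return (prefix, suffix) if the pattern matches, otherwise None."""
    m = ORIGINAL.search(endpoint)
    return (m.group("prefix"), m.group("suffix")) if m else None


for url in ["/dev/accounts", "/dev/user/123",
            "/dev/user/123/inventory/4", "/dev/server/3/statistics"]:
    print(url, "->", groups_for(url))
```

Every group in this pattern must match at least one character, so a URL with no number never matches. In Python, a URL that ends in a number (such as '/dev/user/123') fails as well, because the suffix group `[^0-9]+` needs at least one non-digit after the number.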

Cral

2 Answers


If there are 0-2 numerical identifiers that you want to remove, you could match the first number, optionally match the second, and use 2 capturing groups to capture the parts you want to keep.

In the replacement, use the 2 capturing groups: $1$2

^(.*?\/)\d+(?:(.*?\/)\d+\b)?
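The pattern can be checked outside a regex tester with a short Python sketch, assuming Python's `re` treats this pattern the same way as the flavor used in the demo. Python writes the `$1$2` replacement as `\1\2`, and `group_key` is a hypothetical helper name.

```python
import re

# Group 1: text up to (and including) the slash before the first number.
# Group 2 (optional): text between the first and second numbers.
PATTERN = re.compile(r"^(.*?/)\d+(?:(.*?/)\d+\b)?")


def group_key(url):
    # \1\2 is Python's spelling of $1$2. Since Python 3.5, a group that did
    # not participate (no second number) is substituted as an empty string.
    # A URL with no number at all does not match and is returned unchanged.
    return PATTERN.sub(r"\1\2", url)


for url in ["/dev/user", "/dev/user/123",
            "/dev/user/123/inventory/4", "/dev/server/3/statistics"]:
    print(url, "->", group_key(url))
```

Because the match is anchored with `^`, only the leading part up to the last removed number is rewritten; anything after it (such as '/statistics') is left in place, which yields the double slash seen in the desired groups.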


The fourth bird
Comment: Thanks, there are a couple of neat tricks in there I wasn't aware of. I've added my own answer below; all I needed in the end was a '?' after the optional capture groups. – Cral Jul 29 '20 at 15:16

It looks like I can put a question mark directly after a capture group to mark that group as optional, which resolved the last issue I was having.
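The regex from this answer was only linked as a demo, so the exact pattern isn't shown here. The sketch below is a guess at what adding '?' to the question's pattern might look like, checked with Python's `re` as a stand-in for Logs Insights. The `OPTIONAL` pattern and the `group_key` helper are hypothetical, and the group names drop the '@' because Python requires plain identifiers.

```python
import re

# Hypothetical reconstruction: the question's pattern with '?' after the
# digit group and after the suffix group, so both may be absent.
OPTIONAL = re.compile(
    r"(\bdev)/(?P<prefix>[^0-9]+)(?:[0-9]+)?(?P<suffix>[^0-9]+)?"
)


def group_key(endpoint):
    """Concatenate prefix and suffix into a grouping key, or None if no match."""
    m = OPTIONAL.search(endpoint)
    if m is None:
        return None
    # An optional group that did not participate is None in Python.
    return m.group("prefix") + (m.group("suffix") or "")


for url in ["/dev/user", "/dev/user/123",
            "/dev/user/123/inventory/4", "/dev/server/3/statistics"]:
    print(url, "->", group_key(url))
```

With both groups optional, '/dev/user' now matches with an empty suffix. A trailing second number (the '4' in '/dev/user/123/inventory/4') is simply left unconsumed, which is fine because the pattern only extracts fields rather than rewriting the URL.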


Cral