
I understand the concept of writing regular expressions using capturing and non-capturing groups.

For example:

`a(b|c)` would match and capture `ab` and `ac`

`a(?:b|c)` would match `ab` and `ac` but capture `a`

But how is this useful when I write a new custom Grok pattern, and what does it mean to use non-capturing groups there?

Looking at a few existing Grok patterns, such as the one below for HOUR:

HOUR (?:2[0123]|[01]?[0-9])

Here we can match the hour format using `(2[0123]|[01]?[0-9])` as well. What makes the Grok pattern use the non-capturing expression here? On what basis should I decide to use `(?:subex)`?
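A quick way to confirm that the two forms of HOUR accept exactly the same strings is to compare them with Python's `re` module, which uses the same `(?:...)` syntax. This is only a sketch: Grok runs on a different regex engine, and `re.fullmatch` (whole-string matching) is an illustration choice, not how Grok anchors its patterns.

```python
import re

# The stock HOUR pattern (non-capturing group) and the capturing variant from the question.
HOUR_NONCAP = re.compile(r"(?:2[0123]|[01]?[0-9])")
HOUR_CAP = re.compile(r"(2[0123]|[01]?[0-9])")

def accepts(pattern, text):
    """Return True if the whole string matches the pattern."""
    return pattern.fullmatch(text) is not None

# Hours with and without a leading zero, plus a few strings that should fail.
candidates = [str(n) for n in range(30)] + [f"{n:02d}" for n in range(30)] + ["", "123", "a1"]

# Both patterns accept exactly the same strings...
same = all(accepts(HOUR_NONCAP, s) == accepts(HOUR_CAP, s) for s in candidates)
print(same)  # True

# ...they differ only in how many groups they record.
print(HOUR_NONCAP.groups, HOUR_CAP.groups)  # 0 1
print(HOUR_NONCAP.fullmatch("23").groups())  # ()
print(HOUR_CAP.fullmatch("23").groups())     # ('23',)
```

In other words, the choice between the two forms affects only what gets stored, not what gets matched.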

baudsp
sruthi
  • I don't know what language you are using, but I think that's completely wrong. `a(b|c)` usually captures `b` or `c` (depending on whether the pattern matched `ab` or `ac`), and `a(?:b|c)` captures nothing at all. The difference is one of performance; why capture something when you don't need to? – ikegami Jul 08 '16 at 16:43
  • Re "*Here we can match the hour format using `(2[0123]|[01]?[0-9])` as well*": no surprise there; capturing doesn't change what a pattern matches. – ikegami Jul 08 '16 at 16:45
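The comment's correction can be checked directly. In this minimal Python sketch (Python's `re` stands in for Grok's engine purely for illustration), the whole match `ab` or `ac` is group 0, the capturing group of `a(b|c)` holds only `b` or `c`, and `a(?:b|c)` records no groups at all:

```python
import re

capturing = re.compile(r"a(b|c)")
non_capturing = re.compile(r"a(?:b|c)")

for text in ("ab", "ac"):
    m1 = capturing.fullmatch(text)
    m2 = non_capturing.fullmatch(text)
    # Group 0 (the whole match) is identical for both patterns.
    print(m1.group(0), m2.group(0))  # ab ab, then ac ac
    # Group 1 holds only "b" or "c"; the non-capturing pattern has no groups.
    print(m1.groups(), m2.groups())  # ('b',) (), then ('c',) ()
```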

1 Answer


The difference between a pattern with a capturing group or without in Grok is whether you need to create a field or not.

The `(?:2[0123]|[01]?[0-9])` pattern contains a non-capturing group that is only used to group subpatterns (here, to limit the scope of the `|` alternation). The `(2[0123]|[01]?[0-9])` regex contains a numbered capturing group that matches and captures the value (that is, stores it in an additional buffer whose ID is the position of the capturing group in the pattern). Note that there are also named capturing groups, such as `(?<field>2[0123]|[01]?[0-9])`, which assign the captured value to a named group.
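With Python's `re` module, the numbered and named groups described above can be read back by position and by name respectively. One assumption to flag: Python spells a named group `(?P<field>...)`, while the answer (and Grok) write `(?<field>...)`; the idea is otherwise the same.

```python
import re

numbered = re.compile(r"(2[0123]|[01]?[0-9])")
# Python's named-group syntax is (?P<name>...) rather than (?<name>...).
named = re.compile(r"(?P<field>2[0123]|[01]?[0-9])")

m = numbered.fullmatch("17")
print(m.group(1))      # 17, looked up by position
print(m.groupdict())   # {} because there are no named groups

m = named.fullmatch("17")
print(m.group("field"))  # 17, looked up by name
print(m.groupdict())     # {'field': '17'}
print(m.group(1))        # 17, since a named group is still numbered too
```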

With the `named_captures_only` parameter set to `false`, the `a(b|c)` regex matches `ab` or `ac` and assigns `b` or `c` to a separate field. With a non-capturing group, `a(?:b|c)`, no field is ever created; the text is only matched.

Since `named_captures_only` defaults to `true`, the difference between a numbered capturing group and a non-capturing group disappears in Grok patterns: by default, only named captures (like `a(?<myfield>b|c)`) create fields.
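To make the two settings concrete, here is a hypothetical Python model of what `named_captures_only` controls. `extract_fields` and the `group_N` field names are invented for this illustration; this is not Logstash's implementation, and it does not reproduce the field names Grok actually uses for numbered captures when the option is `false`.

```python
import re

def extract_fields(match, named_captures_only=True):
    """Hypothetical model of the named_captures_only option (NOT Logstash's code).

    Named groups always become fields. Groups that are only numbered become
    fields (under made-up names like "group_1") only when the flag is False.
    """
    fields = {name: value for name, value in match.groupdict().items() if value is not None}
    if not named_captures_only:
        # Skip groups that already have a name so they are not reported twice.
        named_indices = set(match.re.groupindex.values())
        for index in range(1, match.re.groups + 1):
            value = match.group(index)
            if index not in named_indices and value is not None:
                fields[f"group_{index}"] = value
    return fields

text = "ab"
print(extract_fields(re.fullmatch(r"a(b|c)", text)))                               # {}
print(extract_fields(re.fullmatch(r"a(b|c)", text), named_captures_only=False))    # {'group_1': 'b'}
print(extract_fields(re.fullmatch(r"a(?:b|c)", text), named_captures_only=False))  # {}
print(extract_fields(re.fullmatch(r"a(?P<myfield>b|c)", text)))                    # {'myfield': 'b'}
```

Under the default (`True`), the numbered and non-capturing variants both produce no fields, which is the point made above.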

I think the common Grok patterns prefer non-capturing groups so that they do not depend on the `named_captures_only` setting.
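This reasoning can be illustrated with a toy expansion of Grok's `%{NAME:field}` syntax. The `expand` helper and the two pattern dictionaries are hypothetical simplifications (real Grok compiles patterns differently), but they show the effect: wrapping the stock HOUR in a named group yields exactly one group, while a capturing HOUR adds an extra numbered group that would become a field whenever `named_captures_only` is `false`.

```python
import re

# Two hypothetical pattern libraries: the stock-style HOUR and a capturing variant.
STOCK = {"HOUR": r"(?:2[0123]|[01]?[0-9])"}
CAPTURING = {"HOUR": r"(2[0123]|[01]?[0-9])"}

# Matches a simplified %{NAME:field} reference (a sketch, not Grok's real compiler).
REF = re.compile(r"%\{(\w+):(\w+)\}")

def expand(grok, library):
    """Replace each %{NAME:field} with a named group wrapping library[NAME]."""
    return REF.sub(lambda m: f"(?P<{m.group(2)}>{library[m.group(1)]})", grok)

for library in (STOCK, CAPTURING):
    regex = re.compile(expand("%{HOUR:hour}", library))
    m = regex.fullmatch("07")
    # Same named field either way; the capturing variant adds a second, numbered group.
    print(m.groupdict(), regex.groups)  # {'hour': '07'} 1, then {'hour': '07'} 2
```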

Wiktor Stribiżew
  • The difference between a pattern with a capturing group or without in Grok is whether you need to create a field or not. This helped answer my question, and now I can apply the same in my matches. – sruthi Jul 08 '16 at 18:40
  • I would also like to know the significance of `named_captures_only`. If I define a set of Grok patterns, will it match only the ones defined in the pattern set? Is that what it means? – sruthi Jul 08 '16 at 18:41
  • That means only named captures are kept; numbered ones are ignored. – Wiktor Stribiżew Jul 08 '16 at 19:31