
Sorry if this is a duplicate question. I checked the existing questions and answers about optional capturing groups and tried a few things, but I can't translate those answers to my own example.

These are two input lines:

id:target][label
id:target

I would like to capture id: (group 1), target (group 2) and, if ][ is present, label (group 3).

The regex I used (Python flavor) only works on the first line (live example on regex101).

^(.+:)(.*)\]\[(.*)
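A quick check with Python's standard `re` module makes the failure concrete. Because `\]\[` is a mandatory part of this pattern, a line without the delimiter cannot match at all. This is only a sketch that runs the question's pattern against the two sample lines:

```python
import re

# The pattern from the question: "\]\[" is required here, not optional
pattern = re.compile(r"^(.+:)(.*)\]\[(.*)")

for line in ["id:target][label", "id:target"]:
    m = pattern.match(line)
    print(line, "->", m.groups() if m else None)

# id:target][label -> ('id:', 'target', 'label')
# id:target -> None
```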


In the other examples, I don't understand what makes a capturing group optional. Maybe the ][ delimiter I use also adds to my confusion.

One thing I tried was this:

^(.+:)(.*)(\]\[(.*))?

This doesn't work as expected.
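Running this second attempt in Python shows what goes wrong: the greedy `(.*)` in group 2 consumes the rest of the line, including `][label`, and the optional group then simply matches nothing. Because the pattern has no end anchor, nothing forces the engine to backtrack. A minimal sketch using the standard `re` module:

```python
import re

# Second attempt from the question: the delimiter group is optional,
# but the greedy (.*) in group 2 runs to the end of the line first
attempt = re.compile(r"^(.+:)(.*)(\]\[(.*))?")

for line in ["id:target][label", "id:target"]:
    print(line, "->", attempt.match(line).groups())

# id:target][label -> ('id:', 'target][label', None, None)
# id:target -> ('id:', 'target', None, None)
```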

buhtz

1 Answer


You could write the pattern using an anchor at the end, a negated character class for group 1, a non-greedy quantifier for group 2, and then optionally match a third part:

^([^:]+:)(.*?)(?:]\[(.*))?$

Explanation

  • ^ Start of string
  • ([^:]+:) Group 1, match 1+ characters other than : using a negated character class, then match :
  • (.*?) Group 2, match any char, as few as possible
  • (?: Non capture group to match as a whole part
    • ]\[ Match ][
    • (.*) Group 3, match any characters up to the end of the line
  • )? Close the non capture group and make it optional
  • $ End of string
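To see the pieces working together, here is a small Python sketch (standard `re` module only) that applies the pattern to both sample lines. The end anchor `$` is what makes the non-greedy group 2 keep expanding until either `][` or the end of the line is reached, and group 3 comes back as `None` when the optional part is absent:

```python
import re

# End anchor + non-greedy group 2 + optional non-capturing part for "][label"
pattern = re.compile(r"^([^:]+:)(.*?)(?:]\[(.*))?$")

for line in ["id:target][label", "id:target"]:
    print(line, "->", pattern.match(line).groups())

# id:target][label -> ('id:', 'target', 'label')
# id:target -> ('id:', 'target', None)
```

If you prefer, `re.fullmatch` with the same pattern body makes the `^` and `$` anchors redundant, since it only succeeds when the whole string matches.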

See a regex101 demo

If you are only matching word characters, for example, you might consider this:

^([^:]+:)(\w+)(?:]\[(\w+))?

See another regex101 demo
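A rough Python sketch of this variant follows. Since `\w+` cannot match `]`, group 2 stops right before the delimiter, so no end anchor is needed. The trade-off is that any non-word character ends the match early; the input `id:my-target` below is a hypothetical example added only to show that effect:

```python
import re

# Word-character variant: \w+ stops at "]", so group 2 cannot swallow the delimiter
word_pattern = re.compile(r"^([^:]+:)(\w+)(?:]\[(\w+))?")

for line in ["id:target][label", "id:target", "id:my-target"]:
    print(line, "->", word_pattern.match(line).groups())

# id:target][label -> ('id:', 'target', 'label')
# id:target -> ('id:', 'target', None)
# id:my-target -> ('id:', 'my', None)   <- "-" is not a word character
```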

The fourth bird
  • Thanks for the reply. It seems you improved my regex beyond what I asked about, e.g. `[^:]`. It would be great if you could add some more explanation about that. For example, why is `([^:]+:)` better than `^(.+:)`? – buhtz Feb 03 '23 at 10:20
    @buhtz I have added an explanation. The part `^(.+:)` first matches until the end of the string, because `.` can match any character including `:`, and then backtracks to the last occurrence of `:`. If you use the negated character class `[^:]+:`, the match cannot go past a `:` and therefore stops at the first occurrence. – The fourth bird Feb 03 '23 at 10:24
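The difference described in that comment only becomes visible when a line contains more than one `:`. The sketch below uses a hypothetical input, `id:tar:get`, which is not from the question, to compare the two group 1 patterns with Python's `re` module:

```python
import re

# Hypothetical input with a second colon, to make the difference visible
line = "id:tar:get"

greedy = re.match(r"^(.+:)", line)      # .+ runs to the end, then backtracks to the LAST ":"
negated = re.match(r"^([^:]+:)", line)  # [^:]+ cannot pass a ":", so it stops at the FIRST one

print(greedy.group(1))   # id:tar:
print(negated.group(1))  # id:
```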