How to combine non-adjacent groups without using branch resets or capturing inside lookarounds?

Question

Suppose I have the following text:

# Should match

- [ ] Some task
- [ ] Some task | [[link]]
- [ ] Some task ^abcdef
- [ ] Some task | [[link]] ^abcdef
- [ ] ! Some task
- [ ] ! Some task | [[link]]
- [ ] ! Some task ^abcdef
- [ ] ! Some task | [[link]] ^abcdef
- [ ] Task one | [ ] ! Task two | [ ] Task three ^abcdef

|     Tracker | Task                    | Backlog  |
| ----------: | :---------------------- | :------- |
| 00:00-00:00 | [ ] Task item           | [[linK]] |
| 00:00-00:00 | [ ] Task item ^abcdef   | [[link]] |
| 00:00-00:00 | [ ] [[task-item]]       | [[link]] |
| 00:00-00:00 | [ ] ! Task item         | [[linK]] |
| 00:00-00:00 | [ ] ! Task item ^abcdef | [[link]] |
| 00:00-00:00 | [ ] ! [[task-item]]     | [[link]] |

# Should not match

- [ ] 
- [ ]
- [ ]  
- [ ] ! 
- [ ] !
- [ ] !  

|     Tracker | Task                    | Backlog  |
| ----------: | :---------------------- | :------- |
| 00:00-00:00 | [ ]                     | [[linK]] |
| 00:00-00:00 | [ ] !                   | [[linK]] |

I am interested in several capture groups as follows:

group $1:
- match: [ and ]
group $2:
- match: any single character (e.g., \s) between [ and ]
group $3:
- match: !, ?, or * that follows after [ ]
group $4:
- match: task text after [ ] without modifier present
group $5:
- match: task text after [ ] ! with modifier present

I came up with the following regex (i.e., see demo here):

(?<= \s )
  # Match opening bracket (i.e., `[`).
  ( \[ )

  # Match any single character (e.g., `x`).
  ( . )

  # Match closing bracket (i.e., `]`).
  ( \] )
(?= \s* [?!*]? \s* )

# Exclude entries without text (i.e., incl. in tables).
(?! 
  \s* [?!*]? \s* \|
  |
  \s* [?!*]? \s* $
)

# Match the text (i.e., capture based on modifier presence).
(?:
  # Match modifier (i.e., `!`, `?`, or `*`) and the text that follows.
  \s* ( [!?*] ) \s* ( .*? )
  |
  # Match the text that does not follow a modifier.
  \s* (?! [!?*]) \s* ( .*? )
)
# Match until either of the stops that follow are met.
(?= \s+ \^[a-z0-9]{6,} | \s+ \| | \s*$)

This seems to work (i.e., see the picture below), with one exception: the [ and ] notation brackets are captured in separate groups (i.e., [ in group $1 and ] in group $3). How can I capture [ and ] as part of the same group (i.e., $1)?
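
To make the numbering concrete, here is a minimal sketch that runs the expression above through Python's `re` module in verbose mode. Python is an assumption made purely for illustration: the grammar itself runs on Oniguruma, whose behavior may differ in details. The names `TASK_PATTERN` and `capture_groups` are hypothetical helpers, not part of any grammar API.

```python
import re

# The pattern from the question, transcribed for Python's `re` module.
# Assumption: Python is used only to inspect group numbering; VS Code
# evaluates TextMate grammars with Oniguruma, which may differ.
TASK_PATTERN = re.compile(
    r"""
    (?<= \s )
      ( \[ )      # group 1: opening bracket
      ( . )       # group 2: any single character between the brackets
      ( \] )      # group 3: closing bracket
    (?= \s* [?!*]? \s* )

    # Exclude entries without text, including those in tables.
    (?!
      \s* [?!*]? \s* \|
      |
      \s* [?!*]? \s* $
    )

    (?:
      \s* ( [!?*] ) \s* ( .*? )      # groups 4 and 5: modifier and its text
      |
      \s* (?! [!?*] ) \s* ( .*? )    # group 6: text without a modifier
    )

    # Stop at a block id, a table cell separator, or the end of the line.
    (?= \s+ \^[a-z0-9]{6,} | \s+ \| | \s*$ )
    """,
    re.VERBOSE | re.MULTILINE,
)


def capture_groups(line):
    """Return the tuple of captured groups for a line, or None if no match."""
    match = TASK_PATTERN.search(line)
    return match.groups() if match else None


if __name__ == "__main__":
    samples = [
        "- [ ] Some task ^abcdef",
        "- [ ] ! Some task | [[link]] ^abcdef",
        "- [ ] !",
    ]
    for sample in samples:
        print(repr(sample), "->", capture_groups(sample))
```

In this sketch the brackets come back as two separate groups (1 and 3) with the inner character in between, which is exactly the split I would like to avoid.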

I am using this regex for a TextMate grammar in VS Code and according to the documentation the expression needs to be a valid Oniguruma regular expression. Based on some attempts, I noticed that the following are not supported:

- branch resets (i.e., \K)
- capturing inside lookarounds
- named capture groups

Edit

The fourth bird indicated in the comments that with the /J flag enabled the regex below works (i.e., see demo):

(?<= \s )
  # Match opening bracket (i.e., `[`).
  (?<g1> \[)

  # Match any single character (e.g., `x`).
  (?<g2> .)

  # Match closing bracket (i.e., `]`).
  (?<g1> \])
(?= \s* [?!*]? \s* )

# Exclude entries without text (i.e., incl. in tables).
(?! 
  \s* [?!*]? \s* \|
  |
  \s* [?!*]? \s* $
)

# Match the text (i.e., capture based on modifier presence).
(?:
  # Match modifier (i.e., `!`, `?`, or `*`) and the text that follows.
  \s* (?<g3>[!?*]) \s* (?<g4>.*?)
  |
  # Match the text that does not follow a modifier.
  \s* (?! [!?*]) \s* (?<g5>.*?)
)
# Match until either of the stops that follow are met.
(?= \s+ \^[a-z0-9]{6,} | \s+ \| | \s*$)

It does. However, as I just discovered, it seems that I cannot use named capture groups for TextMate grammars and, therefore, I need a different solution.

You can use the `J` flag and then use the same group name. See https://regex101.com/r/0HQc4t/1 Note that you can omit `{1}` — The fourth bird, Apr 23 '22 at 11:34
@Thefourthbird, that is amazing! I am trying the `J` flag right now to see if it works in `VS Code`. Oh, thanks for indicating that I can omit `{1}`. — Mihai, Apr 23 '22 at 11:40
@MikeM, I need it for syntax highlighting (e.g., https://imgur.com/a/3SEXkqL). — Mihai, Apr 23 '22 at 11:45
@Thefourthbird, sadly I cannot get the flag `/J` to work in `VS Code`. Also, even something as simple as this (i.e., `(?task)`) fails. I thought named capture groups are supported in Oniguruma... I will try some more, but I guess my best bet is still to figure out a way to adjust the expression. — Mihai, Apr 23 '22 at 11:57
Instead of asking a new question, I will adjust this one to drop the named groups and fit my scenario better. Despite this, I am thankful you mentioned the `/J` flag. — Mihai, Apr 23 '22 at 12:12
