Suppose I have the following text:
# Should match
- [ ] Some task
- [ ] Some task | [[link]]
- [ ] Some task ^abcdef
- [ ] Some task | [[link]] ^abcdef
- [ ] ! Some task
- [ ] ! Some task | [[link]]
- [ ] ! Some task ^abcdef
- [ ] ! Some task | [[link]] ^abcdef
- [ ] Task one | [ ] ! Task two | [ ] Task three ^abcdef
| Tracker | Task | Backlog |
| ----------: | :---------------------- | :------- |
| 00:00-00:00 | [ ] Task item | [[linK]] |
| 00:00-00:00 | [ ] Task item ^abcdef | [[link]] |
| 00:00-00:00 | [ ] [[task-item]] | [[link]] |
| 00:00-00:00 | [ ] ! Task item | [[linK]] |
| 00:00-00:00 | [ ] ! Task item ^abcdef | [[link]] |
| 00:00-00:00 | [ ] ! [[task-item]] | [[link]] |
# Should not match
- [ ]
- [ ]
- [ ]
- [ ] !
- [ ] !
- [ ] !
| Tracker | Task | Backlog |
| ----------: | :---------------------- | :------- |
| 00:00-00:00 | [ ] | [[linK]] |
| 00:00-00:00 | [ ] ! | [[linK]] |
I am interested in several capture groups as follows:
- group `$1`:
  - match: `[` and `]`
- group `$2`:
  - match: any single character (e.g., `\s`) between `[` and `]`
- group `$3`:
  - match: `!`, `?`, or `*` that follows after `[ ]`
- group `$4`:
  - match: task text after `[ ]` without modifier present
- group `$5`:
  - match: task text after `[ ] !` with modifier present
I came up with the following regex (see demo here):
(?<= \s )
# Match opening bracket (i.e., `[`).
( \[ )
# Match any single character (e.g., `x`).
( . )
# Match closing bracket (i.e., `]`).
( \] )
(?= \s* [?!*]? \s* )
# Exclude entries without text (i.e., incl. in tables).
(?!
\s* [?!*]? \s* \|
|
\s* [?!*]? \s* $
)
# Match the text (i.e., capture based on modifier presence).
(?:
# Match modifier (i.e., `!`, `?`, or `*`) and the text that follows.
\s* ( [!?*] ) \s* ( .*? )
|
# Match the text that does not follow a modifier.
\s* (?! [!?*]) \s* ( .*? )
)
# Match until either of the stops that follow are met.
(?= \s+ \^[a-z0-9]{6,} | \s+ \| | \s*$)
This seems to work (see the picture below), with one exception. The `[` and `]` notation brackets are captured in separate groups (i.e., `[` in group `$1` and `]` in group `$3`). How can I capture `[` and `]` as part of the same group (i.e., `$1`)?
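To make the group numbering concrete, here is a minimal sketch that runs the pattern above through Python's `re` module in verbose mode. This is only an approximation: it assumes Python's engine behaves like Oniguruma for this particular pattern, and the helper name `capture_groups` is hypothetical. It shows the brackets landing in groups 1 and 3, with the modifier and the task text shifted to groups 4 to 6.

```python
import re

# The pattern from the question, with its comments shortened.
# Assumption: Python's `re` with re.VERBOSE is close enough to Oniguruma's
# extended mode for this pattern; it is not the engine VS Code uses.
PATTERN = re.compile(
    r"""
    (?<= \s )
    ( \[ )                          # group 1: opening bracket
    ( . )                           # group 2: single status character
    ( \] )                          # group 3: closing bracket
    (?= \s* [?!*]? \s* )
    (?!                             # exclude entries without text
        \s* [?!*]? \s* \|
        |
        \s* [?!*]? \s* $
    )
    (?:
        \s* ( [!?*] ) \s* ( .*? )   # groups 4 and 5: modifier, then its text
        |
        \s* (?! [!?*]) \s* ( .*? )  # group 6: text without a modifier
    )
    (?= \s+ \^[a-z0-9]{6,} | \s+ \| | \s*$)
    """,
    re.VERBOSE,
)


def capture_groups(line):
    """Return the tuple of captured groups for the first match, or None."""
    match = PATTERN.search(line)
    return match.groups() if match else None


if __name__ == "__main__":
    samples = [
        "- [ ] Some task ^abcdef",
        "- [ ] ! Some task | [[link]] ^abcdef",
        "- [ ]",
        "| 00:00-00:00 | [ ] ! | [[linK]] |",
    ]
    for sample in samples:
        print(repr(sample), "->", capture_groups(sample))
```

For `- [ ] ! Some task | [[link]] ^abcdef` this yields `('[', ' ', ']', '!', 'Some task', None)`, so the two brackets occupy two separate group numbers, which is exactly the problem described above.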
I am using this regex for a TextMate grammar in VS Code, and according to the documentation, the expression needs to be a valid Oniguruma regular expression. Based on some attempts, I noticed that the following are not supported:
- branch resets (i.e., `\K`)
- capturing inside lookarounds
- named capture groups
Edit
The fourth bird indicated in the comments that, with the `/J` flag enabled, the regex below works (see demo):
(?<= \s )
# Match opening bracket (i.e., `[`).
(?<g1> \[)
# Match any single character (e.g., `x`).
(?<g2> .)
# Match closing bracket (i.e., `]`).
(?<g1> \])
(?= \s* [?!*]? \s* )
# Exclude entries without text (i.e., incl. in tables).
(?!
\s* [?!*]? \s* \|
|
\s* [?!*]? \s* $
)
# Match the text (i.e., capture based on modifier presence).
(?:
# Match modifier (i.e., `!`, `?`, or `*`) and the text that follows.
\s* (?<g3>[!?*]) \s* (?<g4>.*?)
|
# Match the text that does not follow a modifier.
\s* (?! [!?*]) \s* (?<g5>.*?)
)
# Match until either of the stops that follow are met.
(?= \s+ \^[a-z0-9]{6,} | \s+ \| | \s*$)
It does. However, as I just discovered, it seems that I cannot use named capture groups in TextMate grammars and, therefore, I need a different solution.