Suppose I have the following text:

# Should match

- [ ] Some task
- [ ] Some task | [[link]]
- [ ] Some task ^abcdef
- [ ] Some task | [[link]] ^abcdef
- [ ] ! Some task
- [ ] ! Some task | [[link]]
- [ ] ! Some task ^abcdef
- [ ] ! Some task | [[link]] ^abcdef
- [ ] Task one | [ ] ! Task two | [ ] Task three ^abcdef

|     Tracker | Task                    | Backlog  |
| ----------: | :---------------------- | :------- |
| 00:00-00:00 | [ ] Task item           | [[linK]] |
| 00:00-00:00 | [ ] Task item ^abcdef   | [[link]] |
| 00:00-00:00 | [ ] [[task-item]]       | [[link]] |
| 00:00-00:00 | [ ] ! Task item         | [[linK]] |
| 00:00-00:00 | [ ] ! Task item ^abcdef | [[link]] |
| 00:00-00:00 | [ ] ! [[task-item]]     | [[link]] |

# Should not match

- [ ] 
- [ ]
- [ ]  
- [ ] ! 
- [ ] !
- [ ] !  

|     Tracker | Task                    | Backlog  |
| ----------: | :---------------------- | :------- |
| 00:00-00:00 | [ ]                     | [[linK]] |
| 00:00-00:00 | [ ] !                   | [[linK]] |

I am interested in several capture groups as follows:

  • group $1:
    • match: [ and ]
  • group $2:
    • match: any single character (e.g., \s) between [ and ]
  • group $3:
    • match: !, ?, or * that follows [ ]
  • group $4:
    • match: task text after [ ] without modifier present
  • group $5:
    • match: task text after [ ] ! with modifier present

I came up with the following regex:

(?<= \s )
  # Match opening bracket (i.e., `[`).
  ( \[ )

  # Match any single character (e.g., `x`).
  ( . )

  # Match closing bracket (i.e., `]`).
  ( \] )
(?= \s* [?!*]? \s* )

# Exclude entries without text (i.e., incl. in tables).
(?! 
  \s* [?!*]? \s* \|
  |
  \s* [?!*]? \s* $
)

# Match the text (i.e., capture based on modifier presence).
(?:
  # Match modifier (i.e., `!`, `?`, or `*`) and the text that follows.
  \s* ( [!?*] ) \s* ( .*? )
  |
  # Match the text that does not follow a modifier.
  \s* (?! [!?*]) \s* ( .*? )
)
# Match until either of the stops that follow are met.
(?= \s+ \^[a-z0-9]{6,} | \s+ \| | \s*$)

This seems to work, with one exception: the [ and ] brackets are captured in separate groups (i.e., [ in group $1 and ] in group $3). How can I capture both [ and ] in the same group (i.e., $1)?
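
To make the group numbering concrete, here is a minimal sketch that runs the expression above against a few of the sample lines. It assumes Python's `re` module in verbose mode as a stand-in for Oniguruma (the two engines are not identical), and the helper name `scan` is made up for illustration.

```python
import re

# The expression from the question, compiled in verbose mode so that spacing
# and inline comments are ignored. Python's `re` is only a stand-in for
# Oniguruma here; behaviour may differ in edge cases.
TASK = re.compile(
    r"""
    (?<= \s )
      ( \[ )   # group 1: opening bracket
      ( . )    # group 2: single character between the brackets
      ( \] )   # group 3: closing bracket
    (?= \s* [?!*]? \s* )

    # Exclude entries without text (incl. in tables).
    (?!
      \s* [?!*]? \s* \|
      |
      \s* [?!*]? \s* $
    )

    (?:
      \s* ( [!?*] ) \s* ( .*? )      # groups 4 and 5: modifier and its text
      |
      \s* (?! [!?*] ) \s* ( .*? )    # group 6: text without a modifier
    )
    (?= \s+ \^[a-z0-9]{6,} | \s+ \| | \s*$ )
    """,
    re.VERBOSE,
)


def scan(line):
    """Return the capture groups of every task found in a single line."""
    return [m.groups() for m in TASK.finditer(line)]


if __name__ == "__main__":
    samples = [
        "- [ ] Some task | [[link]] ^abcdef",
        "- [ ] ! Some task ^abcdef",
        "- [ ] !",
    ]
    for line in samples:
        print(repr(line), scan(line))
```

Each line is scanned on its own, so `$` means end of line without needing a multiline flag. The output shows the brackets landing in groups 1 and 3, which is exactly the split described above.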

I am using this regex for a TextMate grammar in VS Code and according to the documentation the expression needs to be a valid Oniguruma regular expression. Based on some attempts, I noticed that the following are not supported:

  • match resets (i.e., \K)
  • capturing inside lookarounds
  • named capture groups

Edit

The fourth bird pointed out in the comments that, with the /J flag enabled (which allows duplicate group names), the regex below works:

(?<= \s )
  # Match opening bracket (i.e., `[`).
  (?<g1> \[)

  # Match any single character (e.g., `x`).
  (?<g2> .)

  # Match closing bracket (i.e., `]`).
  (?<g1> \])
(?= \s* [?!*]? \s* )

# Exclude entries without text (i.e., incl. in tables).
(?! 
  \s* [?!*]? \s* \|
  |
  \s* [?!*]? \s* $
)

# Match the text (i.e., capture based on modifier presence).
(?:
  # Match modifier (i.e., `!`, `?`, or `*`) and the text that follows.
  \s* (?<g3>[!?*]) \s* (?<g4>.*?)
  |
  # Match the text that does not follow a modifier.
  \s* (?! [!?*]) \s* (?<g5>.*?)
)
# Match until either of the stops that follow are met.
(?= \s+ \^[a-z0-9]{6,} | \s+ \| | \s*$)

It does. However, as I just discovered, it seems that I cannot use named capture groups for TextMate grammars and, therefore, I need a different solution.
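
One direction that might sidestep the problem, offered as an untested assumption rather than a confirmed fix: a TextMate `match` rule assigns scopes per capture number through its `captures` map, so [ and ] could stay in groups 1 and 3 and simply receive the same scope name. The sketch below builds such a rule as a Python dict and prints it as JSON. The scope names and the shortened `match` pattern are hypothetical placeholders, not taken from any real grammar.

```python
import json

# Hypothetical scope names; substitute whatever your theme actually styles.
BRACKET_SCOPE = "punctuation.definition.task.bracket"
STATUS_SCOPE = "constant.character.task.status"

rule = {
    # Shortened stand-in for the full expression from the question.
    "match": r"(?<=\s)(\[)(.)(\])",
    "captures": {
        "1": {"name": BRACKET_SCOPE},  # opening bracket
        "2": {"name": STATUS_SCOPE},   # character between the brackets
        "3": {"name": BRACKET_SCOPE},  # closing bracket, same scope as group 1
    },
}

if __name__ == "__main__":
    # Emit the rule in the JSON form a .tmLanguage.json file would contain.
    print(json.dumps(rule, indent=2))
```

With this shape the regex itself needs neither duplicate group names nor named groups, since the grouping happens at the scope level instead of the capture level.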

Mihai
  • You can use the `J` flag and then use the same group name. See https://regex101.com/r/0HQc4t/1. Note that you can omit `{1}` – The fourth bird Apr 23 '22 at 11:34
  • @Thefourthbird, that is amazing! I am trying the `J` flag right now to see if it works in `VS Code`. Oh, thanks for indicating that I can omit `{1}`. – Mihai Apr 23 '22 at 11:40
  • @MikeM, I need it for syntax highlighting (e.g., https://imgur.com/a/3SEXkqL). – Mihai Apr 23 '22 at 11:45
  • @MikeM https://en.wikipedia.org/wiki/Oniguruma – The fourth bird Apr 23 '22 at 11:53
  • @Thefourthbird, sadly I cannot get the flag `/J` to work in `VS Code`. Also, even something as simple as this (i.e., `(?task)`) fails. I thought named capture groups are supported in Oniguruma... I will try some more, but I guess my best bet is still to figure out a way to adjust the expression. – Mihai Apr 23 '22 at 11:57
  • Instead of asking a new question, I will adjust this one to drop the named groups and fit my scenario better. Despite this, I am thankful you mentioned the `/J` flag. – Mihai Apr 23 '22 at 12:12
