
I'm trying to come up with a syntax highlighter that matches the beginning of the line (a timestamp) and then the beginning of the remaining text. For example:

12:34:56.789 some1 text some2 other text
some3 text some4 other text

I need to capture the words starting with some, but only when they are at the beginning of the text (after the optional timestamp). In this example, those are some1 and some3.

{
  "$schema": "https://raw.githubusercontent.com/martinring/tmlanguage/master/tmlanguage.json",
  "name": "my-output",
  "scopeName": "source.my_output",
  "patterns": [
    {
      "begin": "^(\\d{2}:\\d{2}:\\d{2}\\.\\d{3,}\\s)?",
      "end": "$",
      "beginCaptures":{
        "1": {"name": "my-output-date"}
      },
      "patterns": [
        
        {
          "match": "^(some\\d)",
          "captures":{
            "1": {"name": "my-output-red"}
          }
        }
      ]
    }
  ]
}

The problem is that the line may start with a timestamp (12:34:56.789), so in this example the grammar only captures some3.

If I remove the ^ from the regex ("match": "(some\\d)"), it captures all four words instead.

Does VS Code provide a way to split the text into chunks and process each chunk as a whole text (so that ^ and $ would apply to the chunk)?
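To make the symptom concrete, here is a rough TypeScript emulation of what the nested rule sees. It is an assumption-laden sketch, not how vscode-textmate actually works: JavaScript regexes stand in for Oniguruma, and the hypothetical scan() helper simply runs the nested pattern from the position where the begin match ended. Because ^ only matches at column 0, the anchored pattern finds some3 alone, while the unanchored one finds all four words.

```typescript
// Rough emulation only: JavaScript RegExp stands in for the Oniguruma engine used by
// VS Code, and scan() is a hypothetical helper, not part of any VS Code API.
const beginRule = /^(\d{2}:\d{2}:\d{2}\.\d{3,}\s)?/; // same regex as the "begin" rule

// Runs a nested pattern repeatedly, starting where the "begin" match ended.
function scan(line: string, nested: RegExp): string[] {
  const start = beginRule.exec(line)![0].length; // 0 when there is no timestamp
  const re = new RegExp(nested.source, "g"); // "g" so that lastIndex is honoured
  re.lastIndex = start;
  const found: string[] = [];
  for (let m = re.exec(line); m !== null; m = re.exec(line)) {
    found.push(m[1]);
  }
  return found;
}

const sample = [
  "12:34:56.789 some1 text some2 other text",
  "some3 text some4 other text",
];

// ^ only matches at column 0, so the timestamped line yields nothing.
console.log(sample.flatMap((line) => scan(line, /^(some\d)/))); // ["some3"]
// Without ^, every keyword after the timestamp is captured.
console.log(sample.flatMap((line) => scan(line, /(some\d)/))); // ["some1", "some2", "some3", "some4"]
```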

MattDMo
vanowm
  • Perhaps like this `"match": "^(?:\\d{2}:\\d{2}:\\d{2}\\.\\d{3,}\\s)?(some\\d)",` – The fourth bird Oct 03 '22 at 13:12
  • @Thefourthbird that's the approach I'm trying to avoid, because there are several dozen keywords and I'm trying to make it less cumbersome, without duplication... – vanowm Oct 03 '22 at 13:44
  • Then perhaps you might use `([^\\d\\s]+\\d)` to start the match with non-digits followed by a digit? – The fourth bird Oct 03 '22 at 13:45
  • The problem is that without `^` it might capture in the middle of the string, and it seems `^` doesn't work in a nested pattern. For example, `^.*` will not match anything if the parent pattern matched something. – vanowm Oct 03 '22 at 14:04
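The first comment's suggestion can be checked quickly outside the grammar. This TypeScript sketch uses JavaScript regex syntax, which behaves the same as Oniguruma for this particular pattern, and applies the combined regex to each sample line on its own. It captures some1 and some3, but every keyword rule would have to repeat the timestamp prefix, which is exactly the duplication the asker wants to avoid.

```typescript
// Combined pattern from the comment: optional timestamp (non-capturing), then the keyword.
const combined = /^(?:\d{2}:\d{2}:\d{2}\.\d{3,}\s)?(some\d)/;

// Returns the keyword at the start of the text, or null if there is none.
function leadingKeyword(line: string): string | null {
  const m = combined.exec(line);
  return m ? m[1] : null;
}

const lines = [
  "12:34:56.789 some1 text some2 other text",
  "some3 text some4 other text",
];

console.log(lines.map(leadingKeyword)); // ["some1", "some3"]
```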

1 Answer


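Finally, I found a solution: use the \G anchor, which matches at the position where the previous match ended (here, right after the begin match, i.e. after the optional timestamp):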

{
  "$schema": "https://raw.githubusercontent.com/martinring/tmlanguage/master/tmlanguage.json",
  "name": "my-output",
  "scopeName": "source.my_output",
  "patterns": [
    {
      "begin": "^(\\d{2}:\\d{2}:\\d{2}\\.\\d{3,}\\s)?",
      "end": "$",
      "beginCaptures":{
        "1": {"name": "my-output-date"}
      },
      "patterns": [
        
        {
          "match": "\\G(some\\d)",
          "captures":{
            "1": {"name": "my-output-red"}
          }
        }
      ]
    }
  ]
}
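The \G idea can be mimicked in plain TypeScript with the sticky (y) flag, which makes a regex match only at lastIndex. This is a hypothetical sketch, not vscode-textmate's implementation: lastIndex is set to the end of the begin match, which plays the role of the \G position, so a keyword is only accepted directly after the optional timestamp.

```typescript
// Hypothetical emulation of the grammar above; not how vscode-textmate is implemented.
const begin = /^(\d{2}:\d{2}:\d{2}\.\d{3,}\s)?/; // the "begin" rule
const keywordAtG = /some\d/y; // sticky: may only match exactly at lastIndex, like \G

function keywordAfterTimestamp(line: string): string | null {
  const beginMatch = begin.exec(line)!; // always matches, the timestamp is optional
  keywordAtG.lastIndex = beginMatch[0].length; // where the "begin" match ended
  const m = keywordAtG.exec(line);
  return m ? m[0] : null;
}

const lines = [
  "12:34:56.789 some1 text some2 other text",
  "some3 text some4 other text",
  "12:34:56.789 other text some5",
];

console.log(lines.map(keywordAfterTimestamp)); // ["some1", "some3", null]
```

Unlike the combined regex from the comments, the keyword rule carries no copy of the timestamp pattern, which is what lets this approach scale to dozens of keywords.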

However, there seems to be a bug where it skips a line if the previous line is empty. For now, my workaround is to replace the "end": "$" regex with "end": "\\r?\\n".

vanowm