
I am attempting to implement a tokenizer with quite specific behaviour, where for example, in the following situation:

1:   Line 1
2:   Line 2
3:
4:   Line 4 
5:   Line 5
6:   Line 6
7:
8:
9:   Line 9

A change on line 1 will cause a retokenization of lines 1-3, a change on line 5 will cause a retokenization of lines 3-8, a change on line 9 will cause a retokenization of lines 7-9, and so on. This is because the tokenization of a given line might vary based on what happens up to the next empty line.

So basically, I'm looking to tokenize the text in chunks delimited by empty lines.
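
To make the expected behaviour concrete, here is a minimal sketch of the range I'd want retokenized for a given edit. `chunkRangeForLine` is a hypothetical helper, not part of Monaco's API, and it assumes a line counts as empty when it contains only whitespace:

```typescript
// Hypothetical helper (not a Monaco API): given the document lines and the
// 1-indexed number of the edited line, return the inclusive 1-indexed range
// [start, end] that should be retokenized.
function chunkRangeForLine(lines: string[], changedLine: number): [number, number] {
  // Assumption: whitespace-only lines count as empty.
  const isEmpty = (n: number): boolean => lines[n - 1].trim() === "";

  let start = changedLine;
  let end = changedLine;

  // Grow over the non-empty lines surrounding the change.
  while (start > 1 && !isEmpty(start - 1)) start--;
  while (end < lines.length && !isEmpty(end + 1)) end++;

  // Then absorb the empty lines that delimit the chunk on each side.
  while (start > 1 && isEmpty(start - 1)) start--;
  while (end < lines.length && isEmpty(end + 1)) end++;

  return [start, end];
}

// The nine-line document from the example above.
const exampleLines = [
  "Line 1", "Line 2", "", "Line 4", "Line 5", "Line 6", "", "", "Line 9",
];
const rangeForLine5 = chunkRangeForLine(exampleLines, 5); // lines 3 to 8
```

If the edited line is itself empty, this sketch covers both neighbouring chunks, which is an assumption on my part rather than a hard requirement.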

I have a rough prototype working with decorations, but for my use case there's just no way of using decorations or semantic tokens, mainly for performance reasons. However, Monaco's setTokensProvider only seems to work on a line-by-line basis: editing a line in a document causes every subsequent line to be retokenized, with no way of stopping it and no way of retokenizing the lines before it.

Is there any realistic way of doing this currently, even if it's hacky and/or involves fiddling with some unexposed APIs? VS Code's TextMate grammars are capable of specifying tokens which span multiple lines, so I feel like I'm missing something :/

user2950509
  • The Monaco tokenizer does *not* work line by line, but uses regular expressions to match parts of the text (a declarative tokenizer). For example, changing a multiline comment delimiter will also affect many (if not all) input lines. That is the same thing you are trying to achieve. – Mike Lischke Feb 04 '21 at 07:54
  • @MikeLischke isn't that only for Monarch? I'm talking about the "manual" tokenizer implementation, using [setTokensProvider](https://github.com/microsoft/monaco-editor/blob/e450fb664120fdabb256e7b31332c73974cb3bb4/monaco.d.ts#L5144), which only accepts a [TokensProvider](https://github.com/microsoft/monaco-editor/blob/e450fb664120fdabb256e7b31332c73974cb3bb4/monaco.d.ts#L5106), and seems to only be capable of going through the text line-by-line? – user2950509 Feb 04 '21 at 14:15
  • Yes, you are right. I didn't even notice there's yet another tokenizer interface. I have always only used the Monarch tokenizer and implemented a very well working syntax highlighting feature with that, which even combines 2 languages into one. – Mike Lischke Feb 05 '21 at 08:20

1 Answer

It sounds like you're looking for DocumentSemanticTokensProvider. Take a look at this blog post:

As its name implies, the DocumentSemanticTokensProvider in Monaco handles providing of semantics for an entire entered document. Unlike a tokens provider, instead of providing a set of regular expressions to tokenize the document, a DocumentSemanticTokensProvider implementation is a callback function: When invoked, the function is provided with the overall model containing the code entered by the user, and it is the responsibility of the implementor to return the necessary semantic information, requiring a full parse.
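
As a rough illustration of the "necessary semantic information" mentioned above: semantic tokens are returned as a flat integer array with five numbers per token (line delta, start-character delta, length, token type index, modifier bitset), each position relative to the previous token. The sketch below only shows that encoding step. `SimpleToken` and `encodeSemanticTokens` are hypothetical names, not Monaco APIs, and the sketch assumes 0-based positions and no modifiers; in a real provider the resulting array would be what `provideDocumentSemanticTokens` returns as `data`.

```typescript
// Hypothetical token shape produced by your own parser (0-based positions).
interface SimpleToken {
  line: number;
  startChar: number;
  length: number;
  typeIndex: number; // index into the token types listed in the provider's legend
}

// Hypothetical helper: delta-encode tokens into the flat five-integers-per-token
// layout used for semantic tokens.
function encodeSemanticTokens(tokens: SimpleToken[]): Uint32Array {
  // Tokens must be ordered by position for the deltas to be non-negative.
  const sorted = [...tokens].sort(
    (a, b) => a.line - b.line || a.startChar - b.startChar
  );

  const data: number[] = [];
  let prevLine = 0;
  let prevChar = 0;
  for (const t of sorted) {
    const deltaLine = t.line - prevLine;
    // The start character is relative only when the token stays on the same line.
    const deltaChar = deltaLine === 0 ? t.startChar - prevChar : t.startChar;
    data.push(deltaLine, deltaChar, t.length, t.typeIndex, 0); // 0 = no modifiers
    prevLine = t.line;
    prevChar = t.startChar;
  }
  return new Uint32Array(data);
}
```

Because the whole array is rebuilt from the full model on each invocation, the cost grows with document size, so it is worth measuring on realistic inputs.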

DonPedro
  • Thanks, I'd already tried that a while ago, but the problem is that there's a pretty noticeable delay between typing and highlighting, even in [the pretty simple examples from the monaco playground](https://microsoft.github.io/monaco-editor/playground.html#extending-language-services-semantic-tokens-provider-example). – user2950509 Feb 01 '22 at 14:12