Language server semantic tokens

Question

In the language server protocol specification, the semantic tokens response has a data field that is an array of integers for the tokens.

export interface SemanticTokens {
    /**
     * The actual tokens.
     */
    data: uinteger[];
}

In all the samples, semantic tokens are done client side with a SemanticTokensBuilder that specifies a line and column range for each token type. How do you do that server side? When VSCode sends the "textDocument/semanticTokens/full" method, what does the language server send back?

score 5 · Answer 1 · answered Jan 24 '22 at 15:15

It's perhaps easiest to explain by looking at an example. This example is adapted from https://microsoft.github.io/language-server-protocol/specifications/specification-3-17/#textDocument_semanticTokens.

Suppose your file had the contents \n\n     foo  bars\n\n\n  bazzled\n, i.e. rendered with whitespace it would look like this:



     foo  bars


  bazzled

This has three tokens at the following positions (using 0-indexing):

foo, at line 2, char 5
bars, at line 2, char 10
bazzled, at line 5, char 2

Here's one valid data array that the server could respond with:

// 1st token,  2nd token,  3rd token
[  2,5,3,0,3,  0,5,4,1,0,  3,2,7,2,0 ]

This is just a compressed form of the following info:

[ { deltaLine: 2, deltaStartChar: 5, length: 3, tokenType: 0, tokenModifiers: 3 },  // first token
  { deltaLine: 0, deltaStartChar: 5, length: 4, tokenType: 1, tokenModifiers: 0 },  // second token
  { deltaLine: 3, deltaStartChar: 2, length: 7, tokenType: 2, tokenModifiers: 0 }   // third token
]

That is, each group of 5 contiguous elements corresponds to one token. These 5 elements have the following meaning, in order:

deltaLine: the line difference between this token and the previous token
deltaStartChar: either the start character difference between this token and the previous token (if the previous token is on the same line), or just the start character of this token (if the previous token is on a different line)
length: the length of this token
tokenType: index into the token type legend
tokenModifiers: bit flags for token modifiers
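
To make the field meanings concrete, here is a minimal sketch that reads a data array back into tokens with absolute positions. The DecodedToken shape and the decodeSemanticTokens name are made up for illustration; they are not part of the protocol or of any library.

```typescript
// Hypothetical shape: a token with an absolute, 0-indexed position.
interface DecodedToken {
    line: number;
    startChar: number;
    length: number;
    tokenType: number;      // index into the token type legend
    tokenModifiers: number; // bit flags
}

// Walks the flat array five integers at a time and resolves the deltas.
function decodeSemanticTokens(data: number[]): DecodedToken[] {
    const tokens: DecodedToken[] = [];
    let line = 0;
    let startChar = 0;
    for (let i = 0; i + 4 < data.length; i += 5) {
        const deltaLine = data[i];
        const deltaStartChar = data[i + 1];
        line += deltaLine;
        // Same line: offset from the previous token's start.
        // New line: the value is already the absolute start character.
        startChar = deltaLine === 0 ? startChar + deltaStartChar : deltaStartChar;
        tokens.push({
            line,
            startChar,
            length: data[i + 2],
            tokenType: data[i + 3],
            tokenModifiers: data[i + 4],
        });
    }
    return tokens;
}

const decoded = decodeSemanticTokens([2, 5, 3, 0, 3, 0, 5, 4, 1, 0, 3, 2, 7, 2, 0]);
console.log(decoded.map((t) => `line ${t.line}, char ${t.startChar}`));
```

Run on the example array, this should recover the three positions listed earlier: line 2 char 5, line 2 char 10, and line 5 char 2.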

So, for example, the second token has deltaLine = 0, meaning it is on the same line as the first token, and deltaStartChar = 5 means that it starts 5 characters after the first token starts. The first token doesn't have a token before it, so its position is instead taken to be absolute.
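
On the server side, producing the array amounts to applying these rules to tokens whose absolute positions you already know. Below is a rough sketch, not the implementation from any particular library; AbsoluteToken and encodeSemanticTokens are names I made up for the example.

```typescript
// Hypothetical input shape: a token with an absolute, 0-indexed position.
interface AbsoluteToken {
    line: number;
    startChar: number;
    length: number;
    tokenType: number;      // index into the token type legend
    tokenModifiers: number; // bit flags
}

// Produces the flat, delta-encoded array for the `data` field.
function encodeSemanticTokens(tokens: AbsoluteToken[]): number[] {
    // Deltas only make sense when tokens are in document order.
    const sorted = [...tokens].sort(
        (a, b) => a.line - b.line || a.startChar - b.startChar
    );
    const data: number[] = [];
    let prevLine = 0;
    let prevChar = 0;
    for (const t of sorted) {
        const deltaLine = t.line - prevLine;
        // Same line: relative to the previous token's start.
        // Different line: the absolute start character.
        const deltaStartChar = deltaLine === 0 ? t.startChar - prevChar : t.startChar;
        data.push(deltaLine, deltaStartChar, t.length, t.tokenType, t.tokenModifiers);
        prevLine = t.line;
        prevChar = t.startChar;
    }
    return data;
}

// The three tokens from the example file.
const exampleTokens: AbsoluteToken[] = [
    { line: 2, startChar: 5, length: 3, tokenType: 0, tokenModifiers: 3 },  // foo
    { line: 2, startChar: 10, length: 4, tokenType: 1, tokenModifiers: 0 }, // bars
    { line: 5, startChar: 2, length: 7, tokenType: 2, tokenModifiers: 0 },  // bazzled
];

// Result object for a textDocument/semanticTokens/full request.
const response = { data: encodeSemanticTokens(exampleTokens) };
console.log(JSON.stringify(response));
```

For the example tokens this yields the same 15 integers as the array shown above. Starting prevLine and prevChar at 0 is what makes the first token's position come out absolute.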

The tokenType is an index into the token types legend, which is established during the initialization handshake of the protocol. The legend for the token modifiers is also established during the initialization handshake.
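
As a rough illustration, the legend travels in the server capabilities returned from initialize. The interface below is a local stand-in for the protocol type so the snippet is self-contained, and the particular names in the legend and their order are my own assumption; a real server picks its own.

```typescript
// Minimal local stand-in for the protocol's legend type.
interface SemanticTokensLegend {
    tokenTypes: string[];
    tokenModifiers: string[];
}

// Illustrative legend: which names appear, and in what order, is an assumption.
const legend: SemanticTokensLegend = {
    tokenTypes: ["variable", "function", "class"],
    tokenModifiers: ["declaration", "readonly"],
};

// The relevant part of the server's `initialize` result.
const initializeResult = {
    capabilities: {
        semanticTokensProvider: {
            legend,
            full: true, // the server answers textDocument/semanticTokens/full
        },
    },
};

// With this legend, tokenType 1 in the data array refers to:
console.log(legend.tokenTypes[1]);
```

With this particular legend, the example's tokenType values 0, 1 and 2 would mean "variable", "function" and "class" respectively.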

Although the tokenModifiers value above is just an integer, it will be interpreted as a bit vector, where each bit indicates whether the corresponding modifier is on or off. For example, the above assigns the first token (foo) the modifiers 0b11, indicating that both the 0th and the 1st token modifier are active, and all other modifiers do not apply to this token.
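
The bit handling can be sketched like this. The modifier legend here is an assumed example, and encodeModifiers and decodeModifiers are hypothetical helper names, not protocol or library functions.

```typescript
// Assumed modifier legend; bit i of tokenModifiers corresponds to entry i.
const modifierLegend = ["declaration", "readonly", "static"];

// Turns a list of modifier names into the integer bit flags.
function encodeModifiers(names: string[], legend: string[]): number {
    let bits = 0;
    for (const name of names) {
        const index = legend.indexOf(name);
        if (index < 0) {
            throw new Error(`modifier not in legend: ${name}`);
        }
        bits |= 1 << index;
    }
    return bits;
}

// Turns the integer bit flags back into modifier names.
function decodeModifiers(bits: number, legend: string[]): string[] {
    return legend.filter((_, index) => (bits & (1 << index)) !== 0);
}

console.log(encodeModifiers(["declaration", "readonly"], modifierLegend));
console.log(decodeModifiers(0b11, modifierLegend));
```

With this legend, the value 3 on foo decodes to "declaration" and "readonly", and "static" (bit 2) stays off.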

Updated link: https://microsoft.github.io/language-server-protocol/specifications/lsp/3.17/specification/#textDocument_semanticTokens — Jeppe, Nov 05 '22 at 09:57
