If I have a simple grammar in tree-sitter:

rules: {
    expr: $ => choice(
        /[0-9]+/,
        prec.right(seq($.expr, /[+-]/, $.expr)),
    )
}

And an input:

3+4

I get the following CST:

(start [0, 0] - [0, 3]
  (expr [0, 0] - [0, 3]
    (expr [0, 0] - [0, 1])
    (expr [0, 2] - [0, 3])))

So my question is: how do I get the values, i.e. the text that was parsed, from these nodes/leaves? I somehow have to evaluate the tree. I'm fairly sure there is a way, because tree-sitter can also be used for syntax highlighting, which needs those values (I guess). But I read the documentation and couldn't find any note on how to do it.

Rip

1 Answer

Tree-sitter's syntax tree doesn't store copies of the input text. So to get the text of a particular token, you would have to use the ranges that Tree-sitter gives you to compute slices of your original source code.

In the Python binding, this looks like this:

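# `parser` is assumed to be a tree_sitter Parser already set up with the grammar above.
source_code_bytes = b'3 + 4'
tree = parser.parse(source_code_bytes)
# root (start) -> outer expr -> first inner expr, i.e. the left operand
node1 = tree.root_node.children[0].children[0]

# start_byte/end_byte are byte offsets, so slice the bytes and decode afterwards
node1_text = source_code_bytes[node1.start_byte:node1.end_byte].decode('utf8')
assert node1_text == '3'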
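
One detail that is easy to get wrong: the ranges are byte offsets, so the slice has to be taken from the bytes object, not from a decoded str. Below is a minimal sketch of a helper that does this. `Span` and `node_text` are hypothetical names introduced for illustration, and `Span` only stands in for a real node so the snippet runs without a compiled grammar. The input with a non-ASCII character is likewise just an illustration of the offset mismatch, not something the grammar above would accept.

```python
from dataclasses import dataclass


@dataclass
class Span:
    """Hypothetical stand-in for a tree-sitter node; only the byte range is modelled."""
    start_byte: int
    end_byte: int


def node_text(source: bytes, node) -> str:
    """Return the source text covered by `node`, decoded as UTF-8."""
    return source[node.start_byte:node.end_byte].decode("utf8")


source = "é+4".encode("utf8")            # 'é' occupies two bytes in UTF-8
four = Span(start_byte=3, end_byte=4)    # byte range of the '4' token

print(node_text(source, four))           # slicing the bytes finds the token
print(repr(source.decode("utf8")[3:4]))  # slicing the str with byte offsets does not
```

The same helper should work with a real node from the Python binding, since it only relies on `start_byte` and `end_byte`.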

In some language bindings, like the wasm binding, there is a .text helper for making this easier.

There is an open issue for adding this kind of helper function to the python binding.
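
To actually evaluate the tree from the question, the same slicing can be applied recursively. The following is only a sketch under assumptions: `Node` is a hypothetical stand-in carrying the byte ranges from the CST in the question, each binary `expr` is assumed to have exactly two `expr` children as shown there, and the operator is recovered by slicing the source between the two operands rather than by looking for an operator node.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical stand-in for a tree-sitter node (byte range plus children)."""
    start_byte: int
    end_byte: int
    children: list = field(default_factory=list)


def text(source: bytes, start: int, end: int) -> str:
    """Decode the slice of the original source between two byte offsets."""
    return source[start:end].decode("utf8")


def evaluate(source: bytes, node: Node) -> int:
    """Evaluate an `expr` node by slicing the source with the node ranges."""
    if not node.children:
        # Leaf: the node covers a number literal.
        return int(text(source, node.start_byte, node.end_byte))
    left, right = node.children
    # The operator is whatever sits between the two operands.
    op = text(source, left.end_byte, right.start_byte).strip()
    lhs = evaluate(source, left)
    rhs = evaluate(source, right)
    return lhs + rhs if op == "+" else lhs - rhs


source = b"3+4"
# Ranges taken from the CST in the question: expr [0,3] with children [0,1] and [2,3].
tree = Node(0, 3, [Node(0, 1), Node(2, 3)])
print(evaluate(source, tree))
```

With a real tree from the Python binding, the two operands would probably be taken from the node's `named_children` instead, because the operator token may also show up among `children`.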

maxbrunsfeld