How to get the values from nodes in tree-sitter?

Question

If I have a simple grammar in tree-sitter:

rules: {
    expr: $ => choice(
        /[0-9]+/,
        prec.right(seq($.expr, /[+-]/, $.expr)),
    )
}

And an input:

3+4

I get the following CST:

(start [0, 0] - [0, 3]
  (expr [0, 0] - [0, 3]
    (expr [0, 0] - [0, 1])
    (expr [0, 2] - [0, 3])))

So my question is: how do I get the values, i.e. the text that was parsed, from these nodes/leaves? I need them in order to evaluate the tree. I'm fairly sure there is a way, because tree-sitter can also do syntax highlighting, which (I guess) needs the values too. But I read the documentation and couldn't find any note on how to do it.

score 5 · Accepted Answer · answered Aug 28 '20 at 16:19

Tree-sitter's syntax tree doesn't store copies of the input text. So to get the text of a particular token, you would have to use the ranges that Tree-sitter gives you to compute slices of your original source code.

In the Python binding, it looks like this:

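# `parser` is assumed to be a tree_sitter.Parser already configured with this grammar.
source_code_bytes = b'3 + 4'
tree = parser.parse(source_code_bytes)
# root node (start) -> outer expr -> first inner expr (the left operand)
node1 = tree.root_node.children[0].children[0]

# Slice the original bytes with the node's byte range, then decode to a string.
node1_text = source_code_bytes[node1.start_byte:node1.end_byte].decode('utf8')
assert node1_text == '3'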
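
Since the question is about evaluating the tree, here is a minimal sketch of how the same slicing could drive an evaluator. It does not use tree-sitter itself: `FakeNode` is a hypothetical stand-in that only carries `start_byte`, `end_byte` and `children`, and the helper names `node_text` and `evaluate` are made up for illustration. It also assumes that the operator shows up as an unnamed child between the two operands, which the S-expression in the question does not print.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class FakeNode:
    """Hypothetical stand-in for a syntax node: byte offsets and children only."""
    start_byte: int
    end_byte: int
    children: List["FakeNode"] = field(default_factory=list)


def node_text(source: bytes, node: FakeNode) -> str:
    # Slice the original bytes with the node's byte range, then decode.
    return source[node.start_byte:node.end_byte].decode("utf8")


def evaluate(source: bytes, node: FakeNode) -> int:
    # A leaf covers a number literal; an inner node is (left, operator, right).
    if not node.children:
        return int(node_text(source, node))
    left, op, right = node.children
    lhs = evaluate(source, left)
    rhs = evaluate(source, right)
    return lhs + rhs if node_text(source, op) == "+" else lhs - rhs


source = b"3+4"
# Operand byte ranges follow the CST in the question; the operator span
# (bytes 1 to 2) is the assumed unnamed child between them.
tree = FakeNode(0, 3, [FakeNode(0, 1), FakeNode(1, 2), FakeNode(2, 3)])
result = evaluate(source, tree)
```

With a real tree you would pass the node objects returned by the parser instead of `FakeNode`, as long as they expose the same three attributes. Slicing bytes rather than a decoded string matters once the input contains non-ASCII characters, because the offsets count bytes, not characters.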

In some language bindings, like the wasm binding, there is a .text helper for making this easier.

There is an open issue for adding this kind of helper function to the Python binding.
