There is something I do not understand about the lineno that the ast module calculates. Usually, when I get the lineno of an ast object, it gives me the first line on which the object appears.

For example, in the case below:

import ast

st = 'def foo():\n    print("hello")'
print(ast.parse(st).body[0].lineno)          # the function definition
print(ast.parse(st).body[0].body[0].lineno)  # the print inside the function

This prints 1 for the function foo and 2 for the "hello" printout.

However, if I parse a multi-line docstring (ast.Expr) the lineno provided is the last line.

import ast

st = 'def foo():\n    """\n    Test\n    """'
print(ast.parse(st).body[0].lineno)          # the function definition
print(ast.parse(st).body[0].body[0].lineno)  # the docstring (ast.Expr)

The result is still line 1 for the function, but line 4 for the docstring. I would have expected line 2, since that is where the docstring begins.

I guess what I am asking is whether there is a way to always get the first lineno of every ast object, including ast.Expr.

karpet22
  • I'm pretty sure it _doesn't_ usually give you the first line, but rather the last physical line that contains part of the first "virtual line" (after backslash, bracket, and triple-quote continuation are taken into account). – abarnert May 27 '15 at 00:12
  • In other words, the same rule that causes you to see the last line of a multi-line expression in an exception traceback. – abarnert May 27 '15 at 00:12
  • But I doubt this is documented anywhere (in the docs it just says it's "the line of the source text"). If you feel like digging through the CPython 2.7 source, it should be in `ast.c` where it sets the `n_lineno` (possibly through a macro `#define`d in one of the headers) for each node type. Or that may be too late; you may have to look in the generated code that creates the CST as input to the AST stuff. – abarnert May 27 '15 at 00:18

1 Answer

The AST's source locations leave much to be desired, but much of the missing information is available through the ASTTokens library, which annotates AST nodes with more useful location info. In your example:

import asttokens

st = 'def foo():\n    """\n    Test\n    """'
atok = asttokens.ASTTokens(st, parse=True)

print(atok.tree.body[0].first_token.start[0])          # first line of the function
print(atok.tree.body[0].body[0].first_token.start[0])  # first line of the docstring

Prints 1 and 2, as desired. Perhaps more interestingly,

print(atok.get_text_range(atok.tree.body[0]))          # range of the whole function
print(atok.get_text_range(atok.tree.body[0].body[0]))  # range of the docstring only

This prints the ranges of source text corresponding to the nodes, as (start, end) character offsets: (0, 35) and (15, 35) in this case.
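
On newer interpreters the standard library alone may be enough. Here is a minimal sketch, assuming Python 3.8 or later, where (as far as I know) multi-line string nodes report the line they start on and nodes also carry an `end_lineno`. The names `SOURCE` and `line_span` are only illustrative and not part of any library:

```python
import ast

SOURCE = 'def foo():\n    """\n    Test\n    """'


def line_span(node):
    """Return (first_line, last_line) for a node that carries location info."""
    # Assumption: Python 3.8+, where `end_lineno` exists on located nodes.
    return node.lineno, node.end_lineno


func = ast.parse(SOURCE).body[0]
docstring_stmt = func.body[0]  # the ast.Expr wrapping the docstring

print(line_span(func))            # (1, 4)
print(line_span(docstring_stmt))  # (2, 4)
```

On older interpreters, including the 2.7 from the question, `end_lineno` does not exist and the docstring's `lineno` is still the last line, so ASTTokens remains the more portable route there.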

DS.