I'm working on a project that relies on parsing .cpp files into the clang AST.
I've noticed that the parser (at least through the Python bindings) leaves out a lot of relevant nodes that appear when using
clang -Xclang -ast-dump
Python parser:
import clang.cindex

def print_traversal(cursor, depth=0):
    # Print one node kind per line, indented by its depth in the tree.
    print(depth*" ", end='')
    print(cursor.kind.name)
    for child in cursor.get_children():
        print_traversal(child, depth+1)

def processSource(source):
    # Write the source to a temporary file and parse it with libclang.
    index = clang.cindex.Index.create()
    with open("temp/temp.cpp", "w") as outfile:
        outfile.write(source)
    tu = index.parse("temp/temp.cpp", args=["-fno-delayed-template-parsing"])
    root = tu.cursor
    print_traversal(root)
Is there any way to get the AST parser to output slightly more verbose information? There are also plenty of cases where the parsed AST skips nodes.
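For the verbosity part, the cursors that are visited do carry more than kind.name. Below is a minimal sketch of a more descriptive traversal, assuming only the spelling, type.spelling, location and get_children() members of a cursor. The helper names describe and dump_verbose are hypothetical, chosen for illustration.

```python
def describe(cursor):
    """Return one line describing a cursor: kind, spelling, type, location."""
    parts = [cursor.kind.name]
    if cursor.spelling:
        parts.append(repr(cursor.spelling))
    type_spelling = cursor.type.spelling
    if type_spelling:
        parts.append("type=" + repr(type_spelling))
    loc = cursor.location
    parts.append("@{}:{}".format(loc.line, loc.column))
    return " ".join(parts)


def dump_verbose(cursor, depth=0, out=None):
    """Collect an indented, annotated dump of the subtree rooted at cursor."""
    if out is None:
        out = []
    out.append(" " * depth + describe(cursor))
    for child in cursor.get_children():
        dump_verbose(child, depth + 1, out)
    return out
```

Calling print("\n".join(dump_verbose(tu.cursor))) would print the annotated tree. This only adds detail to nodes the traversal reaches; it cannot bring back nodes that never show up as children.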
Side question: is there any way to get the relevant tokens for an individual node (and not for the entire subtree, as with cursor.get_tokens())? Or any other way to extract relevant information from AST nodes?
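One workaround that might serve here is to take the tokens of a node and drop those that fall inside the extent of any of its children. The sketch below assumes that token and cursor extents expose start.offset and end.offset, and that all offsets refer to the same file (macro expansions may break that assumption). The name own_tokens is hypothetical.

```python
def own_tokens(cursor):
    """Spellings of tokens inside cursor's extent but outside every child's extent."""
    # Byte-offset ranges covered by the direct children of this node.
    child_ranges = [
        (child.extent.start.offset, child.extent.end.offset)
        for child in cursor.get_children()
    ]
    result = []
    for token in cursor.get_tokens():
        start = token.extent.start.offset
        end = token.extent.end.offset
        # Keep the token only if no child fully contains it.
        covered = any(lo <= start and end <= hi for lo, hi in child_ranges)
        if not covered:
            result.append(token.spelling)
    return result
```

For a binary operator node such as a+b, the operand tokens belong to the children, so what remains should be the operator itself.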
I tried running the parser on the following input:
#include <iostream>
using namespace std;

int main(){
  int a=0, b=2;
  cout<<a+b;
  return 0;
}
Output:
TRANSLATION_UNIT
 USING_DIRECTIVE
  NAMESPACE_REF
 FUNCTION_DECL
  COMPOUND_STMT
   DECL_STMT
    VAR_DECL
     INTEGER_LITERAL
    VAR_DECL
     INTEGER_LITERAL
   RETURN_STMT
    INTEGER_LITERAL
Relevant part of clang AST:
...
`-FunctionDecl 0x2306730 <line:4:1, line:8:1> line:4:5 main 'int ()'
  `-CompoundStmt 0x230d8d8 <col:11, line:8:1>
    |-DeclStmt 0x2306928 <line:5:3, col:15>
    | |-VarDecl 0x23067e8 <col:3, col:9> col:7 used a 'int' cinit
    | | `-IntegerLiteral 0x2306850 <col:9> 'int' 0
    | `-VarDecl 0x2306888 <col:3, col:14> col:12 used b 'int' cinit
    |   `-IntegerLiteral 0x23068f0 <col:14> 'int' 2
    |-CXXOperatorCallExpr 0x230d870 <line:6:3, col:11> 'std::basic_ostream<char, std::char_traits<char> >::__ostream_type':'std::basic_ostream<char>' lvalue
    | |-ImplicitCastExpr 0x230d858 <col:7> 'std::basic_ostream<char, std::char_traits<char> >::__ostream_type &(*)(int)' <FunctionToPointerDecay>
    | | `-DeclRefExpr 0x230d7e0 <col:7> 'std::basic_ostream<char, std::char_traits<char> >::__ostream_type &(int)' lvalue CXXMethod 0x226db98 'operator<<' 'std::basic_ostream<char, std::char_traits<char> >::__ostream_type &(int)'
    | |-DeclRefExpr 0x2306940 <col:3> 'std::ostream':'std::basic_ostream<char>' lvalue Var 0x2306258 'cout' 'std::ostream':'std::basic_ostream<char>'
    | `-BinaryOperator 0x23069d0 <col:9, col:11> 'int' '+'
    |   |-ImplicitCastExpr 0x23069a0 <col:9> 'int' <LValueToRValue>
    |   | `-DeclRefExpr 0x2306960 <col:9> 'int' lvalue Var 0x23067e8 'a' 'int'
    |   `-ImplicitCastExpr 0x23069b8 <col:11> 'int' <LValueToRValue>
    |     `-DeclRefExpr 0x2306980 <col:11> 'int' lvalue Var 0x2306888 'b' 'int'
    `-ReturnStmt 0x230d8c8 <line:7:3, col:10>
      `-IntegerLiteral 0x230d8a8 <col:10> 'int' 0
Besides not providing information about data types (not even in the spelling), the Python output contains no node at all for the cout statement, while the clang dump shows a full CXXOperatorCallExpr subtree for it.
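One possible explanation worth ruling out is that the parse done through the bindings reports errors that the command-line run does not, in which case statements that fail to resolve could be absent from the cursor tree. Below is a minimal sketch for checking this, assuming the diagnostics attribute of the translation unit and the numeric severity levels used by the bindings (3 = error, 4 = fatal). The helper names are hypothetical.

```python
# Severity levels as numbered in the libclang bindings.
SEVERITY_NAMES = {0: "ignored", 1: "note", 2: "warning", 3: "error", 4: "fatal"}


def summarize_diagnostics(tu):
    """Return one readable line per diagnostic attached to the translation unit."""
    lines = []
    for diag in tu.diagnostics:
        name = SEVERITY_NAMES.get(diag.severity, str(diag.severity))
        lines.append("{}: line {}: {}".format(name, diag.location.line, diag.spelling))
    return lines


def has_errors(tu):
    """True if any diagnostic is an error or a fatal error."""
    return any(diag.severity >= 3 for diag in tu.diagnostics)
```

Printing summarize_diagnostics(tu) right after index.parse(...) in processSource would show whether the two parses are actually comparable.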