I'm working on a project that relies on parsing .cpp files into the clang AST.

I've noticed that the Python bindings, at least, leave out a lot of relevant nodes that appear when using clang -Xclang -ast-dump.

Python parser:

import clang.cindex


def print_traversal(cursor, depth=0):
  # Print this node's kind, indented by its depth, then recurse into its children.
  print(depth*"    ", end='')
  print(cursor.kind.name)
  for child in cursor.get_children():
    print_traversal(child, depth+1)


def processSource(source):
  index = clang.cindex.Index.create()
  # Write the source to a file so that libclang can parse it from disk.
  with open("temp/temp.cpp", "w") as outfile:
    outfile.write(source)
  tu = index.parse("temp/temp.cpp", args=["-fno-delayed-template-parsing"])
  root = tu.cursor
  print_traversal(root)

Is there any way to get the AST parser to output slightly more verbose information? There are also plenty of cases where the parsed AST skips nodes.
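
For reference, one way to get more detail out of the traversal is to print more cursor attributes than just the kind. The sketch below is a minimal example that relies only on attributes the Python bindings expose on a cursor (`kind`, `spelling`, `type.spelling`, `location`, `get_children()`). The names `describe` and `dump` are made up for illustration, and the sketch does not recover nodes that the traversal never visits.

```python
def describe(cursor):
    """Return a one-line summary of a cursor: kind, name, type, position."""
    parts = [cursor.kind.name]
    # spelling is empty for many nodes (e.g. literals), so only show it if set
    if cursor.spelling:
        parts.append(repr(cursor.spelling))
    # type.spelling is empty for nodes without a type (e.g. statements)
    if cursor.type.spelling:
        parts.append("type=" + repr(cursor.type.spelling))
    loc = cursor.location
    parts.append("@{}:{}".format(loc.line, loc.column))
    return " ".join(parts)


def dump(cursor, depth=0, out=None):
    """Collect an indented summary line for cursor and every descendant."""
    if out is None:
        out = []
    out.append("    " * depth + describe(cursor))
    for child in cursor.get_children():
        dump(child, depth + 1, out)
    return out
```

With a parsed translation unit `tu`, this would be called as `print("\n".join(dump(tu.cursor)))`.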

Side question: Is there any way to get the relevant tokens for an individual node (and not the entire subtree, as with cursor.get_tokens())? Or any other way to extract relevant information from AST nodes?
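
To make the side question concrete: one possible approach is to take the tokens of the node and drop every token that falls inside a child's extent. The sketch below assumes that token and cursor extents can be compared by byte offset (`extent.start.offset`, `extent.end.offset`), which should hold within a single file but may not behave well around macro expansions. `own_tokens` is a hypothetical helper name, not part of the bindings.

```python
def own_tokens(cursor):
    """Return spellings of tokens in cursor's extent that lie outside every child's extent."""
    # Byte ranges covered by the direct children of this node
    child_ranges = [
        (child.extent.start.offset, child.extent.end.offset)
        for child in cursor.get_children()
    ]
    result = []
    for token in cursor.get_tokens():
        start = token.extent.start.offset
        end = token.extent.end.offset
        # A token belongs to a child if it sits entirely inside that child's range
        covered = any(lo <= start and end <= hi for lo, hi in child_ranges)
        if not covered:
            result.append(token.spelling)
    return result
```

For a binary operator node covering `a+b` whose children are the two operands, this would leave only the `+` token.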

I tried running the parser on the following input:

#include <iostream>
using namespace std;

int main(){
  int a=0, b=2;
  cout<<a+b;
  return 0;
}

Output:

TRANSLATION_UNIT
    USING_DIRECTIVE
        NAMESPACE_REF
    FUNCTION_DECL
        COMPOUND_STMT
            DECL_STMT
                VAR_DECL
                    INTEGER_LITERAL
                VAR_DECL
                    INTEGER_LITERAL
            RETURN_STMT
                INTEGER_LITERAL

Relevant part of clang AST:

...
`-FunctionDecl 0x2306730 <line:4:1, line:8:1> line:4:5 main 'int ()'
  `-CompoundStmt 0x230d8d8 <col:11, line:8:1>
    |-DeclStmt 0x2306928 <line:5:3, col:15>
    | |-VarDecl 0x23067e8 <col:3, col:9> col:7 used a 'int' cinit
    | | `-IntegerLiteral 0x2306850 <col:9> 'int' 0
    | `-VarDecl 0x2306888 <col:3, col:14> col:12 used b 'int' cinit
    |   `-IntegerLiteral 0x23068f0 <col:14> 'int' 2
    |-CXXOperatorCallExpr 0x230d870 <line:6:3, col:11> 'std::basic_ostream<char, std::char_traits<char> >::__ostream_type':'std::basic_ostream<char>' lvalue
    | |-ImplicitCastExpr 0x230d858 <col:7> 'std::basic_ostream<char, std::char_traits<char> >::__ostream_type &(*)(int)' <FunctionToPointerDecay>
    | | `-DeclRefExpr 0x230d7e0 <col:7> 'std::basic_ostream<char, std::char_traits<char> >::__ostream_type &(int)' lvalue CXXMethod 0x226db98 'operator<<' 'std::basic_ostream<char, std::char_traits<char> >::__ostream_type &(int)'
    | |-DeclRefExpr 0x2306940 <col:3> 'std::ostream':'std::basic_ostream<char>' lvalue Var 0x2306258 'cout' 'std::ostream':'std::basic_ostream<char>'
    | `-BinaryOperator 0x23069d0 <col:9, col:11> 'int' '+'
    |   |-ImplicitCastExpr 0x23069a0 <col:9> 'int' <LValueToRValue>
    |   | `-DeclRefExpr 0x2306960 <col:9> 'int' lvalue Var 0x23067e8 'a' 'int'
    |   `-ImplicitCastExpr 0x23069b8 <col:11> 'int' <LValueToRValue>
    |     `-DeclRefExpr 0x2306980 <col:11> 'int' lvalue Var 0x2306888 'b' 'int'
    `-ReturnStmt 0x230d8c8 <line:7:3, col:10>
      `-IntegerLiteral 0x230d8a8 <col:10> 'int' 0

Besides providing no information about data types (not even in the spelling), the output contains nothing at all about the cout statement.
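
One possibility worth ruling out (an assumption, not something the output above confirms) is that the parse reported errors, for example because `<iostream>` could not be found, in which case statements that use undeclared names may be missing from the cursor tree. The translation unit's `diagnostics` attribute lists what the parser reported. The sketch below formats those entries; `report_diagnostics` and `SEVERITY_NAMES` are hypothetical names, and the numeric severities follow the constants defined on `clang.cindex.Diagnostic`.

```python
# Numeric severity levels as defined on clang.cindex.Diagnostic
SEVERITY_NAMES = {0: "ignored", 1: "note", 2: "warning", 3: "error", 4: "fatal"}


def report_diagnostics(tu):
    """Return one formatted line per diagnostic reported for the translation unit."""
    lines = []
    for diag in tu.diagnostics:
        name = SEVERITY_NAMES.get(diag.severity, str(diag.severity))
        loc = diag.location
        lines.append("{}:{}: {}: {}".format(loc.line, loc.column, name, diag.spelling))
    return lines
```

Calling `print("\n".join(report_diagnostics(tu)))` right after `index.parse(...)` would show whether the parse was clean.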
