I am toying with the Python bindings of libclang. Currently, I am trying to perform some very simple tasks, such as finding all the headers included in a C++ file. The code I use is as follows:

from clang.cindex import Index

index = Index.create()
tu = index.parse("hello.cpp", args=["-std=c++14"])
for it in tu.get_includes():
    print(it.include.name)

The file hello.cpp is as follows:

#include <iostream>
#include <stdio.h>
#include "hello.h"

int main()
{
    std::cout << "Hello world\n";
}

And the file hello.h is as follows:

#include <list>

I thought that the code above would print iostream, stdio.h and hello.h, maybe list and maybe more if it took into account transitive includes. However, it only prints ./hello.h, blatantly ignoring the standard library headers.

I couldn't find anything in the documentation about whether this is by design. Is it? If so, is there any way to get all the headers included by a file with clang.cindex, including the standard library ones?

Morwenn
  • Maybe this will help: http://stackoverflow.com/questions/22482209/filtering-directories-when-parsing-cpp-files-in-get-includes-in-python-bindings – Simon Kraemer Oct 13 '15 at 13:39
  • @SimonKraemer Unfortunately not really. I found it before asking the question, but it only added to the confusion. Also, it seems that it uses an older version of libclang with a subtly different API :/ – Morwenn Oct 13 '15 at 14:09
  • Have you tried to set the filter to your include directories? I'm sorry but I'm not familiar with these tools. The answer just looked as if the OP wanted exactly the opposite of yours. – Simon Kraemer Oct 13 '15 at 14:12
  • @SimonKraemer The filter thing is an addition by the OP. By reading libclang's code and documentation, it seems that there is no built-in filtering mechanism. – Morwenn Oct 13 '15 at 14:19

1 Answer

For those who are still looking for an answer:

import sys
import os
from enum import Enum
from clang.cindex import Config, Index, CursorKind


Config.set_library_path(os.environ['CLANG_LIBRARY_PATH'])


# clang.cindex.TranslationUnit does not expose all of the latest flags
# see: https://clang.llvm.org/doxygen/group__CINDEX__TRANSLATION__UNIT.html#gab1e4965c1ebe8e41d71e90203a723fe9
CXTranslationUnit_None = 0x0
CXTranslationUnit_DetailedPreprocessingRecord = 0x01
CXTranslationUnit_Incomplete = 0x02
CXTranslationUnit_PrecompiledPreamble = 0x04
CXTranslationUnit_CacheCompletionResults = 0x08
CXTranslationUnit_ForSerialization = 0x10
CXTranslationUnit_CXXChainedPCH = 0x20
CXTranslationUnit_SkipFunctionBodies = 0x40
CXTranslationUnit_IncludeBriefCommentsInCodeCompletion = 0x80
CXTranslationUnit_CreatePreambleOnFirstParse = 0x100
CXTranslationUnit_KeepGoing = 0x200
CXTranslationUnit_SingleFileParse = 0x400
CXTranslationUnit_LimitSkipFunctionBodiesToPreamble = 0x800
CXTranslationUnit_IncludeAttributedTypes = 0x1000
CXTranslationUnit_VisitImplicitAttributes = 0x2000
CXTranslationUnit_IgnoreNonErrorsFromIncludedFiles = 0x4000
CXTranslationUnit_RetainExcludedConditionalBlocks = 0x8000


class IncludeForm(Enum):
    Quoted = 0
    AngleBracket = 1


class IncludeInfo:
    def __init__(self, path, form, file=None):
        self.path = path
        self.form = form
        self.file = file

    def __str__(self):
        open_bracket, close_bracket = ('<', '>') if self.form == IncludeForm.AngleBracket else ('"', '"')
        return f'#include {open_bracket}{self.path}{close_bracket} // {self.file}'


default_parser_options = (
    CXTranslationUnit_DetailedPreprocessingRecord |  # needed so that preprocessing directives show up as cursors
    CXTranslationUnit_SkipFunctionBodies |  # for faster parsing
    CXTranslationUnit_SingleFileParse |  # don't parse include files recursively
    CXTranslationUnit_RetainExcludedConditionalBlocks |  # keep includes inside ifdef blocks
    CXTranslationUnit_KeepGoing  # don't stop on errors
)


def create_include_parser(options=default_parser_options):
    def try_get_included_file(node):
        try:
            return node.get_included_file()
        except Exception:
            # treat any failure to look up the included file as "not resolved"
            return None

    def parse_includes(file, args=None):
        tu = index.parse(file, args=args, options=options)

        for node in tu.cursor.get_children():
            if node.kind == CursorKind.INCLUSION_DIRECTIVE:
                yield IncludeInfo(
                    node.displayname,
                    IncludeForm.AngleBracket if list(node.get_tokens())[-1].spelling == '>' else IncludeForm.Quoted,
                    try_get_included_file(node)
                )

    index = Index.create()

    return parse_includes


if __name__ == "__main__":
    parse_includes = create_include_parser()

    for file in sys.argv[1:]:
        for include_info in parse_includes(file):
            print(include_info)
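
The parser options in the script are plain bit flags, so a combined value can be built with | and decoded back into names when debugging. A minimal sketch, assuming a hypothetical ParseOption class (it is not part of clang.cindex) that mirrors the five constants used in default_parser_options:

```python
from enum import IntFlag


class ParseOption(IntFlag):
    # Hypothetical wrapper; the values are copied from the constants in the script.
    DetailedPreprocessingRecord = 0x01
    SkipFunctionBodies = 0x40
    KeepGoing = 0x200
    SingleFileParse = 0x400
    RetainExcludedConditionalBlocks = 0x8000


def describe(options):
    """Return the names of all ParseOption bits set in an integer option mask."""
    return [flag.name for flag in ParseOption if options & flag]


default_options = (
    ParseOption.DetailedPreprocessingRecord
    | ParseOption.SkipFunctionBodies
    | ParseOption.SingleFileParse
    | ParseOption.RetainExcludedConditionalBlocks
    | ParseOption.KeepGoing
)

print(hex(default_options))       # combined mask as a hex string
print(describe(default_options))  # names of the flags that are set
```

If this wrapper is used with the script, passing int(default_options) as the options value keeps the call identical to the one with the raw constants.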

For a C++ file like this:

#include <iostream>
// #include <vector>

#include "foo.h"

#ifdef _BAR
#  include "bar.h"
#endif

#include "3rdparty/buzz.h"

int main() {
    std::cout << "Hello, World!" << std::endl;
}

It will print something like:

#include <iostream> // C:\Program Files (x86)\Microsoft Visual Studio\2017\BuildTools\VC\Tools\MSVC\14.16.27023\include\iostream
#include "foo.h" // data/example/app/foo.h
#include "bar.h" // None
#include "3rdparty/buzz.h" // None

You can pass additional compiler options with the args parameter, e.g. to add include directories:

for include_info in parse_includes(file, args=['-Idata/example']):
    print(include_info)
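
When there are several include directories or macros to define, the args list can be generated rather than written by hand. A small sketch, assuming a hypothetical helper include_args (not part of clang.cindex); -I and -D are the usual Clang command-line flags, and the directory and macro names are examples only:

```python
def include_args(include_dirs, defines=()):
    """Build a Clang-style argument list from include directories and macro names."""
    args = [f'-I{directory}' for directory in include_dirs]
    args += [f'-D{macro}' for macro in defines]
    return args


# Example directory and macro names; adjust them to the project layout.
args = include_args(['data/example', 'data/example/3rdparty'], defines=['_BAR'])
print(args)
```

The resulting list can then be passed as parse_includes(file, args=args).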
evg656e