I'm trying to parse a single C++ file that looks as follows:

#include <memory>
#include <string>
#include "foo.h"

std::unique_ptr<wchar_t[]> FooBar::baz(std::wstring const& text)
{
    auto result = std::make_unique<wchar_t[]>(text.length() + 1);
    std::copy(text.begin(), text.end(), result.get());
    result.get()[text.length()] = 0;
    return result;
}

int main(int argc, char* argv[])
{
    return 0;
}

To parse the file, I use libclang:

#include "clang-c/Index.h"
#include <iostream>
#include <type_traits>  // std::extent

CXChildVisitResult visitor(CXCursor cursor, CXCursor, CXClientData) {
    CXCursorKind kind = clang_getCursorKind(cursor);

    // Consider functions and methods
    if (kind == CXCursorKind::CXCursor_FunctionDecl ||
        kind == CXCursorKind::CXCursor_CXXMethod) {
        auto cursorName = clang_getCursorDisplayName(cursor);
        std::cout << "Found Function: " << clang_getCString(cursorName) << std::endl;
        clang_disposeString(cursorName);
    }

    return CXChildVisit_Recurse;
}

int main(int argc, char* argv[])
{
    CXIndex index = clang_createIndex(
        /*excludeDeclarationsFromPCH=*/1,
        /*displayDiagnostics=*/1
    );

    constexpr const char* defaultArguments[] = {
        "-std=c++17",
        "-ferror-limit=0",
    };

    CXTranslationUnit TU = clang_parseTranslationUnit(
        index, 
        "C:\\PATH\\TO\\test.cpp",
        /*command_line_args=*/defaultArguments,
        /*num_command_line_args=*/std::extent<decltype(defaultArguments)>::value,
        /*unsaved_files=*/nullptr,
        /*num_unsaved_files=*/0,
        CXTranslationUnit_SingleFileParse | CXTranslationUnit_KeepGoing
    );

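    // clang_parseTranslationUnit returns null if no translation unit could be created
    if (!TU) {
        std::cerr << "Unable to parse translation unit" << std::endl;
        clang_disposeIndex(index);
        return 1;
    }

    CXCursor cursor = clang_getTranslationUnitCursor(TU);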
    clang_visitChildren(
        cursor,
        visitor,
        nullptr);

    clang_disposeTranslationUnit(TU);
    clang_disposeIndex(index);
    return 0;
}

The output is:

test.cpp:3:10: error: 'foo.h' file not found
test.cpp:5:1: error: use of undeclared identifier 'std'
test.cpp:5:28: error: use of undeclared identifier 'FooBar'
test.cpp:5:40: error: use of undeclared identifier 'std'
Found Function: main(int, char **)

The errors caused by the unresolved includes are fine and expected. My problem is that FooBar::baz() is not reported as a function.

I intentionally haven't provided the include directory of foo.h as a compiler flag (-I) because I want this tool to be standalone, i.e., to operate on arbitrary single C++ source files and extract the function names. In the Clang API docs I read that CXTranslationUnit_SingleFileParse is designed specifically for this use case, but it doesn't produce the expected results here.

What am I missing?

Disclaimer: I'm aware that it is unconventional to force an actual compiler to ignore errors caused by unresolved includes, but ctags and tree-sitter do not give satisfactory results, as they are only fuzzy parsers.

DEls
  • A Translation Unit is usually understood to be the single file generated after the source file is processed by the pre-processor, including all `#include`s, `#define`s etc. – Richard Critten May 31 '22 at 17:42
  • Not sure how that answers the question. I'm aware that the pre-processor will fail to resolve the `#include`s, but I am still trying to get the syntax information about which function definitions exist. – DEls May 31 '22 at 17:45
  • My point is if the pre-processor stage fails, then there is no Translation Unit to compile. That a compiler continues is just the compiler trying to be helpful and guess at what the code could mean with it only having partial / incomplete information. – Richard Critten May 31 '22 at 17:47
  • Actually there is, otherwise the main function would not be correctly parsed. I assume clang will throw away those parts of the translation unit where diagnostics were found. Since `FooBar` is not declared inside the file itself (but in `foo.h`), `FooBar::baz` is ignored. – DEls May 31 '22 at 17:49
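One way to check the assumption in the last comment is to give the tool a variant of test.cpp that declares FooBar in the file itself instead of relying on foo.h. The following is a minimal sketch: the class body is a hypothetical stand-in for whatever foo.h really declares, and main is left out for brevity. Whether libclang then reports FooBar::baz under CXTranslationUnit_SingleFileParse is not verified here, because the unresolved std names may still stop the parser from recovering the definition.

```cpp
#include <algorithm>  // std::copy
#include <cstddef>    // std::size_t
#include <memory>
#include <string>

// Hypothetical stand-in for the declaration that foo.h would provide.
// With it in the same file, the qualifier in FooBar::baz can be resolved
// without any include directory.
class FooBar {
public:
    std::unique_ptr<wchar_t[]> baz(std::wstring const& text);
};

// Same definition as in the question.
std::unique_ptr<wchar_t[]> FooBar::baz(std::wstring const& text)
{
    std::size_t const length = text.length();
    auto result = std::make_unique<wchar_t[]>(length + 1);
    std::copy(text.begin(), text.end(), result.get());
    result[length] = 0;  // terminating null character
    return result;
}
```

If the method shows up for this variant but not for the original file, that would support the idea that the definition is dropped because its qualifier cannot be resolved.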
