I'm using Clang to build some internal static code analyzers. For one of them, we need to take a raw string and check whether it contains any syntax errors. Missing symbols, missing headers, invalid function calls, etc. should not count as invalid syntax, since the only goal is to determine whether the string is syntactically valid C/C++ code.
I initially thought I could do this with ASTUnit:
auto AST = tooling::buildASTFromCodeWithArgs(MyCode,
                                             Args,
                                             "input.cc",
                                             "clang-tool",
                                             std::make_shared<PCHContainerOperations>(),
                                             tooling::getClangStripDependencyFileAdjuster(),
                                             tooling::FileContentMappings(),
                                             &DiagConsumer);
llvm::outs() << "hasUncompilableErrorOccurred " << AST->getDiagnostics().hasUncompilableErrorOccurred() << "\n";
llvm::outs() << "hasUnrecoverableErrorOccurred " << AST->getDiagnostics().hasUnrecoverableErrorOccurred() << "\n";
llvm::outs() << "hasErrorOccurred " << AST->getDiagnostics().hasErrorOccurred() << "\n";
Taking two inputs, Hello world and #include <undefined.h>, both yield 1 for all of the outputs above, even though #include <undefined.h> is a correct C directive. Unlike Hello world, which is not valid C code, its only problem is that undefined.h is missing. Similarly, int* p = malloc(sizeof(int)); yields an error in all of these calls if stdlib.h wasn't included.
I'm trying to ignore such errors, so that every case except Hello world is considered valid code.
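To make the requirement concrete, here is a minimal sketch of the kind of filtering I have in mind, written without any Clang dependency. CollectedDiag and looksSyntacticallyValid are hypothetical names used only for illustration. The sketch assumes that each diagnostic can be tagged with a category string and that real syntax errors fall under a parser category such as "Parse Issue", while missing headers and undeclared identifiers fall under other categories. In an actual tool the category might be obtainable from a DiagnosticConsumer via DiagnosticIDs::getCategoryNumberForDiag and DiagnosticIDs::getCategoryNameFromID, but that mapping is an assumption that would need checking against the Clang version in use.

```cpp
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for a diagnostic collected from Clang.
struct CollectedDiag {
  std::string Category; // assumed category name, e.g. "Parse Issue"
  bool IsError;         // true for error or fatal severity
};

// Returns true when no error-level diagnostic belongs to the parser category.
// Assumption: syntax errors are reported under "Parse Issue"; errors in any
// other category (missing headers, unknown symbols) are deliberately ignored.
bool looksSyntacticallyValid(const std::vector<CollectedDiag> &Diags) {
  for (const CollectedDiag &D : Diags)
    if (D.IsError && D.Category == "Parse Issue")
      return false;
  return true;
}
```

With this shape, an input whose only error is a missing header would be accepted, while an input that produces at least one parser error would be rejected. Whether Clang's categories line up cleanly with "syntax" versus "everything else" is exactly the part I am unsure about.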
I also tried working around this by creating a raw Lexer and iterating over the tokens, but it doesn't give me enough information.
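// Raw lexer over the input text; no preprocessing or keyword lookup is done.
Lexer Lex(CharRange.getBegin(), PP->getLangOpts(), Text.data(),
          Text.data(), Text.data() + Text.size());
Token RawTok;
do {
  Lex.LexFromRawLexer(RawTok);
  // Print the token kind's name rather than its numeric enum value.
  llvm::outs() << "\t- " << tok::getTokenName(RawTok.getKind()) << "\n";
} while (RawTok.isNot(tok::eof));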
I'd love to get any suggestions!