
I'm using Clang to build some internal static code analyzers. One of them needs to take a raw string and check whether it contains any syntax errors. Missing symbols, missing headers, invalid function calls, etc. should not count as invalid syntax, because the only goal is to decide whether the string is syntactically valid C/C++ code.

I thought initially that I could do it with ASTUnit:

  auto AST = tooling::buildASTFromCodeWithArgs(MyCode,
                                               Args,
                                               "input.cc",
                                               "clang-tool",
                                               std::make_shared<PCHContainerOperations>(),
                                               tooling::getClangStripDependencyFileAdjuster(),
                                               tooling::FileContentMappings(),
                                               &DiagConsumer);

  llvm::outs() << "hasUncompilableErrorOccurred " << AST->getDiagnostics().hasUncompilableErrorOccurred() << "\n";
  llvm::outs() << "hasUnrecoverableErrorOccurred " << AST->getDiagnostics().hasUnrecoverableErrorOccurred() << "\n";
  llvm::outs() << "hasErrorOccurred " << AST->getDiagnostics().hasErrorOccurred() << "\n";

Given two inputs, Hello world and #include <undefined.h>, all three calls above print 1 for both. Yet #include <undefined.h> is a valid C directive. Its only problem (unlike Hello world, which is not valid C code) is that undefined.h is missing. Similarly, int* p = malloc(sizeof(int)); makes all of these calls report an error if stdlib.h wasn't included.

I want to ignore such errors, so that every case except Hello world is considered valid code.

I also tried iterating over the input with a raw lexer, but that doesn't give me enough information, since it only tokenizes the text and never checks it against the grammar:

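  // Raw mode: tokenizes the buffer without preprocessing or parsing.
  Lexer Lex(CharRange.getBegin(), PP->getLangOpts(), Text.data(),
            Text.data(), Text.data() + Text.size());

  Token RawTok;
  do {
    Lex.LexFromRawLexer(RawTok);
    // Print the token kind's name instead of its numeric enum value.
    llvm::outs() << "\t- " << tok::getTokenName(RawTok.getKind()) << "\n";
  } while (RawTok.isNot(tok::eof));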

I'd love to get any suggestions!

OzB