1

I have a C header file. I want to parse it and extract information about data types, functions, and function arguments. How can I do this? I would like an example in C.

Thank you very much.

Vinicius Kamakura
user828975

6 Answers

3

Use ANTLR. There's a decent grammar for C already written for you, and ANTLR will generate a parser in C (or in some other language if you prefer) that builds a tree, which you can then traverse to get what you want.

John Zwinck
  • +1 for parser generators generally, I've never used ANTLR, but Lex + Yacc (or equivalently: Flex + Bison) have been useful to me – tobyodavies Jul 13 '11 at 01:42
  • I've used ANTLR (v2 though, not v3), in production, in two or three projects. It's very good. – John Zwinck Jul 13 '11 at 01:44
  • 2
    A **parser** will not extract type information. You need symbol table support implemented for that. As a practical matter, you have to run the preprocessor, too. So an ANTLR grammar by itself won't help you. There may be an ANTLR grammar which parses and builds symbol tables; I'm not specifically familiar with one. – Ira Baxter Jul 13 '11 at 04:04
  • ... the specific C grammar referenced in this answer says it only keeps track of type definition names, that is, it does not do symbol table construction, let alone type analysis. – Ira Baxter Jul 13 '11 at 04:18
  • @AlexWebr: Perl won't produce a usable result without enormous amounts of work. You still have to parse and produce a symbol table. – Ira Baxter Jul 13 '11 at 04:31
  • 1
    @RadLexus: Thanks for the note--I've fixed the link. – John Zwinck Jan 23 '17 at 03:04
3

You could try Clang, in particular its Lexer and Preprocessor library.

Vinicius Kamakura
  • 1
    Recently, the most important and useful feature of Clang, namely its XML printer, was removed for some weird reasons. But it is still possible to either use an older version of Clang or to re-apply that removed patch to the current version. – SK-logic Jul 13 '11 at 07:23
  • @SK-logic Why was the XML printer the most important and useful feature? It was removed because using it was usually slower than the direct API calls (you're parsing one file, generating an AST, re-parsing the AST in a language, XML, which is slow to parse, and *then* operating on it). – kirbyfan64sos Mar 25 '15 at 19:40
  • 1
    @kirbyfan64sos, because a usable `libclang` did not exist back then. – SK-logic Mar 28 '15 at 00:27
1

There is also srcML. It is similar to c2xml, but it works on the source code directly, whereas c2xml starts from preprocessor output. Assuming good C coding practices (as opposed to arbitrary use of the preprocessor), this has been an advantage for my re-engineering tasks, as it preserves the names of #defines and makes it possible to process selected macros in a specific way.

ngong
0

If you need human-readable output (e.g. in HTML or PDF), then you can use Doxygen/doxywizard. In doxywizard, "All entities" has to be selected.

0

The DMS Software Reengineering Toolkit with its C Front End can do this.

DMS provides general-purpose parsing, symbol table construction, flow analysis, and program transformations, parameterized by a language definition. Using its C front end, DMS parses any of a variety of C dialects, builds ASTs for the code elements, and builds full symbol tables with complete name and type resolution of all symbols (including parameter lists in function headers); you can stop there and dump those out. DMS can also do control and data flow analysis on the C code, and you can use other DMS facilities to further analyze or transform the code. (The C front end has a full C preprocessor built in.)

The EDG front end can also be used for parsing and symbol tables, but does not have the other capabilities of DMS.

Ira Baxter
0

Yet another option is to use the c2xml tool from "sparse". Its C parser isn't 100% standard-compliant (e.g. it won't parse K&R-style declarations), but for reasonably modern C code it works quite well.