
I'm currently trying to parse a C header file AND retain the name of some macro-defined values.

As Tim notes in the comments below, the preprocessor does its job and ultimately derives the raw value that will be used in the code. However, I was hoping to both generate the AST of the header file AND extract, or retain easy access to, the names of the macro-defined values.

All of that to say, and ask: is there any way of using pycparser to extract raw macro-defined values, or is this out of scope and not how the tool was intended to be used?

My code is rather simple, and as mentioned earlier, is outputting most of what I would expect to be there minus macro defined values.

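from pycparser import parse_file

CC = 'gcc'  # compiler used to run the preprocessor

# filename is the path to the header being parsed
ast = parse_file(filename, use_cpp=True,
        cpp_path=CC,
        cpp_args=[
            '-E',
            r'-I/path/to/fake_libc_include',
            r'-I/other/includes'
            ]
        )
ast.show()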

Say for example I make a main_file.c, and I include the header I want to parse.

#include <target_header.h>


int main() {
  int i = foobar; // #define foobar 0x3

  return 0;
}

I then do the same process of parsing the C file rather than header file using pycparser. I will get the following:

FuncDef: 
Decl: main, [], [], [], []
  FuncDecl: 
    TypeDecl: main, [], None
      IdentifierType: ['int']
Compound: 
  Decl: i, [], [], [], []
    TypeDecl: i, [], None
      IdentifierType: ['int']
    Constant: int, 0x0003
  Return: 
    Constant: int, 0

So the name of the macro is lost: the preprocessor substitutes the value, and only the value reaches the parser, as expected. Ultimately I was hoping for a helper function in pycparser that does a "pre-preprocessor" lifting, so to speak, of the actual names of the macro-defined values, but I think I am running into a wall because pycparser was not built for that purpose.

Perhaps using a separate approach or tool, such as the one listed here, is the best bet, but let me know if anyone has done something similar with pycparser, so that I can avoid using more than one library:

Use C preprocessor macros (just the constants) as Python variables

UPDATE: this question is based on a required capability that is outside of the scope of pycparser but I am keeping this question and answering using my approach in case anyone else runs into the same need.

afterShock
  • Right. That's how C works. The compiler never sees the word "foobar". The preprocessor runs as a first pass and filters the code to do all of the #define substitutions. By the time it gets compiled, that statement is literally `int i = 3;`. Perhaps you should try doing your parse without `use_cpp`. – Tim Roberts Mar 23 '23 at 17:30
  • I did think of that as well. I guess I was running into a chicken and egg scenario because I was wanting the resultant AST of the header file WHILE also having easy access to the raw name of the macro defined values, should've made that more clear. But I'll give that a stab thanks! – afterShock Mar 23 '23 at 17:40
  • `pycparser` does not preserve preprocessor macros - it is designed to run *after* the preprocessor. As you found, there's many hacks/workarounds you can employ, but this is outside of the scope of the tool – Eli Bendersky Mar 24 '23 at 14:42

1 Answer


I was rather hard-headed about only using pycparser and ended up doing a hacky approach to get what I wanted done.

I'm posting this as an answer in case anyone else comes across this post with a similar need.

To summarize, I needed a way to both generate the AST that pycparser gives while still retaining the name and value of ALL macro-defined variables in the header file.

To accomplish this, I first take into consideration the header I will be analyzing, say, foobar.h.

I open the file and manually parse out all #define'd variables, with obvious exceptions such as function-like macros and the header guard. I then programmatically generate a dummy_file.c that assigns each macro to a basic int, so that I get the preprocessor's evaluated value of every macro variable, as follows:

#include <foobar.h>

int main() {

    int def_foo = FOO;
    int def_bar = BAR;
    // etc.

    return 0;
}
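
The generation step could be sketched roughly like this, using only the standard library. The helper names (`macro_names`, `make_dummy_source`) and the regex are my own illustration, not anything pycparser provides, and the regex is a simplification: it assumes one macro per line, ignores line continuations, and only skips header guards that are defined without a value.

```python
import re

# Matches object-like macros that carry a value, e.g. "#define FOO 0x3".
# Function-like macros ("#define MAX(a, b) ...") are skipped because the
# macro name must be followed by whitespace, not "(".
# Value-less defines such as a typical header guard are skipped because a
# value is required after the name.
DEFINE_RE = re.compile(
    r"^[ \t]*#[ \t]*define[ \t]+([A-Za-z_]\w*)[ \t]+(\S.*)$",
    re.MULTILINE,
)


def macro_names(header_text):
    """Return the names of value-carrying, object-like macros (hypothetical helper)."""
    return [m.group(1) for m in DEFINE_RE.finditer(header_text)]


def make_dummy_source(header_name, names):
    """Build C source that assigns each macro to an int named def_<macro> (hypothetical helper)."""
    lines = [f"#include <{header_name}>", "", "int main() {"]
    lines += [f"    int def_{name.lower()} = {name};" for name in names]
    lines += ["    return 0;", "}", ""]
    return "\n".join(lines)


if __name__ == "__main__":
    header = (
        "#ifndef FOOBAR_H\n"
        "#define FOOBAR_H\n"
        "#define FOO 0x3\n"
        "#define BAR 7\n"
        "#define MAX(a, b) ((a) > (b) ? (a) : (b))\n"
        "#endif\n"
    )
    print(make_dummy_source("foobar.h", macro_names(header)))
```

Writing the returned string to dummy_file.c gives a file of the shape shown above.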

In the end I run parse_file on the dummy_file.c and get both the original AST that was generated by running parse_file on foobar.h AND the names and values of all macro-defined variables.
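
Pulling the names and values back out of the AST might look something like the sketch below. It uses pycparser's `c_ast.NodeVisitor` and `c_parser.CParser`; the `MacroValueCollector` class and the `PREPROCESSED` string are illustrative assumptions. The string stands in for dummy_file.c after the preprocessor has run, so the example does not need gcc or the real header.

```python
from pycparser import c_ast, c_parser

# Stand-in for the preprocessed dummy_file.c: by this point the preprocessor
# has already replaced FOO and BAR with their values (values are illustrative).
PREPROCESSED = """
int main() {
    int def_foo = 0x3;
    int def_bar = 7;
    return 0;
}
"""


class MacroValueCollector(c_ast.NodeVisitor):
    """Collect the initializers of variables named def_<something> (hypothetical helper)."""

    def __init__(self):
        self.values = {}

    def visit_Decl(self, node):
        # Only plain literals are handled. A macro that expands to an
        # expression (e.g. 1 << 3) appears as a BinaryOp and is ignored here.
        if (
            node.name
            and node.name.startswith("def_")
            and isinstance(node.init, c_ast.Constant)
        ):
            # Constant.value is the literal's source text, e.g. "0x3".
            self.values[node.name[len("def_"):]] = node.init.value


ast = c_parser.CParser().parse(PREPROCESSED)
collector = MacroValueCollector()
collector.visit(ast)
print(collector.values)
```

In the real flow the AST would come from parse_file on dummy_file.c with use_cpp=True instead of from a string. The values come back as the literal text from the source, so converting them to Python ints is a separate step.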

For those who are curious as to why I wouldn't just redefine the #defines as ints within the header: it is mostly because I treat the header as the "ground truth", so that I can generate the respective header's API and values in Python as well. I'm also expecting this header file to change over time.

afterShock