
I am working on a project using pycparser for parsing C source code.

According to https://gcc.gnu.org/onlinedocs/cpp/Preprocessor-Output.html, when I run the preprocessor I get linemarkers of the form # linenum filename flags in my preprocessed translation unit.

However, when I parse the output of gcc -E with pycparser, the tokens carry coordinates (file, line, column) but do not seem to include any information from the linemarkers' flags, which would be very useful to me.

Is there a way to include the linemarkers in my AST, or to attach their information to the tokens of the AST?


UPDATE: What I need is to walk through my tokens and determine which file each one belongs to (pycparser already provides this), but also how that file was included.

This information is in the flags field of the linemarkers emitted by the preprocessor. For example, if I have:

<tokens of file1.h>
<tokens of file2.h>
<tokens of main.c>

file2.h could have been included either from file1.h or from main.c. I need to extract this information using pycparser. I know I can use gcc -H and post-process its output to work this out. However, the flags field of a linemarker reports whether the preprocessor is entering a file or returning to one, so it already encodes the nesting of inclusions. Is this information available anywhere in pycparser? Can it be added easily?

1 Answer


pycparser understands #line directives and incorporates them into the coordinates it tracks for all tokens. For example, consider this file:

int a;

int b;

We can dump its AST:

$ python examples/dump_ast.py --coord /tmp/file.c
FileAST:  (at None)
  Decl: a, [], [], [] (at /tmp/file.c:1:5)
    TypeDecl: a, [] (at /tmp/file.c:1:5)
      IdentifierType: [u'int'] (at /tmp/file.c:1:1)
  Decl: b, [], [], [] (at /tmp/file.c:3:5)
    TypeDecl: b, [] (at /tmp/file.c:3:5)
      IdentifierType: [u'int'] (at /tmp/file.c:3:1)

Note that the declaration of b has the location /tmp/file.c:3:5, meaning line 3 (and column 5) of the file.

Now modify the file slightly to be:

int a;

#line 90
int b;

And dump the AST again:

$ python examples/dump_ast.py --coord /tmp/file.c
FileAST:  (at None)
  Decl: a, [], [], [] (at /tmp/file.c:1:5)
    TypeDecl: a, [] (at /tmp/file.c:1:5)
      IdentifierType: [u'int'] (at /tmp/file.c:1:1)
  Decl: b, [], [], [] (at /tmp/file.c:90:5)
    TypeDecl: b, [] (at /tmp/file.c:90:5)
      IdentifierType: [u'int'] (at /tmp/file.c:90:1)

See what happened to the location of b?

Eli Bendersky
  • Hello Eli, thank you so much for pycparser. The #line directive is really interesting, thank you for the insight. But what I need is to track the inclusion hierarchy of each token. I will try to update my question to make it clearer. – SeishunNoArkadia May 18 '21 at 08:16
  • @SeishunNoArkadia: since pycparser works on preprocessed source, I don't think it has access to this information. This is something only the preprocessor knows. – Eli Bendersky May 18 '21 at 12:07
  • The linemarkers are emitted by the preprocessor, so they are part of the preprocessed source that pycparser parses, e.g. the output of the gcc -E command (I guess). – SeishunNoArkadia May 18 '21 at 14:24
  • @SeishunNoArkadia: it would be more effective if you could open an issue for pycparser, with a minimal example that demonstrates the issue end-to-end – Eli Bendersky May 18 '21 at 14:25
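
One possible workaround for the question above is a separate pre-pass over the gcc -E output that reads the linemarker flags directly. This does not use pycparser, and nothing in this thread confirms that pycparser exposes the flags. The sketch below is a minimal illustration: the helper name include_parents and the regular expression are hypothetical, and it assumes GCC-style linemarkers in which flag 1 marks the start of a new file and flag 2 marks the return to a file, as described on the GCC page linked in the question. It also assumes filenames contain no escaped quotes, and it records only the first includer of each file.

```python
import re

# GCC linemarker: # linenum "filename" flags   (flags are optional)
# Assumption: filenames contain no escaped double quotes.
LINEMARKER = re.compile(r'^#\s+(\d+)\s+"([^"]*)"((?:\s+\d+)*)\s*$')


def include_parents(preprocessed_text):
    """Map each included file to the file that first included it.

    Hypothetical helper: it tracks an include stack driven by the
    linemarker flags (1 = entering a new file, 2 = returning to a file).
    """
    parents = {}
    stack = []  # files currently open, innermost last
    for line in preprocessed_text.splitlines():
        match = LINEMARKER.match(line)
        if match is None:
            continue  # ordinary source line, not a linemarker
        filename = match.group(2)
        flags = set(match.group(3).split())
        if "1" in flags:
            # Entering a new file: the current top of the stack includes it.
            if stack:
                parents.setdefault(filename, stack[-1])
            stack.append(filename)
        elif "2" in flags:
            # Returning to a file: drop everything nested inside it.
            while stack and stack[-1] != filename:
                stack.pop()
            if not stack:
                stack.append(filename)
        elif stack:
            # No flags: a line adjustment or a switch of the top-level file.
            stack[-1] = filename
        else:
            stack.append(filename)
    return parents


SAMPLE = '''# 1 "main.c"
# 1 "file1.h" 1
# 1 "file2.h" 1
int x;
# 2 "file1.h" 2
int y;
# 3 "main.c" 2
int main(void) { return x + y; }
'''

print(include_parents(SAMPLE))
# → {'file1.h': 'main.c', 'file2.h': 'file1.h'}
```

The resulting mapping could then be joined with the file name in each pycparser coordinate to tell, for a given node, which file pulled in the file it came from.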