
I have tried parsing C source files for #include directives using Python, and I have also tried matching the pattern with a sed command. Both ways, I get garbage data. For example, if a comment contains /* #include "header.h" */, I get those lines as well. How can I avoid this?

Jonathan Leffler
Aabha Geed

6 Answers


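GCC supports the -H option, which prints the name of each header file used, indented with one dot per level of nesting (the list is written to standard error). Consider the source file hw.c: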

#include <stdio.h>
int main(void) { puts("Hello world"); return 0; }

On Mac OS X 10.9.4 with GCC 4.8.1:

$ gcc -H -c hw.c
. /usr/include/stdio.h
.. /usr/include/sys/cdefs.h
... /usr/include/sys/_symbol_aliasing.h
... /usr/include/sys/_posix_availability.h
.. /usr/include/Availability.h
... /usr/include/AvailabilityInternal.h
.. /usr/include/_types.h
... /usr/include/sys/_types.h
.... /usr/include/machine/_types.h
..... /usr/include/i386/_types.h
.. /usr/include/sys/_types/_va_list.h
.. /usr/include/sys/_types/_size_t.h
.. /usr/include/sys/_types/_null.h
.. /usr/include/sys/_types/_off_t.h
.. /usr/include/sys/_types/_ssize_t.h
.. /usr/include/secure/_stdio.h
... /usr/include/secure/_common.h
Multiple include guards may be useful for:
/usr/include/secure/_stdio.h
/usr/include/sys/_posix_availability.h
/usr/include/sys/_symbol_aliasing.h
$
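If you want the -H list inside a Python script, a minimal sketch is to capture gcc's standard error and keep only the lines that start with dots. The function name parse_gcc_h is hypothetical, and the sketch assumes the output format shown above (dots, one space, then the path); the trailing "Multiple include guards" report is skipped because its lines have no leading dots.

```python
import re
from typing import List, Tuple

# Assumed format of a header line from `gcc -H`:
# one dot per nesting level, a single space, then the header path.
_H_LINE = re.compile(r"^(\.+) (.+)$")


def parse_gcc_h(stderr_text: str) -> List[Tuple[int, str]]:
    """Return (depth, path) pairs from the text gcc -H writes to stderr.

    Lines that do not start with dots (for example the
    "Multiple include guards may be useful for:" report) are ignored.
    """
    headers = []
    for line in stderr_text.splitlines():
        m = _H_LINE.match(line)
        if m:
            headers.append((len(m.group(1)), m.group(2)))
    return headers
```

For example, feeding it subprocess.run(["gcc", "-H", "-c", "hw.c"], capture_output=True, text=True).stderr should yield pairs such as (1, "/usr/include/stdio.h") and (2, "/usr/include/sys/cdefs.h").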
Jonathan Leffler

Once you start thinking about non-trivial cases like

/* #include <header.h> */

you'll soon reach the point where it is no longer really practical to write your own dependency extractor.

Consider for example these:

#define PLUGIN "my_extension.h"
#include PLUGIN

#ifdef WITH_CURSES
#  include <curses.h>
#endif

You can continue the list indefinitely. If you want to handle all these correctly, you'll end up implementing a full preprocessor.

I don't know what you want to do with the generated list of files, but a common situation is determining which files a compilation unit depends on, for example to generate makefiles. Most compilers have special support for this; in GCC, it is the -M option.

main.c

#include <alpha.h>

/* #include <beta.h> */

#ifdef PLUGIN
#include PLUGIN
#endif

#if WITH_DELTA
#include <delta.h>
#endif

alpha.h

#include <epsilon.h>

Let beta.h, gamma.h, delta.h and epsilon.h be empty (or, at least, not #include anything).

$ gcc -I. -M main.c
main.o: main.c /usr/include/stdc-predef.h alpha.h epsilon.h

$ gcc -I. -DPLUGIN='<gamma.h>' -M main.c
main.o: main.c /usr/include/stdc-predef.h alpha.h epsilon.h gamma.h

$ gcc -I. -DWITH_DELTA=1 -M main.c
main.o: main.c /usr/include/stdc-predef.h alpha.h epsilon.h delta.h
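To consume this output from Python, one option is to run the compiler yourself and split the make rule it prints. This is a sketch under stated assumptions: POSIX-style paths without spaces (gcc escapes spaces as "\ ", which is not handled here) and a single rule in the output. The names parse_make_rule and gcc_deps are hypothetical.

```python
import subprocess
from typing import List


def parse_make_rule(rule_text: str) -> List[str]:
    """Return the prerequisites of a single make rule as printed by gcc -M.

    Assumes one rule and no spaces inside file names.
    """
    # gcc wraps long rules with a backslash-newline; join them first.
    joined = rule_text.replace("\\\n", " ")
    # Everything after the first ':' is the prerequisite list.
    _target, _, deps = joined.partition(":")
    return deps.split()


def gcc_deps(source: str, *flags: str) -> List[str]:
    """Run gcc -M with extra flags (e.g. -I., -DWITH_DELTA=1) and parse it."""
    result = subprocess.run(
        ["gcc", *flags, "-M", source],
        capture_output=True, text=True, check=True,
    )
    return parse_make_rule(result.stdout)
```

With the files above, gcc_deps("main.c", "-I.", "-DWITH_DELTA=1") should return the prerequisites from the last example, including main.c itself. Replacing -M with -MM in gcc_deps should leave out headers found in system directories, such as /usr/include/stdc-predef.h.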

Even if you are not ultimately trying to generate a makefile, parsing the preprocessor's output will be a lot easier than walking your own way through the source files.

5gon12eder

If your compiler supports the -E (or similar) option, something like this may be useful:

cc -E myprogram.c | grep '^# 1 '

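The -E option tells the compiler to run only the preprocessing stage and write the result to standard output. The lines beginning with # in that output are line markers recording which file each chunk of text came from, which is why the grep above finds the included files.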

An advantage of this method is that you can include any important -I and -D command line options as you would for a normal compile, thus capturing any behavior change those might engender.
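The same idea can be done in Python by reading the line markers directly. This sketch assumes GCC's documented marker format, # linenum "filename" flags, where flag 1 means a new file is being entered; unlike grep '^# 1 ', it keys on that flag, so the main file and pseudo-files such as <built-in> are skipped. The function name included_files is hypothetical, and file names containing escaped quotes are not handled.

```python
import re
from typing import List

# Assumed GCC line marker shape, e.g.:  # 1 "/usr/include/stdio.h" 1 3 4
_MARKER = re.compile(r'^# \d+ "([^"]*)"((?: \d+)*)\s*$')


def included_files(preprocessed: str) -> List[str]:
    """List the files entered during preprocessing, in first-seen order.

    Only markers carrying flag 1 ("start of a new file") are counted.
    """
    seen: List[str] = []
    for line in preprocessed.splitlines():
        m = _MARKER.match(line)
        if not m:
            continue
        name, flags = m.group(1), m.group(2).split()
        if "1" in flags and name not in seen:
            seen.append(name)
    return seen
```

You would pass it the stdout of something like subprocess.run(["cc", "-E", "-I.", "myprogram.c"], capture_output=True, text=True, check=True); nested headers are listed too, in the order the preprocessor first entered them.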

John Hascall

Have you considered using something like pycparser, which parses C files? It may be overkill for your question, but it does allow much more advanced parsing.

Michael Petch
  • The re module searches for all the matches of #include. The problem is that when a comment spans multiple lines I am not able to filter it out, so I get garbage values, e.g. /*-------------- – Aabha Geed Sep 10 '14 at 06:09
  • Can you tell me how I can use pycparser? – Aabha Geed Sep 10 '14 at 12:00

You could use the grep utility (Linux, MacOS X):

grep '^[[:space:]]*#' my_file.c

or (for a multi-file search)

grep '^[[:space:]]*#' *.c
Miguel Prz
  • This doesn't seem to work at all... – John Hascall Sep 10 '14 at 06:04
  • Grep doesn't understand `\s` (unless you're using gnu grep, and even then you need to specify `-P` to enable Perl-style regexen). Use `[[:space:]]` instead. But anyway, this could still fail if the supposed `#include` is inside of a `/*` comment. – rici Sep 10 '14 at 14:55

I use the re module with its match and search functions. search finds the pattern anywhere in the string, while match only matches at the beginning of the string.
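A small illustration of the difference, assuming a hypothetical INCLUDE pattern for directives: match() rejects a line like /* #include "header.h" */ because the line does not start with #, whereas search() would find the directive inside it. As the comments below point out, a line in the middle of a multi-line comment still starts with #include, so per-line matching alone cannot filter that case.

```python
import re
from typing import Optional

# Hypothetical pattern: optional whitespace, '#', optional whitespace,
# 'include', then a <...> or "..." target.
INCLUDE = re.compile(r'\s*#\s*include\s*([<"][^>"]+[>"])')


def include_target(line: str) -> Optional[str]:
    """Return the <...> or "..." target if the line *starts* with #include."""
    m = INCLUDE.match(line)  # match() is anchored at the start of the string
    return m.group(1) if m else None
```

Calling include_target on each line of a file gives '"header.h"' for #include "header.h" and None for the one-line comment, while INCLUDE.search on that comment line would still succeed.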

aivision2020
  • The re module searches for all the matches of #include. The problem is that when a comment spans multiple lines I am not able to filter it out, so I get garbage values, e.g. /*-------------- #include stuff ---------*/ – Aabha Geed Sep 10 '14 at 06:10
  • You are right, I was thinking of matching per line. Without parsing the entire file, how do you think to do it? – aivision2020 Sep 10 '14 at 11:34
  • I'll need to use some compiler tool or an open-source parser. – Aabha Geed Sep 10 '14 at 11:48
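For the multi-line comment problem raised in these comments, one workable approach in Python is to remove comments before looking for directives, taking care not to treat /* inside a string literal as a comment. This is a sketch, not a preprocessor: it still reports includes inside #if 0 blocks, cannot expand #include PLUGIN, and ignores backslash line splicing and a few other corner cases. The names strip_comments and find_includes are hypothetical.

```python
import re
from typing import List

# Scanned left to right, so whichever construct starts first wins:
# a "/*" inside a string literal is consumed as part of the string.
_TOKENS = re.compile(
    r'"(?:\\.|[^"\\\n])*"'   # string literal
    r"|'(?:\\.|[^'\\\n])*'"  # character literal
    r"|/\*.*?\*/"            # block comment (may span lines)
    r"|//[^\n]*",            # line comment
    re.DOTALL,
)

# A directive: only spaces/tabs before '#' on its line.
_INCLUDE = re.compile(
    r'^[ \t]*#[ \t]*include[ \t]*([<"][^>"\n]+[>"])', re.MULTILINE
)


def strip_comments(source: str) -> str:
    """Replace each comment with a space, keeping literals unchanged."""
    def repl(m):
        text = m.group(0)
        if text.startswith("/"):
            # Keep the comment's newlines so line numbers are preserved.
            return " " + "\n" * text.count("\n")
        return text
    return _TOKENS.sub(repl, source)


def find_includes(source: str) -> List[str]:
    """Return include targets such as '<stdio.h>' outside comments."""
    return _INCLUDE.findall(strip_comments(source))
```

Because comments are blanked before the line-anchored search, both /* #include "header.h" */ and an #include sitting inside a /*----- ... -----*/ block are ignored. For conditionals and macro includes you still need the compiler's own output, as the other answers describe.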