I have tried parsing files for #include directives using Python. I have also tried to match the pattern using the sed command. Either way I get garbage data. For example, if some comment contains /* #include "header.h" */
I get those lines as well. How can I avoid this?

Can you post the code you tried? – Jayesh Bhoi Sep 10 '14 at 05:54
Do you have any code that you have tried so far? – DOOM Sep 10 '14 at 05:54
I tried: sed -nr '/#include/p' file.c – Aabha Geed Sep 10 '14 at 06:02
6 Answers
GCC supports the -H option. Consider the source file hw.c:
#include <stdio.h>
int main(void) { puts("Hello world"); return 0; }
On Mac OS X 10.9.4 with GCC 4.8.1:
$ gcc -H -c hw.c
. /usr/include/stdio.h
.. /usr/include/sys/cdefs.h
... /usr/include/sys/_symbol_aliasing.h
... /usr/include/sys/_posix_availability.h
.. /usr/include/Availability.h
... /usr/include/AvailabilityInternal.h
.. /usr/include/_types.h
... /usr/include/sys/_types.h
.... /usr/include/machine/_types.h
..... /usr/include/i386/_types.h
.. /usr/include/sys/_types/_va_list.h
.. /usr/include/sys/_types/_size_t.h
.. /usr/include/sys/_types/_null.h
.. /usr/include/sys/_types/_off_t.h
.. /usr/include/sys/_types/_ssize_t.h
.. /usr/include/secure/_stdio.h
... /usr/include/secure/_common.h
Multiple include guards may be useful for:
/usr/include/secure/_stdio.h
/usr/include/sys/_posix_availability.h
/usr/include/sys/_symbol_aliasing.h
$
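Since the question mentions Python, here is a minimal sketch of how the -H listing above could be turned into data. It assumes the layout shown: a run of dots giving the nesting depth, one space, then the path. The names `parse_gcc_h` and `sample` are made up for illustration.

```python
import re

# A dependency line looks like ".. /usr/include/sys/cdefs.h":
# one dot per nesting level, a single space, then the header path.
_H_LINE = re.compile(r"^(\.+) (.+)$")

def parse_gcc_h(text):
    """Return (depth, path) pairs found in the text printed by `gcc -H`."""
    result = []
    for line in text.splitlines():
        m = _H_LINE.match(line)
        if m:
            result.append((len(m.group(1)), m.group(2)))
    return result

# Excerpt of the listing above, including lines that must be ignored.
sample = """\
. /usr/include/stdio.h
.. /usr/include/sys/cdefs.h
... /usr/include/sys/_symbol_aliasing.h
Multiple include guards may be useful for:
/usr/include/secure/_stdio.h
"""
headers = parse_gcc_h(sample)
for depth, path in headers:
    print(depth, path)
```

GCC writes the -H listing to standard error, so when running the compiler from Python you would read it from there, for example with `subprocess.run([...], capture_output=True, text=True).stderr`.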

Once you start thinking about non-trivial cases like
/* #include <header.h> */
you'll soon reach the point where it is no longer really practical to write your own dependency extractor.
Consider for example these:
#define PLUGIN "my_extension.h"
#include PLUGIN
#ifdef WITH_CURSES
# include <curses.h>
#endif
You can continue the list indefinitely. If you want to handle all these correctly, you'll end up implementing a full preprocessor.
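To make the limits concrete, here is a rough Python sketch that handles only the comment case from the question: it removes comments (while leaving string literals alone) and then looks for #include lines. The function name `find_includes` is hypothetical, and the sketch deliberately does not handle `#include PLUGIN`, conditional compilation, line continuations or trigraphs, which is exactly where a real preprocessor becomes necessary.

```python
import re

# Comments and literals are matched in one pass so that a "/*" inside a
# string literal is not mistaken for the start of a comment.
_TOKEN = re.compile(
    r'/\*.*?\*/'              # block comment, possibly spanning lines
    r'|//[^\n]*'              # line comment
    r'|"(?:\\.|[^"\\])*"'     # string literal
    r"|'(?:\\.|[^'\\])*'",    # character literal
    re.DOTALL,
)

def _strip(match):
    text = match.group(0)
    if text.startswith("/"):
        # Replace a comment by a space, keeping its newlines so the
        # remaining text still has the same line structure.
        return " " + "\n" * text.count("\n")
    return text  # string and character literals are kept unchanged

_INCLUDE = re.compile(
    r'^[ \t]*#[ \t]*include[ \t]*([<"][^>"\n]+[>"])', re.MULTILINE
)

def find_includes(source):
    """Return the header names of literal #include lines outside comments."""
    return _INCLUDE.findall(_TOKEN.sub(_strip, source))

example = '/* #include "header.h" */\n#include <stdio.h>\n'
print(find_includes(example))
```

Anything hidden behind a macro or an `#ifdef` is invisible to this approach, so treat it as a quick filter rather than a dependency extractor.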
I don't know what you want to do with the generated list of files, but a common situation is to determine which files a compilation unit depends on, for example to generate makefiles. Most compilers include special support for this. In GCC, it is the -M option.
main.c
#include <alpha.h>
/* #include <beta.h> */
#ifdef PLUGIN
#include PLUGIN
#endif
#if WITH_DELTA
#include <delta.h>
#endif
alpha.h
#include <epsilon.h>
Let beta.h, gamma.h, delta.h and epsilon.h be empty (or, at least, not #include anything).
$ gcc -I. -M main.c
main.o: main.c /usr/include/stdc-predef.h alpha.h epsilon.h
$ gcc -I. -DPLUGIN='<gamma.h>' -M main.c
main.o: main.c /usr/include/stdc-predef.h alpha.h epsilon.h gamma.h
$ gcc -I. -DWITH_DELTA=1 -M main.c
main.o: main.c /usr/include/stdc-predef.h alpha.h epsilon.h delta.h
Even if you are not ultimately trying to generate a makefile, parsing the preprocessor's output will be a lot easier than walking your own way through the source files.
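As a sketch of that parsing step in Python: the -M output is a single make rule, possibly wrapped over several lines with trailing backslashes. The helper name `parse_make_rule` is made up here, and the sketch assumes a single target and paths without spaces or colons (so it would need more care for Windows drive letters or escaped spaces).

```python
def parse_make_rule(text):
    """Split the make rule printed by `gcc -M` into (target, dependencies)."""
    # Long rules are wrapped with a backslash at the end of the line;
    # join them back into one logical line first.
    joined = text.replace("\\\n", " ")
    target, _, deps = joined.partition(":")
    return target.strip(), deps.split()

rule = "main.o: main.c /usr/include/stdc-predef.h \\\n alpha.h epsilon.h\n"
target, deps = parse_make_rule(rule)
print(target, deps)
```

The first dependency is the source file itself, so the headers are `deps[1:]` in this example.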

I tried using the -M option of GCC. If the headers are not in the same directory as the source file, I get an error: No such file or directory – Aabha Geed Sep 10 '14 at 06:32
You must add the respective directories to the include path via the `-I` option, just as if you were actually compiling. – 5gon12eder Sep 10 '14 at 06:33
If your compiler supports the -E (or similar) option, something like this may be useful:
cc -E myprogram.c | grep '^# 1 '
The -E option says to run just the preprocessing stage and show the results.
An advantage of this method is that you can include any important -I and -D command-line options as you would for a normal compile, thus capturing any behavior change those might engender.
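If you would rather post-process the -E output in Python than with grep, a sketch along these lines may help. It assumes GCC-style linemarkers of the form `# linenum "filename" flags`, where flag 1 marks the start of a new file; other compilers may format these lines differently. The function name `included_files` and the sample text are illustrative only.

```python
import re

# GCC-style linemarker: '# 27 "/usr/include/stdio.h" 3 4'
_LINEMARKER = re.compile(r'^# (\d+) "([^"]*)"((?: \d)*)\s*$')

def included_files(preprocessed):
    """List files entered during preprocessing, in order of first appearance."""
    seen = []
    for line in preprocessed.splitlines():
        m = _LINEMARKER.match(line)
        if not m:
            continue
        path, flags = m.group(2), m.group(3).split()
        # Flag 1 means "entering a new file"; names such as <built-in>
        # are pseudo-files rather than headers on disk, so skip them.
        if "1" in flags and not path.startswith("<") and path not in seen:
            seen.append(path)
    return seen

sample = '''# 1 "hw.c"
# 1 "<built-in>"
# 1 "/usr/include/stdio.h" 1 3 4
# 27 "/usr/include/stdio.h" 3 4
# 1 "/usr/include/features.h" 1 3 4
# 2 "hw.c" 2
int main(void) { return 0; }
'''
print(included_files(sample))
```

Unlike the grep pattern, which keys on line number 1, this keys on the "new file" flag, so the top-level source file itself is not reported.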

This will work on gcc and clang, but you could use `-MD` or `-H` to better effect. – rici Sep 10 '14 at 14:59
Have you considered using something like pycparser that parses C files? It may be overkill for your question, but it does allow much more advanced parsing options.

The re module searches for all matches of #include. The problem is that when a comment spans multiple lines, I am not able to filter it out, so I get garbage values, e.g. /*-------------- – Aabha Geed Sep 10 '14 at 06:09
You could use the grep utility (Linux, Mac OS X):
grep '^\s*#' my_file.c
or (for a multi-file search)
grep '^\s*#' *.c

Grep doesn't understand `\s` (unless you're using gnu grep, and even then you need to specify `-P` to enable Perl-style regexen). Use `[[:space:]]` instead. But anyway, this could still fail if the supposed `#include` is inside of a `/*` comment. – rici Sep 10 '14 at 14:55
I use the re module with its match and search functions. search finds the text anywhere in the string, while match only matches at the beginning of the string.
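A small illustration of the difference, using a made-up input line. Note that matching line by line like this still cannot tell whether the line sits inside a multi-line comment.

```python
import re

line = '    #include "header.h"'
pattern = r'#\s*include'

# match only succeeds at the very start of the string, so the leading
# spaces make it fail; search scans the whole string.
at_start = re.match(pattern, line)
anywhere = re.search(pattern, line)

# Allowing leading whitespace explicitly lets match work and avoids hits
# in the middle of a line; the group captures the header name.
anchored = re.match(r'\s*#\s*include\s*[<"]([^>"]+)[>"]', line)

print(at_start, anywhere.start(), anchored.group(1))
```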

The re module searches for all matches of #include. The problem is that when a comment spans multiple lines, I am not able to filter it out, so I get garbage values, e.g. /*-------------- #include stuff ---------*/ – Aabha Geed Sep 10 '14 at 06:10
You are right, I was thinking of matching per line. Without parsing the entire file, how would you do it? – aivision2020 Sep 10 '14 at 11:34