I am writing a script that can process .c and .h files. Using regular expressions, I find all the functions within a given file. In my experience with C, I have always defined functions in the following manner:

void foo(int a){
//random code
}

Is it also possible to define a function in the following manner?

void
foo(int a){
//random code
}

I always assumed that the function type, name, and parameters needed to be on the same line, but I've been told otherwise, so I'm not exactly sure.

Robert Harvey
Walter Kovacs
  • This is easily searched for, but to answer your question: C treats a line break (newline) as just white space, so any C statement can span multiple lines, and multiple statements can share one line. Your entire C program could be on one line if desired! – DoxyLover Aug 20 '14 at 16:35
  • I think I may have worded this question wrong. I have no questions about the Python script and it is working fine. This question is purely about the rules of C. I probably shouldn't have even mentioned the Python script since it is going to just confuse people. – Walter Kovacs Aug 20 '14 at 16:36
  • don't assume that. you will see this coding style in automatically generated JNI .h files. – Jason Hu Aug 20 '14 at 16:37
  • Why don't you use a specific parsing library? – Alessandro Suglia Aug 20 '14 at 16:37
  • You could leverage an existing compiler. Recent [GCC](http://gcc.gnu.org/) can be quite easily customized with [MELT](http://gcc-melt.org/) – Basile Starynkevitch Aug 20 '14 at 16:43
  • @AlessandroSuglia I'm new to Python and don't know about a lot of the libraries. Can you suggest one? – Walter Kovacs Aug 20 '14 at 17:03

6 Answers

Firstly, what kind of whitespace - space, newline, tab etc. - you use in C source code does not matter, as long as there's whitespace where whitespace is required. Also, it does not matter how much whitespace you use.
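For the script in the question, this means a pattern should accept any run of whitespace between tokens rather than a single space on one line. Below is a minimal Python sketch under that assumption; the pattern and the names SIMPLE_DEF and find_simple_definitions are hypothetical, and it only covers definitions with a one-word return type and no nested parentheses.

```python
import re

# Hypothetical pattern: one-word return type, function name, flat parameter
# list, opening brace. \s+ and \s* match any whitespace, including newlines.
SIMPLE_DEF = re.compile(
    r"\b(?P<type>[A-Za-z_]\w*)\s+(?P<name>[A-Za-z_]\w*)\s*"
    r"\((?P<params>[^()]*)\)\s*\{"
)


def find_simple_definitions(source):
    """Return the names of functions written in the simplest definition form."""
    return [m.group("name") for m in SIMPLE_DEF.finditer(source)]


same_line = "void foo(int a){\n//random code\n}\n"
split_line = "void\nfoo(int a){\n//random code\n}\n"
print(find_simple_definitions(same_line))
print(find_simple_definitions(split_line))
```

Both layouts from the question are found by the same pattern. The sketch is deliberately naive: it would also report something like "else if (x) {" as a function named if, and it knows nothing about comments, macros, or multi-word types.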

Secondly, because backslash-newline pairs are spliced away before the source is split into tokens, one can write function declarations (and the rest of the code) as

vo\
id f\
o\
o(i\
n\
t\
 \
a)

(Obviously, there are many more ways in which the preprocessor can obfuscate a function definition. For your particular task, it would be a better idea to work on already preprocessed source code.)
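If running a real preprocessor is not an option, the backslash-newline splicing shown above is at least easy to undo before searching. This is a minimal Python sketch; splice_lines is a hypothetical helper name, and it assumes "\n" line endings and no trigraph spelling of the backslash.

```python
def splice_lines(source):
    """Join physical lines that end in a backslash into one logical line."""
    # Deleting each backslash-newline pair is what the compiler does
    # before it splits the text into tokens.
    return source.replace("\\\n", "")


obfuscated = "vo\\\nid f\\\no\\\no(i\\\nn\\\nt\\\n \\\na)"
print(splice_lines(obfuscated))  # void foo(int a)
```

This only handles line continuation. Macros, conditional compilation, and included headers still need an actual preprocessor.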

Thirdly, C (before C23, which removed them) still supports K&R-style function definitions, which look as follows:

void foo(a)
int a;
{
  ...
}
AnT stands with Russia

C is a free-form language; white space is not significant, in general, except to separate tokens. (There are caveats to that assertion, notably in preprocessor directives, and inside string literals and character literals, but in general that is accurate.) Thus, the following is a ghastly but legitimate C function definition:

/* Comment before the type */ SomeUserDefinedTypeName
/??/
* comments, with trigraphs to boot
*??/
/
FunctionName

(

SomeType param1,

AnotherType

(
*
param2
)
[
]

)

/\
/ one line comment
// another line comment \
yes, this is part \
of that one-line comment too

{
   ...
}

Of course, anyone who produces a function like that deserves to be hung, drawn and quartered (or, at least, severely castigated), but you will have to decide on how general-purpose you want your code to be. If it needs to work with any C whatsoever, you will need to handle c**p [1] like this. On the other hand, you can probably get away with a lot less sophisticated parsing.

[1] There's an A and an R missing, and I'm not talking about fish.

Jonathan Leffler

Don't try to parse C with regexes

This is a valid C function, named test, which takes a pointer to const void (named ptr) and returns a pointer to a function that takes an array of five pointers to functions returning int, and that itself returns an unsigned int.

unsigned int (*(test)(const void *ptr)) (int (*[5])()) 
{
    return 0;
} 

(bonus points if someone can find a real-world scenario where this thing could have any use)

Although it is deprecated, you may also come across the "old style" function notation:

// declaration
unsigned int test2();

// definition
unsigned int test2(ptr)
const void *ptr;
{
    return 0;
} 

Intermixed with all this you can find comments (multi-line, and single-line since C99), trigraphs, and even macros:

#define defun(fn) fn (
#define fstart ){
#define fend } 

void defun(test3) int a, double b
fstart
    printf("%d %f", a, b);
fend

http://ideone.com/JDDeMr

Even excluding the pathological macro scenario, "plain" regexes cannot even begin to parse this, because they cannot match nested parentheses. Maybe you can do something with extended regexes, but let's be honest: do you really want to cope with this stuff? Use a ready-made parser or even a compiler (libclang comes to mind) and let it do the dirty work.
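To illustrate the parenthesis problem: finding the closing parenthesis that belongs to a given opening one takes a depth counter, which a classic regular expression cannot express. Here is a minimal Python sketch; matching_paren is a hypothetical helper, and it assumes the text contains no comments, string literals, or character literals with parentheses in them.

```python
def matching_paren(text, open_index):
    """Return the index of the ')' closing the '(' at open_index, or -1."""
    if text[open_index] != "(":
        raise ValueError("open_index must point at '('")
    depth = 0
    for i in range(open_index, len(text)):
        if text[i] == "(":
            depth += 1          # one level deeper
        elif text[i] == ")":
            depth -= 1          # one level back out
            if depth == 0:
                return i        # back at the starting level
    return -1                   # unbalanced input


decl = "unsigned int (*(test)(const void *ptr)) (int (*[5])())"
start = decl.index("(")
end = matching_paren(decl, start)
print(decl[start:end + 1])  # (*(test)(const void *ptr))
```

Even with this helper, working out which identifier inside those parentheses is the function name still requires understanding C declarator syntax, which is the argument for using a real parser.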

Matteo Italia

I think that, for a beginner, writing regex-based code from scratch to parse source code is quite difficult, and it could be quite inefficient too.

As I've stated before, I suggest using a well-written library such as pyparsing, which lets you translate the BNF notation of the language into the library's own parser objects.

Once you have defined a parsing element with the pyparsing API, you can use the library to parse anything from a simple string to a complex file. It may be a bit difficult at first, but I think that you can use it with great results.

I suggest you have a look at this simple C grammar defined using the pyparsing library. It's very well written and documented.

Alessandro Suglia

So, whitespace includes characters like tabs, newlines, and spaces (among others).

In general, these whitespace characters are interchangeable. That is, you could replace every space with a newline (or vice-versa), and the compiler wouldn't care.

There are a few places where newlines are treated specially. Some that come to mind include the preprocessor, string literals, character literals, and single line comments.

The two examples that you've shown are parsed identically. We could also write it as:

void
  foo (
    int
a
) { //random code
}

or:

void
foo
(
int
a
)
{
//random code
}

or:

void foo(int a){ /* random code */ }
Bill Lynch

Both of these are correct (they will compile) because the C compiler will ignore whitespace between the return type and the function name. The usual format for a function definition is:

<return type> <function name> (<parameter list>) {
       <body>
}

During compilation, the return type and function name are separate tokens; the parser will ignore the whitespace between them. Hope this helps.
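One way to see this from a script is to split both layouts into rough tokens and compare the results. The Python sketch below is only an approximation, not a real C lexer: TOKEN and rough_tokens are hypothetical names, and the pattern does not handle comments, string literals, or multi-character operators.

```python
import re

# Identifiers/keywords, integer literals, or any single non-space character.
TOKEN = re.compile(r"[A-Za-z_]\w*|\d+|\S")


def rough_tokens(source):
    """Split C-like text into tokens, discarding all whitespace."""
    return TOKEN.findall(source)


one_line = "void foo(int a){ }"
many_lines = "void\nfoo\n(\nint\na\n)\n{\n}"
print(rough_tokens(one_line) == rough_tokens(many_lines))  # True
```

Once whitespace has done its job of separating void from foo, it no longer appears in the token sequence, so the two layouts are indistinguishable to anything that works on tokens.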

GHe
  • I don't think "ignore" is the correct word. If a compiler *ignores* something, it acts as if it saw nothing (for example, the BOM at the start of a C source saved as UTF8). – Jongware Aug 20 '14 at 17:07