I am interested in how the compiler parses certain declarations and why it allows them in the first place.

Consider the following snippet of code:

int(var);

which most likely nobody would ever write, but it appears to be allowed by GCC in C99 mode for some reason (a language extension?). Knowing how declarations are parsed, this makes sense to me, although it is useless and obviously bad practice.

The problem is that this introduces a new difficulty in parsing declarations such as:

int func ( int(var), int(void) );

Here, since identifiers in parameter declarations are optional, what exactly must the compiler do? Attempt to retrieve 'void' as an identifier, but instead of producing an error as usual, allow it since it matches the syntax of a function (void) returning int? This presents a challenge when you would want to parse such a declaration and I want to make sure I am doing it correctly.

It starts to make less sense and becomes more confusing with declarations like this one:

void c (int());

because the parameter will be treated as a function returning int, instead of the declaration being read as

void c (int);

Please, shed some light on the subject.

Edenia
  • `(var)` is a valid declarator per section 6.7.6 of the specification. So no, this is not a GCC extension. – user3386109 Jun 07 '23 at 18:25
  • That's more worrisome then, I need to behave in accordance with the standard. I cannot get away with it as if it were implementation-defined / undefined behavior. – Edenia Jun 07 '23 at 18:27

2 Answers

According to the C grammar (6.7.6 Declarators)

Syntax
    declarator:
        pointer(opt) direct-declarator
    direct-declarator:
        identifier
        ( declarator )
    //...

declarators may be enclosed in parentheses.

For example, this array declaration

int a[M][N];

may be rewritten as

int ( ( ( a )[M] )[N] );
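As a minimal sketch of that equivalence, the following declares the array both ways and lets the compiler check that the types agree. The sizes chosen for `M` and `N`, the second array name `b`, and the helper names are assumptions made for this illustration only.

```c
#include <stddef.h>

enum { M = 2, N = 3 };      /* hypothetical sizes for this sketch */

int a[M][N];
int ( ( ( b )[M] )[N] );    /* same type as a, only with redundant parentheses */

/* A pointer to a's type is initialized with b's address; a conforming
   compiler should diagnose this if the two array types differed. */
int (*same_type_check)[M][N] = &b;

size_t size_a(void) { return sizeof a; }
size_t size_b(void) { return sizeof b; }
```

Both size helpers should report the same value, M * N * sizeof(int).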

As for this function declaration

int func ( int(var), int(void) );

it declares two parameters: a named parameter of type int and an unnamed parameter of the function type int ( void ), which the compiler adjusts to the pointer type int (*)(void).

In C, parameter identifiers may be omitted in function declarations that are not also definitions.
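To illustrate the reading described above, here is a small sketch in which the declaration from the question is followed by a more conventional, compatible redeclaration and a definition. The parameter name `callback`, the helper `seven`, and the function body are made up for this example.

```c
/* The declaration from the question: the second parameter is unnamed
   and written with the function type int(void). */
int func ( int(var), int(void) );

/* A compatible redeclaration: a parameter of function type is adjusted
   to a pointer to that function type. */
int func ( int var, int (*callback)(void) );

/* Hypothetical helper to pass as the second argument. */
int seven(void) { return 7; }

/* Hypothetical definition, only so the sketch can be called. */
int func ( int var, int (*callback)(void) )
{
    return var + callback();
}
```

Calling func(1, seven) passes the function by name, and it arrives as a pointer to function.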

As for this declaration

void c (int());

the parameter again has a function type.

Vlad from Moscow
  • I thought `int func ( int(var), int(void) );` would take an `int` and an `int(*)(void)` – Ted Lyngmo Jun 07 '23 at 18:29
  • Check out: https://onlinegdb.com/wS7OjU0J0 doesn't even issue a warning with -Wall and -Wextra nor error – Edenia Jun 07 '23 at 18:30
  • @Edenia It's because it's interpreted as taking a function: [example](https://godbolt.org/z/vxYnTMane) – Ted Lyngmo Jun 07 '23 at 18:31
  • @Edenia I am sorry. I was not attentive,:) – Vlad from Moscow Jun 07 '23 at 18:33
  • It's okay haha, nobody can be as confused as I am currently (regarding the design approach) so I am gaining useful help (: – Edenia Jun 07 '23 at 18:36
  • So at the end of the day, how should a parser treat the second parameter, because it goes right into the place where identifiers are to be parsed? – Edenia Jun 07 '23 at 18:38
  • @Edenia As soon as an open parenthesis is encountered and there is an identifier or a pointer symbol within the parentheses the compiler considers enclosed entities as a declarator. Otherwise the record is parsed as a function declaration. – Vlad from Moscow Jun 07 '23 at 18:47
  • For int(void) there isn't an identifier or a '*'. A parser should correctly differentiate those two completely different declarations `int func ( int(var), int(void) ); ` I asked if that would mean checking if 'void' is not an identifier and in such case parse `(void)` i.e a function rather than parse `()` – Edenia Jun 07 '23 at 19:05
  • @Edenia `int(some_type)` is interpreted as a function signature where the function takes `some_type` as an argument and returns `int`. `int(double)` would be the signature of a function taking a `double` and returning `int`, so yes, the compiler needs to look at what's inside the parentheses. – Ted Lyngmo Jun 07 '23 at 20:00
  • @TedLyngmo Thanks for clarifying. I expected that. It's the only instance where the contents of the parentheses matters in order to differentiate two declarations (one with an identifier and one without) and it is a bit inefficient only for the sake of a grammar rule that allows it solely for the sake of grammar logic. – Edenia Jun 07 '23 at 20:07
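The one-token lookahead described in the comments above could be sketched roughly as follows. This is only an illustration, not how any particular compiler does it: `paren_kind`, `starts_parameter` and `classify_paren` are hypothetical names, tokens are plain strings for simplicity, and the keyword list is deliberately incomplete. A real parser would also consult its table of typedef names, since the standard says an identifier that could be either a typedef name or a parameter name is taken as a typedef name.

```c
#include <string.h>

/* Hypothetical result: how to read a '(' that follows the type specifiers
   in a parameter declaration. */
enum paren_kind {
    PAREN_GROUPING,    /* "( declarator )", as in int(var) or int(*p) */
    PAREN_PARAM_LIST   /* function declarator suffix, as in int(void) or int() */
};

/* Returns 1 if tok can begin a parameter declaration. Only some keywords
   are listed here; typedef names are not handled in this sketch. */
static int starts_parameter(const char *tok)
{
    static const char *const keywords[] = {
        "void", "char", "short", "int", "long", "float", "double",
        "signed", "unsigned", "struct", "union", "enum", "const", "volatile"
    };
    size_t i;

    for (i = 0; i < sizeof keywords / sizeof keywords[0]; i++) {
        if (strcmp(tok, keywords[i]) == 0)
            return 1;
    }
    return 0;
}

/* next_tok is the token immediately after the '('. */
enum paren_kind classify_paren(const char *next_tok)
{
    if (strcmp(next_tok, ")") == 0 || starts_parameter(next_tok))
        return PAREN_PARAM_LIST;
    return PAREN_GROUPING;
}
```

With this, the `(` in `int(var)` is classified as grouping because `var` is an ordinary identifier, while the `(` in `int(void)` or `int()` is classified as the start of a parameter list.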
Parentheses are allowed in declarations because they are needed to create certain compound types, such as:

int *x[3];    // An array of 3 pointers to `int`.
int (*x)[3];  // A pointer to an array of 3 `int`.

Using int (x) is merely using the allowed grammar in a way that is needless.¹
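A small sketch of the difference between the two compound types above. Since both objects cannot be named `x` in the same scope, the hypothetical names `ptrs` and `row` are used here, and the helper functions exist only to expose the sizes.

```c
#include <stddef.h>

int *ptrs[3];     /* an array of 3 pointers to int */
int (*row)[3];    /* a pointer to an array of 3 int */

/* Size of the whole array of pointers. */
size_t size_of_ptrs(void) { return sizeof ptrs; }

/* Size of a single pointer object. */
size_t size_of_row(void) { return sizeof row; }

/* Size of what row points to; sizeof does not evaluate *row here. */
size_t size_of_row_target(void) { return sizeof *row; }
```

The first helper should yield 3 * sizeof(int *), the second the size of one pointer, and the third 3 * sizeof(int).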

> This presents a challenge when you would want to parse such a declaration and I want to make sure I am doing it correctly.

The C grammar is generally not parsed by studying the grammar rules and manually writing code for it. Parsing theory, generally taught in university computer science curricula, studies formal descriptions of programming languages and mathematical transformations of a formal language grammar into software that parses the grammar. There are software tools to perform this transformation.

Yacc is one such tool, often paired with Lex to do lexical analysis. With Yacc, you would write grammar rules, much as they appear in the C standard, and accompany them with source code to process the parsed tokens, and the Yacc tool would transform the rules into parsing source code.

Footnote

¹ Parenthesizing identifiers in simple declarators is not entirely useless. One use is in writing Stack Overflow questions for a certain date in the spring of the northern hemisphere.

Eric Postpischil
  • Too late for me to study parsing theory at a university and rewrite a project :( – Edenia Jun 07 '23 at 18:34
  • Haha on the footnote. At first you nearly caused me a heart attack, since I refuse to take into consideration digraphs and trigraphs and anything of that nature. Then I learned from the answer that the code snippet was practically obfuscated for no reason other than obfuscation. – Edenia Jun 07 '23 at 18:44