1

What are examples of non-context-free languages in the C language? How do the following non-CFLs arise in C?

a) L1 = {wcw | w is in {a,b}*}

b) L2 = {a^n b^m c^n d^m | n,m >= 1}

akshaykumar6

2 Answers

5

The question is clumsily worded, so I'm reading between the lines here. Still, it's a common homework/study question.

The various ambiguities [1] in the C grammar as normally presented do not render the language non-context-free. (Indeed, they don't even render the grammars non-context-free.) The general rule "if it looks like a declaration, it's a declaration regardless of other possible parses" can probably be codified in a very complex context-free grammar (although it's not 100% obvious that that is true, since context-free languages are not closed under intersection or difference), but it's easier to parse with a simpler CFG and then disambiguate according to the declaration rule.

Now, the important point about C (and most programming languages) is that the syntax of the language is quite a bit more complex than the BNF used for explanatory purposes. For example, a C program is not well-formed if a variable is used without being declared. That's a syntax error, but it's not detected by the CFG parser. The grammatical productions needed to define these cases are quite complicated, due to the complicated syntax of the language, but they boil down to requiring that identifiers appear twice in a valid program: once where they are declared and once where they are used. Hence L1 = {wcw | w is in {a,b}+} (here w is the identifier, and c stands for everything in between, which is way too complicated to spell out). In practice, checking this requirement is normally done with a symbol table, and the language rules, while precise, are not written in a logical formalism. Since L1 is not a context-free language, such a formalism could not be context-free, but a context-sensitive grammar can generate L1, so it's not totally impossible. (See, for example, Algol 68.)
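To make L1 concrete, here is a minimal sketch of a membership test, using the {a,b}+ form given above. The function name is_wcw is my own, not anything from the standard. The point it illustrates is that the second copy of w has to be compared, character by character, against a first copy of unbounded length, which is the kind of bookkeeping a symbol table does and a pushdown automaton cannot.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns true when s has the form w c w,
 * where w is a non-empty string over the alphabet {a, b}. */
bool is_wcw(const char *s)
{
    assert(s != NULL);

    size_t len = strlen(s);

    /* |w| + 1 + |w| with |w| >= 1 is always odd and at least 3. */
    if (len < 3 || len % 2 == 0)
        return false;

    size_t half = len / 2;      /* length of w; also the index of 'c' */
    if (s[half] != 'c')
        return false;

    for (size_t i = 0; i < half; i++) {
        char ch = s[i];
        if (ch != 'a' && ch != 'b')
            return false;       /* w may only contain a and b */
        if (ch != s[half + 1 + i])
            return false;       /* the "use" must repeat the "declaration" */
    }
    return true;
}
```

With this sketch, is_wcw("abcab") holds and is_wcw("abcba") does not: the second string is a palindrome around c, which a context-free grammar can handle, whereas the exact repetition in the first cannot be generated by any CFG.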

The symbol table is also used to decide whether a particular identifier is to be reduced to typedef-name [2]. This is required to resolve a number of ambiguities in the grammar. (It also further restricts the set of strings in the language, because there are some cases where an identifier must be resolved as a typedef-name in order for the program to be valid.)
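As an illustration of that dependence on the symbol table, the sketch below uses the same token sequence, (a) * b, in two scopes. The function names cast_case and mul_case are made up for this example. Where a has been declared as a typedef-name the expression is a cast applied to a dereference; where a is an ordinary variable it is a multiplication.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical example: 'a' names a type here, so "(a) * b"
 * parses as a cast applied to the dereference of b. */
long cast_case(const int *b)
{
    typedef long a;             /* 'a' is a typedef-name in this scope */
    assert(b != NULL);
    return (a) * b;             /* cast: read *b and convert it to long */
}

/* Hypothetical example: 'a' names an object here, so the very same
 * tokens "(a) * b" parse as a multiplication. */
long mul_case(int b)
{
    long a = 3;                 /* 'a' is an ordinary variable in this scope */
    return (a) * b;             /* multiplication: a times b */
}
```

Given int x = 5, cast_case(&x) yields 5 while mul_case(5) yields 15, even though the return expressions are spelled identically.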

For another type of context-sensitivity, function calls need to match function declarations in the number of arguments; this sort of requirement is modelled by L2 = {a^n b^m c^n d^m| n,m >=1} where a and c represent the definition and use of some function, and b and d represent the definition and use of a different function. (Again, in a highly-simplified form.)
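A minimal sketch of a membership test for L2 follows; the names count_run and is_l2 are my own. Think of the run of a's as the parameters in the declaration of one function and the run of c's as the arguments in a call to it, with b and d playing the same roles for a second function. The two count constraints cross each other (a with c, b with d), which is what a single stack cannot track.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: counts the run of ch starting at *p and
 * advances *p past it. ch must not be the NUL character. */
static size_t count_run(const char **p, char ch)
{
    size_t n = 0;
    while (**p == ch) {
        (*p)++;
        n++;
    }
    return n;
}

/* Hypothetical helper: returns true when s is a^n b^m c^n d^m
 * with n, m >= 1. */
bool is_l2(const char *s)
{
    assert(s != NULL);

    size_t a = count_run(&s, 'a');  /* parameters declared for f */
    size_t b = count_run(&s, 'b');  /* parameters declared for g */
    size_t c = count_run(&s, 'c');  /* arguments passed to f     */
    size_t d = count_run(&s, 'd');  /* arguments passed to g     */

    /* Nothing may follow the d's, and the counts must match pairwise. */
    return *s == '\0' && a >= 1 && b >= 1 && a == c && b == d;
}
```

For example, is_l2("aabccd") holds (two a's match two c's, one b matches one d), while is_l2("aabcd") does not.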

This second requirement is possibly less clearly a syntactic requirement. Other languages (Python, for example) allow function calls with any number of arguments, and treat an argument/parameter count mismatch as a semantic error that is only detected at runtime. In the case of C, however, a mismatch is clearly a syntax error.

In short, the set of grammatically valid strings which constitute the C language is a proper subset of the set of strings recognized by the CFG presented in the C language definition; the set of valid parses is a proper subset of the set of derivations generated by the CFG, and the language itself is (a) unambiguous, and (b) not context-free.

Note 1: Most of these are not really ambiguities, because they depend upon how a given identifier is resolved (typedef name, function identifier, declared variable,...).

Note 2: It is not the case that an identifier must be resolved as a typedef-name if it happens to be one; that only happens in places where the reduction is possible. It is not a syntax error to use the same identifier for both a type and a variable, even in the same scope. (It's not a good idea, but it's valid.) The following example, adapted from an example in section 6.7.8 of the standard, shows the use of t as both a field name and a typedef:

typedef signed int t;
struct tag {
    unsigned t:4;  // field named 't' of type unsigned int
    const t:5;     // unnamed field of type 't' (signed int)
};
rici
  • So would it be accurate to say that the so-called C grammar formally presented in the standard is a CFG (since all of the productions are context-free BNF), albeit an ambiguous one? And that it's the "constraints" sections (maybe plus miscellaneous other rules) that use English to define the non-context-free subset that is the actual C language? – Steve Jessop Oct 23 '12 at 09:32
  • @SteveJessop: I'd go with that. Not all the constraints are in "constraints" sections; for example, 6.5.1(2): "An identifier is a primary expression, provided it has been declared as designating an object (in which case it is an lvalue) or a function (in which case it is a function designator)" is in a "semantics" section although the accompanying note 91 clearly states that "an undeclared identifier is a violation of the *syntax*." The preprocessor introduces some additional complexities. – rici Oct 23 '12 at 14:46
3

These things aren't context-free in C:

foo * bar; // foo multiplied by bar or declaration of bar pointing to foo?
foo(*bar); // foo called with *bar as param or declaration of bar pointing to foo?
foo bar[2]; // is bar an array of foo or a pointer to foo?
foo (bar baz); // is foo a function or a pointer to a function?
Alexey Frunze
  • Ok! Thanks. Any examples for the given languages? – akshaykumar6 Oct 22 '12 at 14:15
  • What are the given languages? I think I may have misunderstood the question. Can you elaborate on it? And what does it have to do with C? – Alexey Frunze Oct 22 '12 at 14:17
  • I mean examples in C that belong to the non-CFLs L1 or L2. – akshaykumar6 Oct 22 '12 at 14:18
  • Aren't these examples of ambiguities rather than demonstrating the language is not context-free? – Andy Hayden Oct 22 '12 at 14:18
  • @developer Can you provide a clearer example of what you're looking for? – Alexey Frunze Oct 22 '12 at 14:20
  • @hayden These aren't ambiguous in context in C. They are ambiguous out of context. So, it looks like this demonstrates that C is not context-free. – Alexey Frunze Oct 22 '12 at 14:22
  • I want expressions that belong to these non-CFLs (L1 and L2). One example I can think of for L1 is "a+=a", but I am not sure about this. No idea about L2. – akshaykumar6 Oct 22 '12 at 14:28
  • 1
    @AlexeyFrunze: A grammar is context free even if it is ambiguous, and a language is context free if there exists a context free grammar which derives exactly the set of all strings in the language. The context sensitivity of C is demonstrated not by ambiguities but by non-accepted strings. I try to explain this in my answer. – rici Oct 23 '12 at 14:49