6

I know that C is not a context-free language, a famous example is:

int foo;
typedef int foo;
foo x;

In this case the lexer doesn't know, whether foo in the 3rd line, is an identifier, or typedef.

My question is, is this the only reason that makes C a Context-Sensitive Language?

I mean, if we get rid of typedef, would it become context-free language? Or there are other reasons (examples) that prevent it from being so?

Yahya
  • 13,349
  • 6
  • 30
  • 42
Bite Bytes
  • 1,455
  • 8
  • 24
  • If you remove such things, you no longer have C, but a completely new language that is *similar* to (but not compatible with) C. – Some programmer dude May 23 '17 at 17:47
  • `+` depends on context: in `40 + 2` it means integer addition; in `40 + 2.0` it means floating point addition. – pmg May 23 '17 at 17:50
  • 3
    @pmg The parser doesn't care what the types of things are. – sepp2k May 23 '17 at 17:51
  • @sepp2k On the other hand `+` (and `-`) are context-sensitive since they can both mean two different operators depending on context: Unary or binary plus/minus. So from the lexers point of view it's impossible to say which operator is meant. – Some programmer dude May 23 '17 at 17:53
  • 4
    @BiteBytes The example doesn't really work. If `foo` weren't a typedef, `foo x;` would simply be an error, so there's no syntactic ambiguity there. I mean sure, the lexer can't tell whether `foo` is an identifier or a typename, but the example doesn't explain why it would need to know that (in fact the lexer doesn't necessarily have to know that - that's just one possible way of solving the problem). The common example would be the expression `(foo)(x)`, which could either be a function call or a cast, depending on whether `foo` is a type or not. – sepp2k May 23 '17 at 17:54
  • 1
    @Someprogrammerdude That doesn't depend on context. I can easily parse unary and binary operators with a context free grammar. – sepp2k May 23 '17 at 17:55
  • @Someprogrammerdude Still, any construct involving these operators can be described using CFG – Eugene Sh. May 23 '17 at 17:55
  • @sepp2k True, but on the other hand the OP seems more worried about it from a lexical point of view (which doesn't really involve the grammar). Some clarification from the OP would be nice... – Some programmer dude May 23 '17 at 17:59
  • 2
    @Someprogrammerdude "OP seems more worried about it from a lexical point of view" The distinction between unary and infix operators matters even less on a lexical level. A plus sign is just a plus sign. Also the term "context free" doesn't really have a meaning at the lexical level, so I see no way to interpret the question like that. The way I see it, the OP talked about the lexer because the cast-vs.-function-call ambiguity is often solved with a lexer hack that distinguishes typenames from other identifiers, not because the question is about lexing per se. – sepp2k May 23 '17 at 18:06
  • 2
    I don't think the words you've used accurately reflect the question you really mean to ask. Nothing in the formal definition of a context-free language requires that every valid sentence in the language have a unique production from terminal symbols. You seem to be looking for constructs whose interpretation in terms of grammar rules is unambiguous, but the existence of such constructs does not make the language context-sensitive. – John Bollinger May 23 '17 at 18:14
  • @JohnBollinger: It's true (and important) that a CFG can be ambiguous. But the question is whether C is described by a CFG, and for that we need to be clear on what we mean by "C". For example, `int foo; int bar = (foo)(3);` is not valid C, while `typedef int foo; int bar = (foo)(3);` is valid. There is an argument (which I don't totally agree with) that we can ignore that error as "semantic", but there is no doubt that the C standard forbids the first excerpt, so that the best we can say is that the (potential) CFG describes a *superset* of C. (Ignoring the preprocessor.) ... – rici May 24 '17 at 15:50
  • @JohnBollinger: ... that ends up with the claim that C is (kind of) context free because a superset of C can be described with a CFG. What is the superset? Why, it is exactly the smallest language which includes C and is context-free (because we've eliminated all the "semantic" errors which are nonetheless inherent in the program text). The fact that the smallest context-free superset of C is context-free is tautological. But is it interesting? (Perhaps I should edit that pair of comments into an answer, but I feel like I've written it all before.) – rici May 24 '17 at 15:52

3 Answers3

4

The post-processor C syntax can be parsed with a classical lex + yacc combo. The lexer definition and the yacc grammar are freely available at

http://www.quut.com/c/ANSI-C-grammar-l-2011.html and http://www.quut.com/c/ANSI-C-grammar-y-2011.html

As you can see from the lex file, it's straightforward except for the context-sensitive check_type() (and comment(), but comment processing technically belongs to the preprocessor), which makes typedef the only source of context-sensitivity there. Since the yacc file doesn't contain any context-sensitivity introducing tricks either, a typedef-less C would be a perfectly context-free syntax.

The subsequent typechecking of C (matching declarations with use sites) is context sensitive, so you could say that overall, C is context sensitive.

Petr Skocik
  • 58,047
  • 6
  • 95
  • 142
2

No. C cannot be a strict context independent language. For that, you should describe a syntax that doesn't allow to use a nondeclared variable (this is context) in a similar way as what you describe in your question. The language authors always describe syntax using some kind of context free grammar, but just to describe the main syntactic constructs of the language. The case you describe (making a type identifier to fit in a different token class to be able to go in places where it shouldn't) is only an example. If you look for example, the freedom in the order for things like static unsigned long long int variable simplifies the syntax remembering by programmers, but complicates things to the compiler authors.

Luis Colorado
  • 10,974
  • 1
  • 16
  • 31
1

As per my knowledge and research there are two basic reasons that make C context sensitive language. These are:

  1. Variable is declared before it is used.
  2. Matching the formal and actual parameters of functions or sub-routines.

These two can't be done by PushDown Automata (PDA) but Linear Bounded Automata (LBA) can do thes two.

Tanvi Agarwal
  • 156
  • 3
  • 10