116

I have this simple program:

#include <stdio.h>

struct S
{
    int i;
};

void swap(struct S *a, struct S *b)
{
    struct S temp;
    temp = *a    /* Oops, missing a semicolon here... */
    *a = *b;
    *b = temp;
}

int main(void)
{
    struct S a = { 1 };
    struct S b = { 2 };

    swap(&a, &b);
}

As seen on e.g. ideone.com this gives an error:

prog.c: In function 'swap':
prog.c:12:5: error: invalid operands to binary * (have 'struct S' and 'struct S *')
     *a = *b;
     ^

Why doesn't the compiler detect the missing semicolon?


Note: This question and its answer are motivated by this question. While there are other similar questions, I didn't find anything mentioning the free-form nature of the C language, which is what causes this and related errors.

double-beep
Some programmer dude
  • 16
    What motivated this post? – R Sahu Oct 19 '16 at 15:17
  • 3
    This question and the one that inspired it are almost exactly the same, no? How is this not a duplicate? – Tavian Barnes Oct 19 '16 at 17:34
  • 10
    @TavianBarnes Discoverability. The other question is not discoverable when searching for this kind of issue. It could be edited that way, but that would require changing a little too much, making it a whole different question IMO. – Some programmer dude Oct 19 '16 at 17:36
  • 4
    @TavianBarnes: The original question was asking for the error. This question is asking why the compiler seems (to the OP at least) to be misreporting the location of the error. – TonyK Oct 19 '16 at 18:29
  • 86
    Point to ponder: if a compiler could systematically detect missing semi-colons, the *language* wouldn't need semi-colons to begin with. – Euro Micelli Oct 20 '16 at 02:15
  • 5
    The compiler's job is to report the error. It's your job to figure out what to change to fix the error. – David Schwartz Oct 20 '16 at 10:55
  • @EuroMicelli Oh, you mean like T-SQL? Though even there, I believe omitting semicolons is *deprecated*. – user Oct 20 '16 at 12:04
  • 3
    @EuroMicelli JavaScript is an instructive example of why you should not even try... – Jared Smith Oct 20 '16 at 14:24
  • 1
    Why can _you_ say that the semicolon is missing? – Thorbjørn Ravn Andersen Oct 20 '16 at 14:39
  • @ThorbjørnRavnAndersen I can say it's missing because I can see the *context*, something humans are very good at seeing but computers and compilers are very bad at. – Some programmer dude Oct 20 '16 at 14:47
  • 1
    @Someprogrammerdude in this particular case you can see what was most likely _intended_ which is not necessarily (you for all purposes debug the program in your mind). I invite you to examine programs in IOCCC, for instance http://www.ioccc.org/1986/wall/wall.c, to see what C programs may look like. – Thorbjørn Ravn Andersen Oct 20 '16 at 14:53
  • @ThorbjørnRavnAndersen Well *deliberate* obfuscation of course makes detecting context hard. Even unintended obfuscation does it, from e.g. beginners writing code before they learned about indentation and such. The free form of C can be both a blessing and a curse. And yes, I've seen plenty of IOCCC entries, sometimes wondering how such code could be both so beautiful and ugly at the same time. :) What I'm trying to say is that it's *usually* easier for us humans to see context and such problems than for the compiler. – Some programmer dude Oct 20 '16 at 15:00
  • 1
    And this is why language design is hard. C is portable assembler designed around 1970. Question is if any newer languages are easier to write bug-free programs in? Haskell? – Thorbjørn Ravn Andersen Oct 20 '16 at 15:27
  • @JaredSmith JavaScript is an example of a language that doesn't need semi-colons, as whitespace is relevant for it: `a = () => {return a}` and `a = () => {return \n a}`. Using `return \n a;` will not fix that error, it will still be interpreted as `return; \n a;`. – Gustavo Rodrigues Oct 20 '16 at 17:36
  • @GustavoRodrigues my point was that ASI *sounds* like a good idea: "we'll just have the parser insert one any time it encounters what would otherwise be a syntax error" so that if someone forgets one NBD. The problem is that (as you point out) semi-colons can change program *meaning*. A perhaps better example would be an IIFE with no terminating semi-colon on the preceding line being treated as a double invocation on the last expression of that line. – Jared Smith Oct 20 '16 at 17:52
  • @JaredSmith Simultaneously, Python is an example of doing it well; but JavaScript's real problem is having so many corner cases in its syntax. Even [Visual]Basic manages to get this right. (Not to mention Haskell, which has optional semi-colons, and is less restrictive than Python on whitespace placement.) – jpaugh Oct 21 '16 at 03:08
  • 2
    I bet If I asked such a question, I would be downvoted for asking obvious things ... – Buksy Oct 26 '16 at 06:36
  • @Buksy Definitely! He is getting >70 up votes? What is going on? Just because of his reputation? – Peter VARGA Oct 26 '16 at 12:44
  • @Buksy It's not the question alone, but the package with the question and its answer (for which I don't get any reputation). And as stated in my question and earlier comments there's really no (discoverable) similar question. Furthermore, the number of views this has had since I posted the question/answer pair is way higher than any of the two linked questions, meaning this is something many people actually search for and would not have found otherwise. You would have gotten the same treatment if you had written it. – Some programmer dude Oct 26 '16 at 12:58
  • This question was in the weekly newsletter - this is the **only** one reason why it has so many views. – Peter VARGA Oct 26 '16 at 15:30
  • The compiler should be clever (and friendly) enough to check the previous line for a missing semi colon in the case of the subsequent line being in error, it should then report this in the error message imho. For example "error: invalid operands to binary, by the way did you happen to miss a semi colon on line 11?" – ejectamenta Mar 21 '17 at 15:35
  • Actually sometimes the compiler does, eg. Error C2143 syntax error: missing ';' before '}' https://msdn.microsoft.com/query/dev14.query?appId=Dev14IDEF1&l=EN-US&k=k(C2143)&rd=true – ejectamenta Mar 21 '17 at 15:42

5 Answers

226

C is a free-form language. That means you could format it in many ways and it will still be a legal program.

For example a statement like

a = b * c;

could be written like

a=b*c;

or like

a
=
b
*
c
;

So when the compiler sees the lines

temp = *a
*a = *b;

it thinks it means

temp = *a * a = *b;

That is of course not a valid expression and the compiler will complain about that instead of the missing semicolon. The reason it's not valid is because a is a pointer to a structure, so *a * a is trying to multiply a structure instance (*a) with a pointer to a structure (a).

Not only does the compiler fail to detect the missing semicolon, it also reports a seemingly unrelated error on the following line. This is important to notice because no matter how long you look at the line where the error is reported, there is no error there. Problems like this sometimes require you to look at the preceding lines to check that they are correct.

Sometimes you even have to look in another file to find the error. For example, if the last thing a header file does is define a structure, and the semicolon terminating the structure is missing, then the error will be reported not in the header file but in the file that includes it.

And sometimes it gets even worse: if you include two (or more) header files, and the first one contains an incomplete declaration, most probably the syntax error will be indicated in the second header file.


Related to this is the concept of follow-up errors. Some errors, typically due to missing semicolons actually, are reported as multiple errors. This is why it's important to start from the top when fixing errors, as fixing the first error might make multiple errors disappear.

This of course can lead to fixing one error at a time and frequent recompiles which can be cumbersome with large projects. Recognizing such follow-up errors is something that comes with experience though, and after seeing them a few times it's easier to dig out the real errors and fix more than one error per recompile.

Some programmer dude
  • 18
    In C++, `temp = *a * a = *b` *could* be a valid expression if `operator*` were overloaded. (The question is tagged as “C”, though.) – dan04 Oct 19 '16 at 21:42
  • 14
    @dan04: If someone actually did that... NOPE! – Kevin Oct 19 '16 at 22:52
  • 2
    +1 for the advice about (a) starting with the first reported error; and (b) looking backwards from where the error is reported. You know you're a _real_ programmer when you automatically look on the line before where an error is reported :-) – TripeHound Oct 20 '16 at 11:51
  • @TripeHound ESPECIALLY when there are a very large number of errors, or lines that previously compiled are throwing errors... – Tin Wizard Oct 20 '16 at 23:18
  • In C++ you can overload almost anything but it really isn't a good idea to do so on the standard things unless you can be sure that it will not blow up a library or confuse some poor developer in the future. – TafT Oct 21 '16 at 08:16
  • @dan04 Even overloaded `*` keeps its precedence, and `*a * a` is not an lvalue... – Déjà vu Oct 29 '16 at 02:12
  • I'd argue, that the compiler didn't report an unrelated error, or that the developer would have to backtrack, or guess. Remember, the error message was *"invalid operands to binary \*"*. The compiler marked the \* operator, and since it is a **binary** operator, you know that you have to look to the left of the operator. There is no guessing involved here, really. – IInspectable Feb 09 '17 at 14:58
  • Interesting that the post doesn't show your updated user name. That's probably a bug. – StoryTeller - Unslander Monica Sep 03 '18 at 09:03
  • @StoryTeller Might be, you should probably ask about it on meta. :) – Some programmer dude Sep 03 '18 at 09:07
  • 1
    As is usually the case with meta, someone already asked - https://meta.stackoverflow.com/questions/266663/community-wiki-posts-dont-show-change-in-username – StoryTeller - Unslander Monica Sep 03 '18 at 09:08
  • Is there any VS Code extension that adds semicolons automatically for the C language, like Prettier does for JavaScript? – Rohan Devaki Jun 13 '21 at 06:38
  • @RohanDevaki Unfortunately that's not really possible since it's extremely hard to inject missing semicolons at the right place. All you can hope for is warnings about *possible* places. And copious use of static analyzers and other validators. Decent editors with knowledge about context also help. – Some programmer dude Jun 13 '21 at 20:40
30

Why doesn't the compiler detect the missing semicolon?

There are three things to remember.

  1. Line endings in C are just ordinary whitespace.
  2. * in C can be both a unary and a binary operator. As a unary operator it means "dereference", as a binary operator it means "multiply".
  3. The difference between unary and binary operators is determined from the context in which they are seen.

The result of these three facts shows up when we parse:

 temp = *a    /* Oops, missing a semicolon here... */
 *a = *b;

The first and last * are interpreted as unary but the second * is interpreted as binary. From a syntax perspective, this looks OK.

It is only after parsing, when the compiler tries to interpret the operators in the context of their operand types, that an error is seen.

user
plugwash
4

Some good answers above, but I will elaborate.

temp = *a *a = *b;

This is actually a case of x = y = z; where both x and y are assigned the value of z.

What you are saying is the contents of address (a times a) become equal to the contents of b, as does temp.

In short, *a *a = <any integer value> is a valid statement. As previously pointed out, the first * dereferences a pointer, while the second multiplies two values.

Mawg says reinstate Monica
  • 3
    The dereference takes priority, so it's (contents of address a) times (pointer to a). You can tell, because the compile error says "invalid operands to binary * (have 'struct S' and 'struct S *')" which are those two types. – dascandy Oct 20 '16 at 08:56
  • I code pre C99, so no bools :-) But you do make a good point (+1), although order of assignment wasn't really the point of my answer – Mawg says reinstate Monica Oct 21 '16 at 08:51
  • 1
    But in this case, `y` isn't even a variable, it's the expression `*a *a`, and you can't assign to the result of a multiplication. – Barmar Oct 25 '16 at 18:42
  • @Barmar indeed but the compiler doesn't get that far, it has already decided that the operands to the "binary *" are invalid before it looks at the assignment operator. – plugwash Sep 01 '17 at 12:32
4

There's a Polish movie titled "Nic Śmiesznego" ("Nothing Funny"). Here's an excerpt of relevant dialogue from a scene that shows exactly why the compiler developers may be a bit shy to proclaim such missing semicolons with reckless abandon.

Director: What do you mean "this one"?! Are you saying that this object is in my field of view? Point it out with your finger, because I want to believe I'm dreaming.

Adam: This, right here (points).

Director: This? What is this?!

Adam: What do you mean? It's a forest.

Director: Can you tell me why the bloody hell would I need a forest?

Adam: How come "bloody hell"? Here, in the screenplay, it says a forest, it says...

Director: In the screenplay? Find it in this screenplay for me.

Adam: Here: (reads) "When they came upon the crest of the road, in front of them appeared a forest"

Director: Flip the page.

Adam: Oh crap...

Director: Read it for me.

Adam: in front of them appeared a forest... of headstones.

See, it's not generally possible to tell in advance that you really meant a forest and not a forest of headstones.

Kuba hasn't forgotten Monica
3

Most compilers parse source files in order, and report the line where they discover that something is wrong. The first 11 lines of your C program could be the start of a valid (error-free) C program. The first 12 lines of your program cannot. Some compilers will note the location of things they encounter which are not errors in and of themselves, and in most cases won't trigger errors later in the code, but might not be valid in combination with something else. For example:

int foo;
...
float foo;

The declaration int foo; by itself would be perfectly fine. Likewise the declaration float foo;. Some compilers may record the line number where the first declaration appeared, and associate an informational message with that line, to help the programmer identify cases where the earlier definition is actually the erroneous one. Compilers may also keep the line numbers associated with something like a do, which can be reported if the associated while does not appear in the right place. For cases where the likely location of the problem would be immediately preceding the line where the error is discovered, however, compilers generally don't bother adding an extra report for the position.

supercat