Why distinction between expression and statement

Question

In many functional languages (actually all the ones I've ever used) there is no distinction between a statement and an expression, and the last value of each code block is the "return value" of the block. Languages not generally considered purely functional, on the other hand, usually introduce this distinction.

As an example of what I'm talking about, the following Python code prints None:

def foo():
    5 + 5
print(foo())

while the following Scheme code prints 10:

(define (foo) (+ 5 5))
(display (foo))

Obviously I'm not interested in subjective answers of people who prefer one style to the other, but objective reasons.

To me it seems the distinction makes the grammar and implementation of the language more complicated without a real benefit. Some less obvious examples are the exceptions the C++ standard needs for templates and void types, and the introduction of "shortcut if statements" like ?: in C-influenced languages. Most likely, though, there's a reason why even new, modern languages still make this distinction.

Hmm the proposed duplicate doesn't really answer my question: Yes I know what expressions/statements are and I already mentioned some problems that are introduced by the distinction - which is where the answer to the other question stops. I'm interested why even new languages today whose designers surely are aware of these problems still continue to introduce the distinction. The best I can come up with was "familiarity for users coming from c-like languages", but then people who don't want to use the "feature" wouldn't be harmed by its existence even if they didn't know it existed (afais). — Voo, Dec 31 '12 at 06:18

score 5 · Accepted Answer · answered Dec 31 '12 at 12:43

Ubiquitous side effects.

If you are in a purely functional language, everything is an expression, even "statements", which return something like () (possibly distinguished by their type, e.g. IO ()).

However, the majority of programming languages by default permit effects anywhere and everywhere, so sequencing becomes key, and the language bakes in special syntax for ordering statements, often separating them with semicolons.

This isn't the case for pure expressions, which can be evaluated in any order that preserves the expression semantics.

Side-effecting actions are considered special enough that they get their own syntax.
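As a rough sketch of the ordering point (the names log, effectful and pure_add are made up for this illustration, and Python is used only because it appears in the question): reordering pure subexpressions is unobservable, while reordering effectful ones changes what the program does even when the final value is the same.

```python
log = []  # shared state that records the order of effects

def effectful(n):
    """Return n, but also record the call (a side effect)."""
    log.append(n)
    return n

def pure_add(a, b):
    """No effects: the result depends only on the arguments."""
    return a + b

# Pure subexpressions: either evaluation order gives the same result,
# and nothing else in the program can tell the difference.
left_first = pure_add(pure_add(1, 2), pure_add(3, 4))
right_first = pure_add(pure_add(3, 4), pure_add(1, 2))

# Effectful subexpressions: the sum is the same, but the observable
# log depends on the order in which the calls ran.
total = effectful(1) + effectful(2)
order_a = list(log)

log.clear()
total_swapped = effectful(2) + effectful(1)
order_b = list(log)
```

Here order_a and order_b differ even though total and total_swapped are equal, which is the kind of difference that explicit sequencing syntax is meant to pin down.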

score 4 · Answer 2 · answered Dec 31 '12 at 06:40

First, let me say that I think you're asking two, maybe more, different questions: "Why are some expressions distinguished syntactically from others?" and "Why are the semantics for sequencing what they are?"

For your first question: The sense I get from the many things I've read is that statements are expressions, but a restricted class of expressions that cannot appear as subexpressions in all circumstances, e.g.,

x = 4
y = (x += 1)

The Python code above raises a SyntaxError because a statement (the augmented assignment x += 1) appears in a place where an (unrestricted) expression is expected. I associate statements with side effects, with sequencing, and with the imperative style. I don't know if you consider programming style a subjective answer to your question (style itself certainly is subjective).
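One way to see this split from inside Python is to ask the parser directly. This is only a sketch: is_expression and is_valid_program are hypothetical helper names, built on the standard ast.parse function, whose "eval" mode accepts a single expression and whose "exec" mode accepts a sequence of statements.

```python
import ast

def is_expression(source):
    """True if `source` parses as a single Python expression."""
    try:
        ast.parse(source, mode="eval")
        return True
    except SyntaxError:
        return False

def is_valid_program(source):
    """True if `source` parses as a sequence of Python statements."""
    try:
        ast.parse(source, mode="exec")
        return True
    except SyntaxError:
        return False

checks = {
    "5 + 5": (is_expression("5 + 5"), is_valid_program("5 + 5")),
    "x += 1": (is_expression("x += 1"), is_valid_program("x += 1")),
    "y = (x += 1)": (is_expression("y = (x += 1)"),
                     is_valid_program("y = (x += 1)")),
}
```

With these helpers, 5 + 5 is accepted in both modes (an expression can also stand alone as a statement), x += 1 is accepted only as a statement, and y = (x += 1) is rejected in both.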

I'm very interested to hear others' takes on this question, too.

For the second question: Semantics are sometimes decided arbitrarily, but the aim is a reasonable semantics, with different language designers simply differing on what is most reasonable (or most expected). It surprised me to learn that if control reaches the end of a function body in Python, it returns None, but those are the semantics. Designers have to answer similar semantic questions, like "What should the type of a while loop be?", "What should the type of an if statement be if it doesn't have an else branch?", and "Where should such statements be allowed syntactically?" (issues can arise if such an if statement is the last statement in a sequence of statements).

Generally an expression has a value, while a statement does not. This seems a pretty important distinction, so considering one of them a subset of the other (statements are expressions with a specific return type that cannot be used for anything?) doesn't seem that useful to me. What should the type of a while loop be? Well, the value of the last expression. If there are several possible code paths, type inference should work (although now we assume a unified type system, which would cause problems for, say, Java and be a performance problem... yep, I can see that point) — Voo, Dec 31 '12 at 06:52
cont. So at least for statically typed languages I can see some problems (if there's no unified type system, some constructs couldn't have a sensible return type), but that still doesn't explain why e.g. Python needs the distinction. — Voo, Dec 31 '12 at 06:54
@Voo, if a while loop has the type of the last expression/statement in its body, what value does it return if it is never entered? Also, typically theory says that _all_ expressions evaluate to a value, even statements, but the value for statements in such a system will typically be a value of a singleton type, e.g., unit or void, a type that represents values that safely can be discarded. In terms of "Why these semantics in Python?", why not ask the designer(s)? — BlueBomber, Dec 31 '12 at 18:09

Gregor Ophey · Answer 3 · 2013-01-01T22:06:48.827

The question is, "why do new languages still have statements and not expressions exclusively?", right?

Programming language designs address different problems, e.g.

simple grammar,
simple implementation,
simple semantics

being among the more theoretical design goals and

execution speed of resulting compiled code
compilation speed
resource consumption of executing programs
ease of use (e.g. simple to read)

being among the more practical ones ...

These design goals have no clear cut definitions, e.g. a short grammar is not necessarily the one with the cleanest structure, so which one is simpler?

(considering your example)

For ease of use or code readability a language designer might require you to write 'return' in front of the value (or rather the expression) resulting from a function. This is a return statement. If you can leave out the 'return', it is still implied and could still be considered a return statement (it just would not be so obvious in the code). If it is considered an expression, this implies substitution semantics, as in e.g. Scheme, but probably not Python. From a syntactical standpoint it makes sense to distinguish statements and expressions where 'return' is required.

Looking at machine code (which I haven't done much, so I might be wrong), it seems to me there are only statements, no expressions.

E.g. your example:

ld r1, 5
ld r2, 5
add r3, r1, r2
ret r3

(I'm making this up, obviously)

So for people that like to think in terms of how a (von Neumann) CPU core actually operates, or who want to simplify compilation for such a target architecture, statements are the way.

There is also the particularly 'evil' (as in non-functional) assignment statement. It is required for expressing terminating loops without recursion. According to Dijkstra, loops have simpler semantics than recursion (ref. E.W. Dijkstra, "A Discipline of Programming", 1976). A loop also typically executes faster and consumes less storage than recursion, unless your language optimizes tail recursion (like Scheme).
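To make the loop-versus-recursion point concrete, here is a minimal sketch in Python (sum_to_loop and sum_to_rec are made-up names, and both assume n >= 0). The loop relies on assignment statements to update its state; the recursive version is a single expression but needs one stack frame per call.

```python
def sum_to_loop(n):
    """Sum 1..n with a loop; needs assignment to make progress."""
    total = 0
    while n > 0:
        total += n  # assignment statement: mutate the accumulator
        n -= 1      # assignment statement: ensures the loop terminates
    return total

def sum_to_rec(n):
    """Sum 1..n recursively; no assignment, one stack frame per call."""
    return 0 if n == 0 else n + sum_to_rec(n - 1)

small_loop = sum_to_loop(100)
small_rec = sum_to_rec(100)
```

Both give the same answer for small inputs. CPython does not eliminate tail calls, so for a large enough n the recursive version raises RecursionError, while the loop runs in constant stack space.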
