
(Assume use strict; use warnings; throughout this question.)

I am exploring the usage of sub.

sub bb { print @_; }
bb 'a';

This works as expected. The parentheses are optional, as with many built-in functions such as print and open.

However, this causes a compilation error:

bb 'a';
sub bb { print @_; }

String found where operator expected at t13.pl line 4, near "bb 'a'"
        (Do you need to predeclare bb?)
syntax error at t13.pl line 4, near "bb 'a'"
Execution of t13.pl aborted due to compilation errors.

But this does not:

bb('a');
sub bb { print @_; }

Similarly, a sub without args, such as:

special_print;
my special_print { print $some_stuff }

Will cause this error:

Bareword "special_print" not allowed while "strict subs" in use at t13.pl line 6.
Execution of t13.pl aborted due to compilation errors.

Ways to avoid this particular error are:

  • Put & before the sub name, e.g. &special_print
  • Put empty parentheses after the sub name, e.g. special_print()
  • Predeclare special_print with sub special_print at the top of the script.
  • Call special_print after the sub declaration.

My question is, why this special treatment? If I can use a sub globally within the script, why can't I use it any way I want it? Is there a logic to sub being implemented this way?

ETA: I know how I can fix it. I want to know the logic behind this.

TLP
  • `my special_print { print $some_stuff }` Did you mean `sub special_print { print $some_stuff }`? – Borodin Sep 20 '16 at 17:20

4 Answers


I think what you are missing is that Perl uses a strictly one-pass parser. It does not scan the file for subroutines, and then go back and compile the rest. Knowing this, the following describes how the one pass parse system works:

In Perl, the sub NAME syntax for declaring a subroutine is equivalent to the following:

sub name {...}   ===   BEGIN {*name = sub {...}}

This means that the sub NAME syntax has a compile-time effect. When Perl is parsing source code, it is working with a current set of declarations. By default, that set is the builtin functions. Since Perl already knows about these, it lets you omit the parentheses.

As soon as the compiler hits a BEGIN block, it compiles the inside of the block using the current rule set, and then immediately executes the block. If anything in that block changes the rule set (such as adding a subroutine to the current namespace), those new rules will be in effect for the remainder of the parse.
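As a minimal sketch of that rule change (greet is a hypothetical name chosen for illustration), a BEGIN block can install a sub by glob assignment, after which the parser accepts the call without parentheses:

```perl
use strict;
use warnings;

# greet is a made-up name. The BEGIN block runs as soon as the parser
# reaches its closing brace, so the sub exists before the next line is parsed.
BEGIN {
    *greet = sub { return "hello, @_" };
}

# No parentheses needed: the parser already knows greet is a sub.
my $msg = greet 'world';
print "$msg\n";
```

Moving the BEGIN block below the call should bring back the "String found where operator expected" error shown in the question.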

Without a predeclared rule, an identifier will be interpreted as follows:

bareword       ===   'bareword'   # a string
bareword LIST  ===   syntax error, missing ','
bareword()     ===   &bareword()  # runtime execution of &bareword
&bareword      ===   &bareword    # same
&bareword()    ===   &bareword()  # same

With strict and warnings enabled, as you have stated, barewords are not converted into strings, so the first case becomes a compile-time error (Bareword "..." not allowed while "strict subs" in use).

When predeclared with any of the following:

sub bareword;
use subs 'bareword';
sub bareword {...}
BEGIN {*bareword = sub {...}}

Then the identifier will be interpreted as follows:

bareword      ===   &bareword()     # compile time binding to &bareword
bareword LIST ===   &bareword(LIST) # same
bareword()    ===   &bareword()     # same
&bareword     ===   &bareword       # same
&bareword()   ===   &bareword()     # same

So in order for the first example to not be a syntax error, one of the preceding subroutine declarations must be seen first.
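A short sketch of the forward-declaration form, assuming a hypothetical sub named shout. The declaration carries no body; it only tells the parser that the name is a sub:

```perl
use strict;
use warnings;

sub shout;                          # forward declaration: name only, no body yet

my $loud = shout 'a', 'b';          # parsed as shout('a', 'b')

# The body can appear anywhere later in the file; it is compiled
# before the call above actually runs.
sub shout { return uc(join('', @_)) }

print "$loud\n";
```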

As to the why behind all of this, Perl has a lot of legacy. One of the goals in developing Perl was complete backwards compatibility. A script that works in Perl 1 still works in Perl 5. Because of this, it is not possible to change the rules surrounding bareword parsing.

That said, you will be hard-pressed to find a language that is more flexible in the ways it lets you call subroutines. This allows you to find the method that works best for you. In my own code, if I need to call a subroutine before it has been declared, I usually use name(...), but if that subroutine has a prototype, I will call it as &name(...) (otherwise you get a warning along the lines of "main::name() called too early to check prototype").
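To illustrate how the & form interacts with prototypes, here is a small sketch; count_args and its \@ prototype are assumptions made up for the example, not something from the question:

```perl
use strict;
use warnings;

# Hypothetical sub with a prototype: takes one array, passed by reference.
sub count_args(\@) { return scalar @_ }

my @items = (1, 2, 3);

# Normal call: the prototype applies, so the sub receives \@items (one argument).
my $with_proto = count_args(@items);

# & call: the prototype is ignored, so @items is flattened into three arguments.
my $bypassed = &count_args(@items);

print "with prototype: $with_proto, bypassed: $bypassed\n";
```

The same sub sees a different argument list depending on the call style, which is why the & form deserves some care.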

Eric Strom
  • I find `&name` to be evil purely *because* it can be used to bypass prototypes, and most beginners do not realize this. IMO `name(args)` is better. – Ether May 13 '11 at 20:04
  • Nit #1: Module imports and `use constant` are even more common ways of declaring subs. – ikegami May 13 '11 at 20:20
  • Nit #2: Your lists don't account for `BAREWORD BAREWORD` (indirect method call), `BAREWORD ->` (class method call), `BAREWORD =>` (fat comma autoquoting), `print BAREWORD ...` (The `*` prototype and similar) and surely others. – ikegami May 13 '11 at 20:21

The best answer I can come up with is that's the way Perl is written. It's not a satisfying answer, but in the end, it's the truth. Perl 6 (if it ever comes out) won't have this limitation.

Perl has a lot of crud and cruft from five different versions of the language. Perl 4 and Perl 5 made some major changes, which can cause problems with earlier programs written in a free-flowing manner.

Because of the long history, and the various ways Perl has and can work, it can be difficult for Perl to understand what's going on. When you have this:

b $a, $c;

Perl has no way of knowing if b is a string and is simply a bareword (which was allowed in Perl 4) or if b is a function. If b is a function, it should be stored in the symbol table as the rest of the program is parsed. If b isn't a subroutine, you shouldn't put it in the symbol table.

When the Perl compiler sees this:

b($a, $c);

It doesn't know what the function b does, but it at least knows it's a function and can store it in the symbol table waiting for the definition to come later.
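A small sketch of what that means in practice: a parenthesized call to a name with no definition anywhere still compiles, and the failure only appears when the call runs. The name not_defined_anywhere is invented for illustration:

```perl
use strict;
use warnings;

# The call below compiles because the parentheses mark it as a sub call.
# The lookup happens at run time, where it fails; eval traps the error.
my $ok = eval { not_defined_anywhere(1, 2); 1 };
my $error = $@;

print "compiled fine, failed at run time: $error" unless $ok;
```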

When you pre-declare your function, Perl can see this:

sub b;   #Or use subs qw(b); will also work.

b $a, $c;

and know that b is a function. It might not know what the function does, but there's now a symbol table entry for b as a function.
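The use subs variant can be sketched the same way, here with a hypothetical sub named tally:

```perl
use strict;
use warnings;
use subs qw(tally);                 # predeclare the name at compile time

my $sum = tally 1, 2, 3;            # parsed as tally(1, 2, 3)

# The definition follows the call site.
sub tally {
    my $total = 0;
    $total += $_ for @_;
    return $total;
}

print "$sum\n";
```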

One of the reasons for Perl 6 is to remove much of the baggage left from the older versions of Perl and to remove strange things like this.

By the way, never ever use Perl Prototypes to get around this limitation. Use use subs or predeclare a blank subroutine. Don't use prototypes.

David W.
  • Yeah... except perl WOULD know that bareword `bb` is a sub, because I declared it within the script. +1 I commend you for this insightful answer, but it still is not quite what I seek. – TLP May 13 '11 at 14:52
  • And if the compiler sees the declaration before you try using a bareword as the sub, yes, it treats it as a sub. If you try using a bareword as a sub before you declare the sub, it has no way to know to treat it as a sub. – Oesor May 13 '11 at 14:59
  • @Oesor except the compiler knows it is a sub if I use parentheses, `bb()`. – TLP May 13 '11 at 15:02
  • Because parentheses are the way to tell the compiler something is a sub! – Oesor May 13 '11 at 15:03
  • @Oesor I don't think you're understanding my question. The subs are precompiled. Otherwise you wouldn't be able to use `bb()` in the script. So why does it need a declaration to realize that `bb` is a sub? – TLP May 13 '11 at 15:06
  • They are not. `perl -e "bb()"` gives `Undefined subroutine &main::bb called at -e line 1.` When you use the parentheses, the compiler tries running the sub and dies at runtime if you don't declare it. – Oesor May 13 '11 at 15:09
  • @Oesor eh... yes they are. `perl -we "bb(); sub bb {}"` gives no warning, disregarding the flow of the script. The subs are handled before runtime. – TLP May 13 '11 at 15:26
  • @David => I am not sure what you are getting at with regard to prototypes. Prototypes do not provide a way around this issue, and are very useful in limited cases. Blanket statements like "never ever use Perl prototypes" belong in the same bucket as "never ever turn off strict" (which leads people to doing things like string eval when turning off strict would be faster and safer) – Eric Strom May 13 '11 at 15:30
  • @TLP The compiler does not look through the input, compile the subs, then compile the rest of the program. It doesn't warn because it sees `bb()` and marks that as `&main::bb`, then `&main::bb` is defined with the `sub bb {}`, then the program executes and runs `&main::bb`. – Oesor May 13 '11 at 15:31
  • @TLP => take a look at my answer for more details, but you are missing a crucial detail of the way Perl executes code. It first compiles the entire file, and then executes the compiled op tree. Since you have declared `bb` at compile time with the `sub name` construct, by the time runtime happens, the `bb()` subroutine is available, and that is why you do not get an error. – Eric Strom May 13 '11 at 15:33
  • @TLP - Perl programs are parsed from top to bottom, as are many scripting languages, with a one-pass parser. That means, it sees the `b $a, $c` before it sees the subroutine entry. At this point, there could be multiple ways of interpreting this line. Yes, the answer is further below, but the parser doesn't know that yet. In Perl 6, this will be solved -- not because the parser will make two passes, but because Perl 6 won't allow bareword strings. Therefore, `b` must be a subroutine. – David W. May 13 '11 at 15:33
  • @Eric Strom - I used the phrase "never ever" more in the meaning of MythBusters' "We're professionals, don't try this at home". There are times "never ever" rules need to be broken, but you should know when and where. It's why I linked that phrase to Tom Christiansen's article on Perl prototyping. I can imagine someone reading about how you can pre-declare Perl subroutines, and they decide, if they're going to pre-declare the sub, might as well prototype it while they're at it. And, that would be a big mistake. – David W. May 13 '11 at 15:43
  • @David => gotcha, fair enough. – Eric Strom May 13 '11 at 15:50

Parentheses are optional only if the subroutine has been predeclared. This is documented in perlsub.

Perl needs to know at compile time whether the bareword is a subroutine name or a string literal. If you use parentheses, Perl will guess that it's a subroutine name. Otherwise you need to provide this information beforehand (e.g. using subs).

Eugene Yarmash
  • You missed the part about the logic. The sub is already usable within the script, why this reduced usability? – TLP May 13 '11 at 13:50
  • @TLP Because if `bb` isn't predeclared it's treated like `"bb"` (a string literal). – Eugene Yarmash May 13 '11 at 14:01
  • @eugene y Yes, I know. The question is "why?". – TLP May 13 '11 at 14:02
  • @TLP because if it behaved differently it would be a different language. – hobbs May 13 '11 at 14:50
  • @hobbs Let me be specific. `bb` becomes a reserved namespace with my declaration. So that if I use `bb('a')` perl knows what I mean. How come the compiler understands that `bb()` is a sub, but it doesn't understand that `bb 'a'` is a sub unless I predeclare it? – TLP May 13 '11 at 14:58
  • It's right in the answer you're commenting to: "Parentheses are optional if the subroutine has been predeclared. This is documented in perlsub." – Oesor May 13 '11 at 15:01
  • @eugene y I know what you mean. Still, it does not answer my question. If I declare a sub within my script, that bareword SHOULD be apparent to the compiler. – TLP May 13 '11 at 15:12
  • @TLP you're talking nonsense. If it's declared then it works. If it's not declared then it doesn't. – hobbs May 13 '11 at 15:48
  • @hobbs You're either not getting it, or you are being obtuse. Since `bb()` works, that means that the `bb` namespace is compiled. So why does it not work without parentheses? – TLP May 13 '11 at 15:51
  • Because `bb` hasn't been declared at that point. With the parentheses it's still unambiguously a sub call, so perl compiles it down to an entersub on `*bb{CODE}`, which will resolve successfully at runtime after the definition of `bb` is encountered (it **doesn't** mean that `bb` has been compiled at this point). Without the parentheses but with a `sub bb` definition seen, perl still makes a sub call (a behavior added in perl 5). But without the parens and without a definition in scope, it resolves the ambiguity by making the perl-1-compatible assumption that `bb` is a bareword string literal. – hobbs May 13 '11 at 18:01

The reason is that Larry Wall is a linguist, not a computer scientist.

Computer scientist: The grammar of the language should be as simple & clear as possible.

  • Avoids complexity in the compiler
  • Eliminates sources of ambiguity

Larry Wall: People work differently from compilers. The language should serve the programmer, not the compiler. See also Larry Wall's outline of the three virtues of a programmer.

edgar.holleis
  • Nice try, but I still don't see the answer. – TLP May 13 '11 at 14:33
  • @hobbs I am not being condescending. I don't know what you are referring to. If you can explain why this question has no concrete answer, I will gladly assign your answer. – TLP May 13 '11 at 14:55
  • Pretty sure Larry Wall has a bachelor's degree in computer science. – draegtun May 13 '11 at 16:05
  • @draegtun, to my knowledge he started with a dual major in Chemistry/Music and switched to Linguistics. – Ven'Tatsu May 13 '11 at 18:05
  • This is a superb answer, and the down votes show just how much the majority of software engineers have become slaves to their computers. @TLP It answers your question perfectly. The first-level answer is *“because it's written that way”* while this answer describes *why* it was written that way. It also explains why it's just as difficult to google for Perl concepts as it is to find complex English phrases, and why it's such a divisive language. There are Java/C++ programmers, and then there are Perl programmers who like to have easy things made easy, and hard things made possible. – Borodin Sep 20 '16 at 17:10