
In Perl, the expression "aa" .. "bb" creates a list with the strings:

aa ab ac ad ae af ag ah ai aj ak al am an ao ap aq ar as at au av aw ax ay az ba bb

In Raku, however (at least with Rakudo v2021.08), the same expression creates:

aa ab ba bb

Even worse, while "12" .. "23" in Perl creates a list of strings with the numbers 12, 13, 14, 15, ..., 23, in Raku the same expression creates the list ("12", "13", "22", "23").

The docs seem to be quite silent about this behaviour; at least, I could not find an explanation there. Is there any way to get Perl's behaviour for Raku ranges?

(I know that the second problem can be solved via typecast to Int. This does not apply to the first problem, though.)

ikegami
Nikola Benes

3 Answers


It's possible to get the Perl behavior by using a sequence with a custom generator:

say 'aa', *.succ … 'bb';
# OUTPUT: «aa ab ac ad ae af ag ah ai aj ak al am an ao ap aq ar as at au av aw ax ay az ba bb»

say '12', *.succ … '23';
# OUTPUT: «12 13 14 15 16 17 18 19 20 21 22 23»
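
For context, here is a minimal sketch of what the `*.succ` generator does on its own. `Str.succ` increments the rightmost alphanumeric character and carries to the left, which is why the sequence above walks through every intermediate string. The sample values are chosen here only for illustration.

```raku
# Str.succ bumps the last alphanumeric character and carries leftwards.
say 'az'.succ;   # OUTPUT: «ba»
say '19'.succ;   # OUTPUT: «20»
say 'zz'.succ;   # OUTPUT: «aaa»
```

Note that a sequence built this way stops only when a generated value matches the endpoint, so the endpoint should be reachable by repeated `.succ` calls.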

(Oh, and a half solution for the '12'..'23' case: you already noted that you can cast the endpoints to a Numeric type to get the output you want. But you don't actually need to cast both endpoints – just the bottom. So 12..'23' still produces the full output. As a corollary, because ^'23' is sugar for 0..^'23', any Range built with &prefix:<^> will be numeric.)

For the "why" behind this behavior, please refer to my other answer to this question.

codesections

TL;DR Add one or more extra characters to the endpoint string. It doesn't matter what the character(s) is/are.


Ten years after the current doc corpus was kick-started by Moritz Lenz++, Raku's doc is, as ever, a work in progress.

There's a goldmine of more than 16 years' worth of chat logs that I sometimes spelunk, looking for answers. A search for range "as words" with nick: TimToady netted me this in a few minutes:

TimToady: beginning and ending of the same length now do the specced semantics considering each position as a separate character range

My instant reaction:

  • Here's why it does what it does. The guy who designed how Perl's range works not only deliberately specced it to work how it now does in Raku but implemented it in Rakudo himself in 2015.

  • It does that iff "beginning and ending of the same length". Hmm.

A few seconds later:

say flat "aa" .. "bb (like perl)";
say flat "12" .. "23 (like perl)";

displays:

(aa ab ac ad ae af ag ah ai aj ak al am an ao ap aq ar as at au av aw ax ay az ba bb)
(12 13 14 15 16 17 18 19 20 21 22 23)
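
One way to picture the "each position as a separate character range" semantics quoted above is as a cross product of per-character ranges. This is only an illustrative model, not a claim about how Rakudo implements it; the variable names are made up for the example.

```raku
# Each character position contributes its own range; X~ concatenates
# every combination, with the leftmost position varying slowest.
my @letters = ('a'..'b') X~ ('a'..'b');
my @digits  = ('1'..'2') X~ ('2'..'3');
say @letters;   # OUTPUT: «[aa ab ba bb]»
say @digits;    # OUTPUT: «[12 13 22 23]»
```

These match the short lists from the question, which is what same-length endpoints produce.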

raiph
  • dear @raiph - this "(like perl)" thing is new to me and I can't find it in the docs - please can you share / point to some definition of it? – librasteve Dec 06 '21 at 21:58
  • @p6steve It's a "trick". You could just add a space: `"aa" .. "bb "`. I doubt this was purposely implemented to do this... seems more of a happy accident. – elcaro Dec 06 '21 at 22:35
  • It's not only not in the docs but is *contradicted* by them, per Nikola's comment on their question. It was purposely implemented this way in Rakudo, by Larry Wall, in 2015. My answer documents this, and how I discovered this, and is everything I know about it. – raiph Dec 06 '21 at 23:16

[I'm splitting this into a separate answer because it addresses the "why" instead of the "how"]

I did a bit of digging, and learned that:

  1. For Sequences, having "aa"…"bb" produce "aa", "ab", "ba", "bb" is specified in Roast
  2. The original use case provided for this behavior was generating sequences of octal numbers (as Strs) (discussed again in 2018)
  3. For Ranges, the behavior of "aa".."bb" is currently unspecified and there does not appear to be consensus about what it should be.
  4. (As you already know), Rakudo's implementation has "aa".."bb" behave the same as "aa"…"bb".
  5. In 2018, lizmat (Elizabeth Mattijsen on Stack Overflow: https://stackoverflow.com/users/7424470/elizabeth-mattijsen) changed .. to make "aa".."bb" behave the way it does in Perl but reverted that change pending consensus on the correct behavior.

So I suppose we (as a community) are still thinking about it? Personally, I'm inclined to agree with lizmat that having "aa".."bb" provide the longer range (like Perl) makes sense: if users want the shorter one, they can use a sequence. (Or, for an octal range, something like (0..0o377).map: *.fmt('%03o'))
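
As a small sketch of that octal alternative (the variable name `@octal` is just for illustration): it maps a numeric range through a zero-padded octal format, so it does not depend on how string ranges behave.

```raku
# 0o377 is an octal literal (255 in decimal); '%03o' formats each
# number as zero-padded, three-digit octal.
my @octal = (0..0o377).map: *.fmt('%03o');
say @octal.head(10);   # OUTPUT: «(000 001 002 003 004 005 006 007 010 011)»
say @octal.tail;       # OUTPUT: «377»
say @octal.elems;      # OUTPUT: «256»
```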

But, either way, I definitely agree with that 2018 commit that we should pin this down in Roast – and then get it noted in the docs.

codesections