
I have a Snowflake script written in Python that I converted to a Raku module. Calling it 10,000,000 times is very slow (file test.raku):

use IdWorker;

my $worker = IdWorker.new(worker_id => 10, sequence => 0);
my @ids = gather for (1...10000000) { take $worker.get_id() };

my $duration = now - INIT now;
say sprintf("%-8s %-8s %-20s", @ids.elems, Set(@ids).elems, $duration);

As @codesections's answer says, it is the `now` term that takes so much time.

Python takes about 12 seconds, while Raku takes minutes. How can I fix this?

This empty for loop takes about 0.12 seconds:

for (1...10000000) {
    ;
}

But calling get_id() on $worker in the same loop takes minutes:

for (1...10000000) {
    $worker.get_id();
}
Peter Mortensen
chenyf

1 Answer


I believe that the issue here does not come from constructing the array but rather from `now` itself, which seems to be oddly slow.

For example, this code:

no worries; # skip printing warning for useless `now`
for ^10_000_000 { now }
say now - INIT now;

also takes minutes to run. This strikes me as a bug, and I'll open an issue. [Edit: I located rakudo/rakudo#3620 on this issue. The good news is that there's already a plan for a fix.] Since your code calls `now` multiple times in each iteration, this issue impacts your loop even more.

Apart from that, there are a few other areas where you could speed this code up:

First, using an implicit return (that is, changing return new_id; to just new_id, and making similar changes for the other places where you use return) is generally slightly faster/lets the JIT optimize a bit better.
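As a minimal sketch of the two styles, assuming a hypothetical Counter class (it is not part of the IdWorker module and exists only for illustration), a method's last statement supplies its return value when no return is written:

```raku
# Hypothetical class, used only to illustrate the two return styles.
class Counter {
    has Int $!count = 0;

    # Explicit return: works, but goes through the `return` machinery.
    method next-explicit() {
        $!count += 1;
        return $!count;
    }

    # Implicit return: the value of the last statement is returned.
    method next-implicit() {
        $!count += 1;
        $!count
    }
}

my $c = Counter.new;
say $c.next-explicit;   # 1
say $c.next-implicit;   # 2
```

Both methods behave identically from the caller's point of view; any speed difference is an optimization detail and worth measuring on your own Rakudo version.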

Second, the line

my @ids = gather for (1...10000000) { take $worker.get_id() };

is somewhat wastefully using gather/take (which adds support for lazy lists and is just a more complex construct). You can simplify this into

my @ids = (1...10000000).map: { $worker.get_id() };

(This still constructs an intermediate Seq, though.)
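To see that the two forms collect the same values, here is a small sketch that assumes a hypothetical StubWorker class standing in for IdWorker (its get_id simply counts upward, which is not what the real module does):

```raku
# Hypothetical stand-in for IdWorker; get_id just returns 0, 1, 2, ...
class StubWorker {
    has Int $!next = 0;
    method get_id() { $!next++ }
}

my $n = 1_000;

# Original form: gather/take.
my $w1 = StubWorker.new;
my @gathered = gather for 1..$n { take $w1.get_id() };

# Simplified form: map over the same range.
my $w2 = StubWorker.new;
my @mapped = (1..$n).map: { $w2.get_id() };

say @gathered.elems;        # 1000
say @mapped.elems;          # 1000
say @gathered eqv @mapped;  # same elements in the same order
```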

Third (and this one matters more for performance, though it is about the smallest code change possible): change `(1...10000000)` into `(1..10000000)`. The difference is that `...` is the sequence operator while `..` is the range operator. Sequences have some superpowers compared to Ranges (see the docs if you're curious), but are significantly slower to iterate over in a loop like this.
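A rough way to check the difference on your own machine is sketched below. The loop size and variable names are arbitrary choices for illustration, and the timings will vary with the Rakudo version, so no expected durations are shown:

```raku
# The two operators produce different types.
say (1...5).^name;   # Seq
say (1..5).^name;    # Range

my $n = 1_000_000;

# Iterate a sequence (three dots).
my $seq-sum = 0;
my $t0 = now;
for (1...$n) { $seq-sum += $_ }
my $seq-time = now - $t0;

# Iterate a range (two dots).
my $range-sum = 0;
my $t1 = now;
for (1..$n) { $range-sum += $_ }
my $range-time = now - $t1;

say "sums match: { $seq-sum == $range-sum }";
say "sequence loop: $seq-time seconds";
say "range loop:    $range-time seconds";
```

Here `now` is called only four times in total, so its own cost does not distort the comparison.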

Again, though, these are minor issues; I believe the performance of `now` is the largest problem.

The long-term solution for `now` being slow is for it to be fixed (we're working on it!). As a temporary workaround, though, if you don't mind dipping into a slightly lower level than is generally advisable for user code, you can use nqp::time_n to get a floating-point number of seconds for the current time. Using this would make your get_timestamp method look like:

method get_timestamp() {
    use nqp;
    (nqp::time_n() * 1000).Int;
}

With this workaround and the other refactorings I suggested above, your code now executes in around 55 seconds on my machine – still not nearly as fast as I'd like Raku to be, but well over an order of magnitude better than where we started.

codesections
  • Indeed, it takes about five minutes to run `now` 10 million times. Thanks for your explanation. – chenyf Mar 10 '21 at 14:30
  • Suggest that you simply capture the value of `now` ahead of time ... once. On any decent computer, a loop should be able to run 10 million times without "interesting" delay. – Mike Robinson Mar 10 '21 at 19:18
  • Basically, the TLDR is that the biggest culprit for `now` being slow is the `num` (floating point) → `Rat` conversion, which involves a fair bit of extra math operations (see https://en.wikipedia.org/wiki/Continued_fraction#Calculating_continued_fraction_representations ). – user0721090601 Mar 11 '21 at 15:55
  • With new-disp merged to master, `for ^10_000_000 { now }` now takes 45 seconds, a big win. – chenyf Sep 30 '21 at 12:54