3

I am working on a collatz sequence. I currently have a for loop.

for my $num (1..1000000) {
    my $count = 1;
    for (my $i = $num; $i != 1; $count++) {
        $i = $i % 2 ? 3 * $i + 1 : $i / 2;
    }
}

And then I have a simple way of working out the count of the loop (who many times it takes to complete the theory).

if ($count > $max_length) {
    $max = $num;
    $max_length = $count;
}

I worked out that code could be made quicker by using a simple theory.

If n = 3, it would have this sequence {3,10,5,16,8,4,2,1} [8] If n = 6, it would have this sequence {6,3,10,5,16,8,4,2,1} [9] If n = 12, it would have this sequence {12,6,3,10,5,16,8,4,2,1} [10]

So I want to save the result of 3, to be able to work out the result of 6 by just adding 1 to the count and so forth.

I tried to tackle this, with what I thought would do the trick but it infact made my program take 1 minute longer to complete, I now have a program that takes 1.49 seconds rather than 30 seconds I had before.

This is how I added the cache(it's probably wrong)

The below is outside of the for loop

my $cache = 0;
my $lengthcache = 0;

I then have this bit of code which sits after the $i line, line 4 in the for loop

    $cache = $i;
    $lengthcache = $count;
    if  ($cache = $num*2) {
            $lengthcache++;
    }

I don't want the answer given to me in full, I just need to understand how to correctly cache without making the code slower.

Runt
  • 55
  • 5

3 Answers3

3

You just want the length, right? There's no much savings to be obtained to caching the sequence, and the memory usage will be quite large.

Write a recursive function that returns the length.

sub seq_len {
   my ($n) = @_;
   return 1 if $n == 1;
   return 1 + seq_len( $n % 2 ? 3 * $n + 1 : $n / 2 );
}

Cache the result.

my %cache;

sub seq_len {
   my ($n) = @_;
   return $cache{$n} if $cache{$n};
   return $cache{$n} = 1 if $n == 1;
   return $cache{$n} = 1 + seq_len( $n % 2 ? 3 * $n + 1 : $n / 2 );
}

Might as well move terminating conditions to the cache.

my %cache = ( 1 => 1 );

sub seq_len {
   my ($n) = @_;
   return $cache{$n} ||= 1 + seq_len( $n % 2 ? 3 * $n + 1 : $n / 2 );
}

Recursion is not necessary. You can speed it up by flatting it. It's a bit tricky, but you can do it using the usual technique[1].

my %cache = ( 1 => 1 );

sub seq_len {
   my ($n) = @_;

   my @to_cache;
   while (1) {
      if (my $length = $cache{$n}) {
         $cache{pop(@to_cache)} = ++$length while @to_cache;
         return $length;
      }

      push @to_cache, $n;
      $n = $n % 2 ? 3 * $n + 1 : $n / 2;
   }
}

Making sure it works:

use strict;
use warnings;
use feature qw( say );
use List::Util qw( sum );

my $calculations;

   my %cache = ( 1 => 1 );

   sub seq_len {
      my ($n) = @_;

      my @to_cache;
      while (1) {
         if (my $length = $cache{$n}) {
            $cache{pop(@to_cache)} = ++$length while @to_cache;
            return $length;
         }

         push @to_cache, $n;
++$calculations;
         $n = $n % 2 ? 3 * $n + 1 : $n / 2;
      }
   }

my @results = map { seq_len($_) } 3,6,12;
say for @results;
say "$calculations calculations instead of " . (sum(@results)-@results);

8
9
10
9 calculations instead of 24

Notes:

  1. To remove recursion,

    1. Make the function tail-recursive by rearranging code or by passing down information about what to do on return. (The former is not possible here.)
    2. Replace the recursion with a loop plus stack.
    3. Eliminate the stack if possible. (Not possible here.)
    4. Clean up the result.
ikegami
  • 367,544
  • 15
  • 269
  • 518
  • Thank you for the feedback, however using the last method the speed is around the same as my first attempt. (37.850s) – Runt May 21 '14 at 21:33
  • You did something wrong. It takes 6s instead of 29s. [Test code](https://www.paste.to/Iwxcin2W) – ikegami May 21 '14 at 21:46
  • Yes I must have done something wrong, but it's hard as I don't fully understand the method being used, or how the for gets implemented. – Runt May 21 '14 at 21:47
  • It's a counting loop. Executes the body of the loop one million times. Just like the loop in your question. I just wrote it using fewer characters. – ikegami May 21 '14 at 21:51
1

Changing your algorithm to cache results so that it can break out early:

use strict;
use warnings;

my @steps = (0,0);
my $max_steps = 0;
my $max_num = 0;

for my $num (2..1_000_000) {
    my $count = 0;
    my $i = $num;
    while ($i >= $num) {
        $i = $i % 2 ? 3 * $i + 1 : $i / 2;
        $count++;
    }
    $count += $steps[$i];
    $steps[$num] = $count;

    if ($max_steps < $count) {
        $max_steps = $count;
        $max_num = $num;
    }
}

print "$max_num takes $max_steps steps\n";

Changes my processing time from 37 seconds to 2.5 seconds.

Why is 2.5 seconds enough of an improvement?

I chose caching in an array @steps because the processing of all integers from 1 to N easily matches the indexes of an array. This also provides a memory benefit over using a hash of 33M vs 96M in a hash holding the same data.

As ikegami pointed out, this does mean that I can't cache all the values of cycles that go past 1million though, as that would quickly use up all memory. For example, the number 704,511 has a cycle that goes up to 56,991,483,520.

In the end, this means that my method does recalculate portions of certain cycles, but overall there is still a speed improvement due to not having to check for caches at every step. When I change this to use a hash and cache every cycle, the speed decreases to 9.2secs.

my %steps = (1 => 0);

for my $num (2..1_000_000) {
    my @i = $num;
    while (! defined $steps{$i[-1]}) {
        push @i, $i[-1] % 2 ? 3 * $i[-1] + 1 : $i[-1] / 2;
    }
    my $count = $steps{pop @i};
    $steps{pop @i} = ++$count while (@i);
    #...

And when I use memoize like demonstrated by Oesor, the speed is 23secs.

Miller
  • 34,962
  • 4
  • 39
  • 60
  • Note that this only does partial caching; it'll calculate the sequence lengths for some inputs more than once. Note that the optimization relies on the calling pattern in the OP; it can't find the sequence length for an arbitrary inputs without calculating the sequence length for all other inputs before it, making much much slower if you just need to calculate the length for a few inputs. – ikegami May 22 '14 at 04:03
  • Yes, it does just a partial caching, but 2.5 seconds is more than sufficient for this case. Also given the post from two days ago [`Collatz Conjecture - Iteration rather than Recursion`](http://stackoverflow.com/q/23746933/1733163) and the provided code, it seems reasonable to assume that the OP wants to determine the greatest cycle and therefore will be iterating over all numbers, not a subset. – Miller May 22 '14 at 04:19
  • Wow this is great! Am I right in saying that the last part of the array using $i as an key is to stop the array overloading it's self? – Runt May 22 '14 at 19:09
  • @user3662493 I calculate the `@steps` values in order. Therefore, once `$i` is less than the current `$num`, we know that `$steps[$i]` will be `defined` and can break out of the cycle early and rely on the cached values. – Miller May 23 '14 at 21:14
0

If you change your implementation to be a recursive function, you can wrap it with Memoize (https://metacpan.org/pod/Memoize) to speed up already calculated responses.

use strict;
use warnings;
use feature qw/say/;
use Data::Printer;

use Memoize;

memoize('collatz');

for my $num (qw/3 6 12 1/) {
    my @series = collatz($num);
    p(@series);
    say "$num : " . scalar @series;
}

sub collatz {
    my($i) = @_;

    return $i if $i == 1;
    return ($i, collatz( $i % 2 ? 3 * $i + 1 : $i / 2 ));
}

Output

[
    [0] 3,
    [1] 10,
    [2] 5,
    [3] 16,
    [4] 8,
    [5] 4,
    [6] 2,
    [7] 1
]
3 : 8
[
    [0] 6,
    [1] 3,
    [2] 10,
    [3] 5,
    [4] 16,
    [5] 8,
    [6] 4,
    [7] 2,
    [8] 1
]
6 : 9
[
    [0] 12,
    [1] 6,
    [2] 3,
    [3] 10,
    [4] 5,
    [5] 16,
    [6] 8,
    [7] 4,
    [8] 2,
    [9] 1
]
12 : 10
[
    [0] 1
]
1 : 1
Oesor
  • 6,632
  • 2
  • 29
  • 56