Python: Collatz sequence -- find a number where the length is greater than the starting number

Question

So here is the question, I'm trying to it solve but I'm just getting more confused.

The question wants you to find a starting number where the length of the Collatz sequence it would create is greater than the current one created with the current starting number, and the number has to be smaller than or equal to the current one.

so lets say we are given a starting number of 4 initially:

4->2->1 (length of 3)

3->10->5->16->8->4->2->1 (length of 8)

2 -> 1 (length 2)

1 (length 1)

Now the answer should be that you return 3, since 3 is less than or equal to your starting number, 4, and it is the only number that if started with you'd get a sequence with a length of greater than the one you had with your starting number, 4 in this case.

I tried to do a loop decrementing each, and add the current starting number and the length to a dictionary, starting number as key, and length as value. Then go in and return the key which has the greater value(length), but it won't work. I'm not sure if I'm just writing it wrong or if the idea is wrong...

I have a feeling its something to do with the loop, because the while loop actually does create the correct sequence on its own, when I wrap it in a for loop and use i for the calculations it becomes incorrect, even when creating the sequence.

def solution(x):
  sequence = []
  d = {}
  length_of_sequence = 0
  for i in range (x, 1, -1):
    while(i != 1):
        sequence.append(i)
        if(i % 2 == 0):
            i = i // 2
        else:
            i = 3 * i + 1
        sequence.append(1)
        length_of_sequence += len(sequence)
    d.update({i: length_of_sequence})

return max(d, key=d.get)

Thank you in advance, I've been thinking on this and I'm not getting anywhere so would appreciate any help!

jthulhu · Accepted Answer · 2021-09-12T16:06:24.507

TL;DR

The (main, IMO) problem comes from the line length_of_sequence += len(sequence), because it should be length_of_sequence = len(sequence), and from the fact that you don't reset sequence at each iteration.

However there are multiple other bugs. Here is a correct version.

def solution(x):
  d = {}
  for i in range(2, x+1):
    sequence = []
    while i != 1:
      sequence.append(i)
      if i % 2 == 0: i //= 2
      else: i = 3*i + 1
    sequence.append(1)
    length_of_sequence = len(sequence)
    d[i] = length_of_sequence
  return max(d, key=d.get)

Here is a better (faster) way to do this, with recursion.

def solution(x):
  d = {1: 1}
  def collatz(n):
    if n not in d: d[n] = (collatz(n//2) if n % 2 == 0 else collatz(3*n+1)) + 1
    return d[n]
  return max(range(1, x+1), key=collatz)

Explanation

Let's start from your code, and refactor progressively to the correct one, then continue to the faster one.

def solution(x):
  sequence = []
  d = {}
  length_of_sequence = 0
  for i in range (x, 1, -1):
    while(i != 1):
        sequence.append(i)
        if(i % 2 == 0):
            i = i // 2
        else:
            i = 3 * i + 1
        sequence.append(1)
        length_of_sequence += len(sequence)
    d.update({i: length_of_sequence})
  return max(d, key=d.get)

patching `sequence`

Notice that you initialize sequence only once, but you keep appending to it, meaning that on the second iteration, it already contains elements from the first iteration, and thus its length is strictly increasing. Since you end with 2, the maximum will probably be 2. The patch is easy, just recreate the sequence for each i you are looking at.

def solution(x):
  d = {}
  length_of_sequence = 0
  for i in range (x, 1, -1):
    sequence = []
    while(i != 1):
        sequence.append(i)
        if(i % 2 == 0):
            i = i // 2
        else:
            i = 3 * i + 1
        sequence.append(1)
        length_of_sequence += len(sequence)
    d.update({i: length_of_sequence})
  return max(d, key=d.get)

patching `length_of_sequence`

The previous patch, however, has no effect because you still carry the length of sequence from one iteration to the next, so you should initialize it at each iteration, and update it only once you have finished completing the sequence. I'll also transform

length_of_sequence = 0
length_of_sequence += len(sequence)

into

length_of_sequence = len(sequence)

def solution(x):
  d = {}
  for i in range (x, 1, -1):
    sequence = []
    while(i != 1):
        sequence.append(i)
        if(i % 2 == 0):
            i = i // 2
        else:
            i = 3 * i + 1
        sequence.append(1)
    length_of_sequence = len(sequence) # This has been de-indented
                                       # because we are not interested
                                       # in the temporary length of the sequence,
                                       # but only in the final one.
    d.update({i: length_of_sequence})
  return max(d, key=d.get)

patching `append`

Currently, you are appending 1 each time you append a new element to the sequence, so the result is 4 -> 1 -> 2 -> 1. You only want to append it once, at the end.

def solution(x):
  d = {}
  for i in range (x, 1, -1):
    sequence = []
    while(i != 1):
        sequence.append(i)
        if(i % 2 == 0):
            i = i // 2
        else:
            i = 3 * i + 1
    sequence.append(1)
    length_of_sequence = len(sequence)
    d.update({i: length_of_sequence})
  return max(d, key=d.get)

Polishing code

Now the code is correct, but it's ugly, meaning it's not idiomatic. I'm not going to argue here about why and how your code should be idiomatic (you can google it if you are curious or not convinced), so I'll assume you want to have idiomatic code (because everybody should want to).

Parenthesis around boolean conditions

We are not in C, or C-like languages, here boolean conditions in while and for should not have parenthesis around.

def solution(x):
  d = {}
  for i in range(x, 1, -1):
    sequence = []
    while i != 1:
        sequence.append(i)
        if i % 2 == 0:
            i = i // 2
        else:
            i = 3 * i + 1
    sequence.append(1)
    length_of_sequence = len(sequence)
    d.update({i: length_of_sequence})
  return max(d, key=d.get)

Dictionary update

You are using an overkill way to simply add a value to a dictionary. This should be d[i] = length_of_sequence.

def solution(x):
  d = {}
  for i in range(x, 1, -1):
    sequence = []
    while i != 1:
        sequence.append(i)
        if i % 2 == 0:
            i = i // 2
        else:
            i = 3 * i + 1
    sequence.append(1)
    length_of_sequence = len(sequence)
    d[i] = length_of_sequence
  return max(d, key=d.get)

Assignment with operation shorthand

i = i // 2 can be shortened in i //= 2.

def solution(x):
  d = {}
  for i in range(x, 1, -1):
    sequence = []
    while i != 1:
        sequence.append(i)
        if i % 2 == 0:
            i //= 2
        else:
            i = 3 * i + 1
    sequence.append(1)
    length_of_sequence = len(sequence)
    d[i] = length_of_sequence
  return max(d, key=d.get)

One-lining short blocks

This is more of a personal preference, but since some of your blocks are really short, they could be one-lined.

def solution(x):
  d = {}
  for i in range(x, 1, -1):
    sequence = []
    while i != 1:
        sequence.append(i)
        if i % 2 == 0: i = i // 2
        else: i = 3 * i + 1
    sequence.append(1)
    length_of_sequence = len(sequence)
    d[i] = length_of_sequence
  return max(d, key=d.get)

In-order `range`

There is no point in doing the loop in reverse order, so let's not!

def solution(x):
  d = {}
  for i in range(2, x+1):
    sequence = []
    while i != 1:
        sequence.append(i)
        if i % 2 == 0: i = i // 2
        else: i = 3 * i + 1
    sequence.append(1)
    length_of_sequence = len(sequence)
    d[i] = length_of_sequence
  return max(d, key=d.get)

Performance

And, voilà! We have reached the correct version. Now, that we have a correct and idiomatic code, we can start having a faster one. This is not useful for small values of x, of course, so if you don't feel the current code is slow, you probably don't need this improvement. However, you might still be curious on how to improve it. Here we go!

A bottleneck

The problem with the current code, is that it doesn't take advantage of the fact that, essentially, computing the length of the sequence from a starting point is a pure and deterministic function. I'm not going to explain what it means, but simply state that it provides a way to fasten the code.

The idea is the following. When I am computing the length of the sequence starting with 4 (which is 4 -> 2 -> 1), I also have to computer its sequence. However, its sequence (4 -> 2 -> 1) also contains strictly the sequence which starts with 2 (which is 2 -> 1), so if I did the computations in the right order, I should compute first the length of the sequence of 2, then add one to get the length of the sequence of 4.

This has two advantages. First, I only do the heavy work for each number once. Secondly, I don't actually need to store the sequence, I can work directly with the length of the sequence (this optimization could have been done with your version too).

To do so, we'll store the length of the sequence starting at n in a dictionary once we will have computed it. We will recursively compute the length of the sequence starting at x by adding one to the length of the sub-sequence starting at the next number. But, we will, before, check we haven't already computed it. Finally, do this for every i between 1 and x.

def solution(x):
  d = {1: 1}
  def collatz(n):
    if n not in d: d[n] = (collatz(n//2) if n % 2 == 0 else collatz(3*n+1)) + 1
    return d[n]
  return max(range(1, x+1), key=collatz)

EDIT: Applied suggestion from @dont-talk-just-code.

Thank you so much for taking the time to explain this, and for showing me different ways to do it. Where I can wrapped the conditions in parenthesis was out of habit since I hop between java and python haha. Awesome explanation, made a lot of sense and thank you for showing an optimal way of solving the question as well. Much appreciated! — penutbutterjellytime, Sep 13 '21 at 14:09