I have been working on a Python function that computes the running (cumulative) sum of an array: each output element is the sum of the input elements from the start of the array up to and including that index. Example:

Input: [2, 14, 17, 36]

Output: [2, 14+2, 17+14+2, 36+17+14+2]

This is the code.

import matplotlib.pyplot as plt
import numpy as np
arr = []
a = np.array([2, 0, 0, 4, 0, 1, 0, 4, 5, 5])
def rolling_sum(x):
    total = 0
    values = []
    for i,j in enumerate(x):
        total = total+j 
        values.append(total)
    if total <= 2000000000:
        arr.append(values)
        return rolling_sum(values)
    else:
        return values
rolling_sum(a)
for i in arr:
    plt.plot(i)

Inspecting the arr variable reveals that it contains negative numbers, and they show up clearly in the graph. Why is this so? This is what my graph looks like: [image]

Daniel Walker

2 Answers

NumPy arrays use a fixed-size integer type (for example int64) in contrast to the regular Python int type, which has unlimited size.

They have a maximum value that they can represent. Trying to add a value that would be larger than this maximum value results in overflow, which likely results in a negative value for a signed integer type.

See "Often big numbers become negative".

For example, the maximum value of the int32 type is 2^31 − 1 = 2147483647; adding 1 to it results in the minimum value, −2^31 = −2147483648. For the int64 type, these values are much larger: 9223372036854775807 and −9223372036854775808, respectively. Depending on your implementation (i.e., which type NumPy uses by default) and the inputs you use, you may or may not see this behaviour.
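The wraparound described above can be reproduced directly. This is a minimal sketch that forces int32 explicitly so the result does not depend on the platform default; the names info, top and wrapped are only illustrative.

```python
import numpy as np

# Largest and smallest values an int32 can hold.
info = np.iinfo(np.int32)
print(info.max)  # 2147483647
print(info.min)  # -2147483648

# Adding 1 to the maximum wraps around to the minimum.
# Element-wise array arithmetic on fixed-size integers wraps silently.
top = np.array([info.max], dtype=np.int32)
wrapped = top + np.int32(1)
print(wrapped[0])  # -2147483648
```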

In your case, you don't seem to be using any of the features a NumPy array provides compared to a Python list, so you could just use a = [2, 0, 0, 4, 0, 1, 0, 4, 5, 5] and rely on Python handling unlimited big integers for you.
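As a rough illustration of that suggestion, here is a sketch of the same repeated running sum using only Python lists and itertools.accumulate. The function name rolling_sums, the limit parameter and the returned history list are assumptions made for this example (the original code recurses and appends to a global arr instead), and a non-empty input is assumed.

```python
from itertools import accumulate

def rolling_sums(values, limit=2_000_000_000):
    """Repeat a cumulative sum until its last element exceeds `limit`.

    Returns the final cumulative sums and the list of all earlier ones.
    Assumes `values` is non-empty.
    """
    history = []
    current = list(values)
    while True:
        # Python ints have arbitrary precision, so this cannot overflow.
        current = list(accumulate(current))
        if current[-1] > limit:
            return current, history
        history.append(current)

final, history = rolling_sums([2, 0, 0, 4, 0, 1, 0, 4, 5, 5])
has_negative = any(n < 0 for row in history for n in row)
print(has_negative)  # False
```

Each row in history can be passed to plt.plot just like the rows of arr in the question.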

mkrieger1

When you execute the code, notice the warning:

RuntimeWarning: overflow encountered in long_scalars
    total = total+j

This is because NumPy's default integer type is platform-dependent and is sometimes np.int32. In NumPy versions before 2.0 it follows the platform's C long, so it is 32 bits on a 32-bit Python build and on Windows (even 64-bit Windows), and 64 bits on 64-bit Linux. With a 32-bit type, large numbers overflow and wrap around to negative values.
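One way to check which integer type NumPy picked on a given machine is to inspect the array's dtype. This is a minimal sketch; the printed type depends on the platform and the NumPy version, so no specific output is assumed.

```python
import numpy as np

a = np.array([2, 0, 0, 4, 0, 1, 0, 4, 5, 5])

# The default integer dtype is platform-dependent (e.g. int32 or int64).
print(a.dtype)

# Largest value that this dtype can represent before it overflows.
print(np.iinfo(a.dtype).max)
```

If the first line prints int32, running totals above 2147483647 will overflow.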

This is easily solved by providing dtype=np.int64:

a = np.array([2, 0, 0, 4, 0, 1, 0, 4, 5, 5], dtype=np.int64)

This can be confirmed by adding:

from itertools import chain

print(len(list(filter(lambda n: n < 0, chain.from_iterable(arr)))))

This flattens arr and counts how many negative numbers there are. With the original code the output is

RuntimeWarning: overflow encountered in long_scalars
  total = total+j
5

After adding dtype=np.int64 the output is

0
DeepSpace
  • "This is because numpy defaults to np.int32" Not exactly. It depends on the platform. Basically, it is whatever a C long is on your platform, so on 32-bit Linux/Windows it will be 32 bits, but on 64-bit systems it will be 64 bits on Linux and 32 bits on Windows. – juanpa.arrivillaga Sep 01 '20 at 23:23
  • @juanpa.arrivillaga Indeed, but I believe it also depends on the Python version. I'm on a 64-bit Windows 10 machine running 32-bit Python, and numpy defaulted to `int32`. Anyway, I'll add that to the answer – DeepSpace Sep 01 '20 at 23:26
  • @DeepSpace ah yes, that too. – juanpa.arrivillaga Sep 01 '20 at 23:37