Is there a reasonable way to get the following done in a fast compilation way?

I'm trying to sum a list of numbers up to a specific threshold and replace the preceding values with 0. I'm looking for the fastest way to do this (the list has 18 kk records).

For the given example, the threshold is 1.

Input:

[0.2, 0.4, 0.2, 0.2, 0.1, 1.2, 3.2, 0.2, 0.1, 0.4, 0.5, 0.1]

Output:

[0.0, 0.0, 0.0, 1.0, 0.0, 1.3, 3.2, 0.0, 0.0, 0.0, 1.2, 0.1]
na ni

3 Answers

A faster approach than appending each interim value to the final list:

lst = [0.2, 0.4, 0.2, 0.2, 0.1, 1.2, 3.2, 0.2, 0.1, 0.4, 0.5, 0.1]

res = [0] * len(lst)  # initial zeros
L_size, t = len(lst), 0
for i, n in enumerate(lst):
    t += n
    if t >= 1 or i == L_size - 1:
        res[i] = t
        t = 0
print(res)

[0, 0, 0, 1.0, 0, 1.3, 3.2, 0, 0, 0, 1.2000000000000002, 0.1]
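
As a minimal sketch, the same loop could be wrapped in a function that takes the threshold as a parameter. The name `sum_to_threshold` and its `threshold` argument are illustrative assumptions, not part of the original snippet:

```python
def sum_to_threshold(values, threshold=1.0):
    """Accumulate values; write each run's sum at the index where it reaches the threshold."""
    res = [0.0] * len(values)  # preallocate zeros instead of appending
    last = len(values) - 1
    total = 0.0
    for i, x in enumerate(values):
        total += x
        # Flush the running sum when it reaches the threshold,
        # or at the final element so the leftover sum is not lost.
        if total >= threshold or i == last:
            res[i] = total
            total = 0.0
    return res
```

Calling `sum_to_threshold(lst)` should reproduce the result above, and passing a different `threshold` changes where each run is closed.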
RomanPerekhrest

A list comprehension:

s = 0.0
res = [
    0.0 if (s := s + x if s < 1.0 else x) < 1.0 else s
    for x in lst
]
res[-1] = s
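
To make the walrus expression easier to follow, here is a sketch of what appears to be the equivalent explicit loop. The function name `kelly_expanded`, the `threshold` parameter, and the empty-list guard are assumptions added for illustration; the comprehension above hardcodes `1.0` and would fail on an empty list at `res[-1]`.

```python
def kelly_expanded(lst, threshold=1.0):
    s = 0.0
    res = []
    for x in lst:
        if s < threshold:
            s = s + x  # still accumulating the current run
        else:
            s = x      # the previous element closed a run, so start a new one
        # Emit 0.0 while the run is below the threshold, otherwise the run's sum.
        res.append(0.0 if s < threshold else s)
    if res:
        res[-1] = s    # the last element always carries the leftover sum
    return res
```

The running sum is reset lazily: instead of setting `s` back to zero when a run closes, the next iteration notices that `s` is already at or above the threshold and overwrites it with `x`.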

Benchmark with ~1.8 million values, times multiplied by 10 to estimate for your 18 million:

1.74 ± 0.06 seconds  Kelly
3.32 ± 0.10 seconds  Roman
3.56 ± 0.10 seconds  Roman_Andrej
5.17 ± 0.07 seconds  mozway

Benchmark code (Attempt This Online!):

from timeit import timeit
from statistics import mean, stdev

def mozway(l):
    total = 0
    out = []
    for i, n in enumerate(l):
        new_total = total + n
        if new_total >= 1 or i+1 == len(l):
            out.append(new_total)
            total = 0
        else:
            out.append(0)
            total = new_total
    return out

def Roman(lst):
    res = [0] * len(lst)
    L_size, t = len(lst), 0
    for i, n in enumerate(lst):
        t += n
        if t >= 1 or i == L_size - 1:
            res[i] = t
            t = 0
    return res

def Roman_Andrej(lst):
    L_size, t = len(lst), 0
    for i, n in enumerate(lst):
        t += n
        if t >= 1 or i == L_size - 1:
            lst[i] = t
            t = 0
        else:
            lst[i] = 0
    return lst  # the input list, modified in place

def Kelly(lst):
    s = 0.0
    res = [
        0.0 if (s := s + x if s < 1.0 else x) < 1.0 else s
        for x in lst
    ]
    res[-1] = s
    return res

funcs = mozway, Roman, Roman_Andrej, Kelly

lst = [0.2, 0.4, 0.2, 0.2, 0.1, 1.2, 3.2, 0.2, 0.1, 0.4, 0.5, 0.1]
exp = [0.0, 0.0, 0.0, 1.0, 0.0, 1.3, 3.2, 0.0, 0.0, 0.0, 1.2, 0.1]

for f in funcs:
    res = [round(x, 6) for x in f(lst[:])]
    print(res == exp)
  #  print(exp)
  #  print(res)

times = {f: [] for f in funcs}
def stats(f):
  ts = [t for t in sorted(times[f])[:5]]
  return f'{mean(ts):4.2f} ± {stdev(ts):4.2f} seconds '

lst *= 1800000 // len(lst)
for _ in range(10):
  for f in funcs:
    copy = lst[:]
    t = timeit(lambda: f(copy), number=1) * 10
    times[f].append(t)

for f in sorted(funcs, key=stats):
  print(stats(f), f.__name__)
Kelly Bundy

Not sure what you mean by "fast compilation way":

l = [0.2, 0.4, 0.2, 0.2, 0.1, 1.2, 3.2, 0.2, 0.1, 0.4, 0.5, 0.1]

total = 0
out = []
for i, n in enumerate(l):
    new_total = total + n
    if new_total >= 1 or i+1 == len(l):
        out.append(new_total)
        total = 0
    else:
        out.append(0)
        total = new_total

Output:

[0, 0, 0, 1.0, 0, 1.3, 3.2, 0, 0, 0, 1.2000000000000002, 0.1]

Running time for 18K values:

8.16 ms ± 296 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

precision

If you need to limit the rounding error that accumulates from repeated floating-point addition, you might want to use Decimal:

from decimal import Decimal
from math import isclose

total = 0
out = []
for i, n in enumerate(map(Decimal, l)):
    new_total = total + n
    if new_total >= 1 or isclose(new_total, 1) or i+1 == len(l):
        out.append(float(new_total))
        total = 0
    else:
        out.append(0)
        total = new_total

Output:

[0, 0, 0, 1.0, 0, 1.3, 3.2, 0, 0, 0, 1.2, 0.1]

Running time for 18K values:

49.5 ms ± 2.93 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
mozway
  • For input `l = [0.3, 0.7]`, the first solution gives `[0, 1.0]` and the "precise" second solution gives `[0, 0.9999999999999999]`. – Kelly Bundy Feb 19 '23 at 18:32
  • @KellyBundy I knew I'd get such a comment when I added this. I didn't mean precision in terms of a single float's representation, but rather for the repeated addition of floats. If I'm not mistaken, `Decimal` is more precise for that than floats (the ideal would be to use `Decimal` from the beginning and not to convert from existing floats). – mozway Feb 19 '23 at 18:39
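
The last point in the comments, using `Decimal` from the beginning, can be sketched as follows. This assumes the values are available as decimal strings rather than floats; the function name `sum_to_threshold_decimal` and its `threshold` parameter are hypothetical and not from the answer.

```python
from decimal import Decimal

def sum_to_threshold_decimal(values, threshold=Decimal("1")):
    """Same running-sum logic, but on Decimal values so sums like 0.3 + 0.7 are exact."""
    total = Decimal("0")
    out = []
    last = len(values) - 1
    for i, n in enumerate(values):
        total += n
        if total >= threshold or i == last:
            out.append(total)
            total = Decimal("0")
        else:
            out.append(Decimal("0"))
    return out

# Build the Decimals from strings, not from floats:
# Decimal(0.3) would carry over the float's binary rounding error.
values = [Decimal(s) for s in ("0.3", "0.7")]
result = sum_to_threshold_decimal(values)
```

Because the inputs never pass through a binary float, no `isclose` check is needed for the threshold comparison in this sketch.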