Find largest substring of numbers within a tolerance level

Question

I have the following input:

a tolerance level T
Number of numbers N
N numbers

The task is to find the longest period (contiguous substring) within those N numbers such that all of its elements are within the tolerance level. More precisely, given the left and right bounds l and r of a substring, any two distinct elements a1 and a2 between the two bounds must satisfy |a1 - a2| <= T. How can I do this efficiently? My approach is:

def getLength(T, N, numbers):

    max_length = 1

    for i in range(0, N-1):
        start = numbers[i]
        numlist = [start]

        for j in range(i+1, N):
            end = numbers[j]
            numlist.append(end)

            if (max(numlist) - min(numlist)) > T:
                break

            if (j-i+1) > max_length:
                max_length = j-i+1

    return max_length

EDIT: To make it clear. The code works as expected. However, it is not efficient enough. I would like to do it more efficiently.

Unclear what your question is—is there something wrong with your code? Please [edit] your question and provide the needed information. — martineau, Nov 06 '17 at 20:16
@martineau yes this might have been a bit misleading. The code works as expected. It is just not efficient enough. I need to find a way to do it more efficiently. — Dr3w Br1ck13, Nov 06 '17 at 20:51
That's a little better, but how are you measuring efficiency, and how much is good enough versus what you have? — martineau, Nov 06 '17 at 20:59
@martineau I need it to be efficient enough so that it can calculate `max_length` within 2 seconds for `N > 12'000` — Dr3w Br1ck13, Nov 06 '17 at 21:07
I just ran your code on 1200 random numbers in the range of 0-999 with a tolerance of 100 and it took `0.002000093460083008` secs, so I'm unsure of how to help you. Perhaps you should [edit] your question again and add enough code to reproduce the problem and provide folks with something to test their possible solutions with. — martineau, Nov 06 '17 at 21:55
1200 was what I used when posting the previous comment, but it only takes about `0.0015` seconds to do 12000. For details see the "answer" I posted. — martineau, Nov 07 '17 at 02:42

martineau · Answer 1 · 2017-11-07T02:44:17.943

First of all, I'm not sure if your code does what you describe in your question. Secondly, it takes (much) less than a second to process 12,000 random numbers.

Regardless, it can be sped up by not calling min() and max() on the numlist every time a new element is appended to it. Instead you can just update the current minimum and maximum variables with a couple of if statements.

Here is code showing that being done, along with a simple framework I wrote for timing performance:

def getLength(T, N, numbers):
    max_length = 1

    for i in range(N-1):
        start = numbers[i]
        numlist = [start]
        min_numlist = max_numlist = start  # Added variables.

        for j in range(i+1, N):
            end = numbers[j]
            numlist.append(end)

            # Inefficient - replaced.
            # if (max(numlist) - min(numlist)) > T:
            #     break

            # Update extremities.
            if end > max_numlist:
                max_numlist = end
            if end < min_numlist:
                min_numlist = end

            if max_numlist-min_numlist > T:
                break

            if j-i+1 > max_length:
                max_length = j-i+1

    return max_length


if __name__ == '__main__':
    import random
    import time

    random.seed(42)  # Use hardcoded seed to get same numbers each time run.
    T = 100
    N = 12000
    numbers = [random.randrange(1000) for _ in range(N)]
    starttime = time.time()
    max_length = getLength(T, N, numbers)
    stoptime = time.time()
    print('max length: {}'.format(max_length))
    print('processing {:,d} elements took {:.5f} secs'.format(N, stoptime-starttime))
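
One caveat on the timing above: with random values in 0-999 and T = 100, the inner loop breaks after only a few elements, so both versions finish quickly. If T is large relative to the spread of the data, most windows stay valid and the nested loops do quadratic work. For that case, a different technique (not the approach shown above) is a sliding window that tracks the window maximum and minimum with two monotonic deques. Below is a minimal sketch, assuming T >= 0; the name `longest_within_tolerance` is made up for illustration, and unlike `getLength` it returns 0 for an empty list.

```python
from collections import deque


def longest_within_tolerance(numbers, tolerance):
    """Length of the longest contiguous run with max - min <= tolerance.

    Sketch only: assumes tolerance >= 0.
    """
    max_idx = deque()  # indices whose values decrease; front is the window max
    min_idx = deque()  # indices whose values increase; front is the window min
    best = 0
    left = 0  # left edge of the current window

    for right, value in enumerate(numbers):
        # Drop indices that can no longer be the max/min once value arrives.
        while max_idx and numbers[max_idx[-1]] <= value:
            max_idx.pop()
        max_idx.append(right)
        while min_idx and numbers[min_idx[-1]] >= value:
            min_idx.pop()
        min_idx.append(right)

        # Shrink the window from the left until it is within tolerance again.
        while numbers[max_idx[0]] - numbers[min_idx[0]] > tolerance:
            left += 1
            if max_idx[0] < left:
                max_idx.popleft()
            if min_idx[0] < left:
                min_idx.popleft()

        best = max(best, right - left + 1)

    return best
```

Each index is appended to and removed from each deque at most once, so the whole pass is linear in N regardless of T. It can be dropped into the timing framework above by calling `longest_within_tolerance(numbers, T)` in place of `getLength(T, N, numbers)`.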
