Finding if there are n data points in a row that are less than a certain number

Question

I am working with a spectrum in Python and I have fit a line to that spectrum. I want code that can detect whether there have been, let's say, 10 data points on the spectrum in a row that are less than the fitted line. Does anyone know a simple and quick way to do this?

I currently have something like this:

count = 0
for i in range(lowerbound, upperbound):
    if spectrum[i] < fittedline[i]:
        count += 1
    if count > 15:
        ...  # do whatever

If I changed the first if statement line to be:

if spectrum[i] < fittedline[i] and spectrum[i+1] < fittedline[i+1] and ...:  # and so on

I'm sure the algorithm would work, but is there a smarter way for me to automate this in the case where I want the user to input a number for how many data points in a row must be less than the fitted line?

Hey Pranav, I wasn't asking for anyone to specifically code this feature for me. I have made an honest attempt, but I'm struggling with figuring out the "in a row" feature and I'm asking here on SO to check if anyone knows a clever way to do so? — qwerties, Jul 15 '21 at 14:50
Share the code you're struggling with. Ask a specific question related to that code. People will use what they can from your code to write an answer that makes sense to you. If your code is completely useless, people will tell you how to proceed. Including your code in the question lets people see what variables you are using and what your data looks like, and gives people a starting point to write their answers. — Pranav Hosangadi, Jul 15 '21 at 14:54

Pranav Hosangadi · Accepted Answer · 2021-07-15T16:22:35.820

Your attempt is pretty close to working! For consecutive points, all you need to do is reset the count if one point doesn't satisfy your condition.

num_points = int(input("How many points must be less than the fitted line? "))

count = 0
for i in range(lowerbound, upperbound):
    if spectrum[i] < fittedline[i]:
        count += 1
    else: # If the current point is NOT below the threshold, reset the count
        count = 0

    if count >= num_points:
        print(f"{count} consecutive points found at location {i-count+1}-{i}!")

Let's test this:

lowerbound = 0
upperbound = 10

num_points = 5

spectrum = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
fittedline = [1, 2, 10, 10, 10, 10, 10, 8, 9, 10]

Running the code with these values gives:

5 consecutive points found at location 2-6!
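
If you need the location of every qualifying run rather than a message on each iteration, the same reset-the-count idea can be wrapped in a function. This is only a sketch: `find_runs_below` and its parameters are names made up for illustration, not something from the question, and it assumes both sequences can be indexed from `lowerbound` up to (but not including) `upperbound`.

```python
def find_runs_below(spectrum, fittedline, num_points, lowerbound=0, upperbound=None):
    """Return inclusive (start, end) index pairs for every maximal run of at
    least num_points consecutive samples where spectrum < fittedline.

    Hypothetical helper: the name and signature are assumptions."""
    if num_points < 1:
        raise ValueError("num_points must be at least 1")
    if upperbound is None:
        upperbound = min(len(spectrum), len(fittedline))

    runs = []
    count = 0
    for i in range(lowerbound, upperbound):
        if spectrum[i] < fittedline[i]:
            count += 1
        else:
            # The run (if any) ended at i - 1; keep it if it was long enough
            if count >= num_points:
                runs.append((i - count, i - 1))
            count = 0

    # A run may extend all the way to the end of the range
    if count >= num_points:
        runs.append((upperbound - count, upperbound - 1))
    return runs


spectrum = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
fittedline = [1, 2, 10, 10, 10, 10, 10, 8, 9, 10]
print(find_runs_below(spectrum, fittedline, 5))  # [(2, 6)]
```

Recording a run only when it ends means a long run is reported once, instead of once per extra point.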

edited Jul 15 '21 at 16:22

answered Jul 15 '21 at 14:57

Pranav Hosangadi

I don't think this way of using loops and Ifs is a best practice – gilgorio Jul 15 '21 at 18:48
@gilgorio please elaborate. How would you do it? I built on OP's code. IMO slicing and zipping `spectrum` and `fittedline` involves two slice operations which take up new memory, so I think this is an acceptable way to do what OP wants. – Pranav Hosangadi Jul 15 '21 at 18:59
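
For comparison, a zip-based variant that avoids copying slices could look roughly like the sketch below. It uses `itertools.islice` and `itertools.groupby` from the standard library; `has_run_below` is a hypothetical name chosen here, and the function only answers yes or no rather than reporting positions.

```python
from itertools import groupby, islice


def has_run_below(spectrum, fittedline, num_points, lowerbound=0, upperbound=None):
    """Return True if at least num_points consecutive samples in
    [lowerbound, upperbound) satisfy spectrum < fittedline.

    Hypothetical helper: the name and signature are assumptions."""
    # islice walks the sequences lazily instead of building sliced copies
    pairs = zip(islice(spectrum, lowerbound, upperbound),
                islice(fittedline, lowerbound, upperbound))
    flags = (s < f for s, f in pairs)
    # groupby collapses consecutive equal flags into one group each
    return any(is_below and sum(1 for _ in group) >= num_points
               for is_below, group in groupby(flags))


spectrum = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
fittedline = [1, 2, 10, 10, 10, 10, 10, 8, 9, 10]
print(has_run_below(spectrum, fittedline, 5))  # True
print(has_run_below(spectrum, fittedline, 6))  # False
```

Whether this reads better than the explicit loop is a matter of taste; both make a single pass over the data.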

gilgorio · Answer 2 · 2021-07-15T19:08:07.420

My recommendation would be to research and use existing libraries before developing ad hoc functionality.

In this case, some very smart people have developed numpy, the numerical Python library. It is widely used in scientific projects and offers a lot of useful functionality off the shelf that is tested and optimized.

Your needs can be covered with the following line:

number_of_points = (np.array(spectrum) < np.array(fittedline)).sum()

But let's go step by step:

spectrum = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
fittedline = [1, 2, 10, 10, 10, 10, 10, 8, 9, 10]

# Import numerical python module
import numpy as np

# Convert your lists to numpy arrays
spectrum_array = np.array(spectrum)
fittedline_array = np.array(fittedline)

# Subtract the fitted line from the spectrum
difference = spectrum_array - fittedline_array
# >>> array([ 0,  0, -7, -6, -5, -4, -3,  0,  0,  0])

# Identify points where condition is met
condition_check_array = difference < 0.0
# >>> array([False, False,  True,  True,  True,  True,  True, False, False, False])

# Get the number of points where condition is met
number_of_points = condition_check_array.sum()
# >>> 5

# Get index of points where condition is met
index_of_points = np.where(difference < 0)
# >>> (array([2, 3, 4, 5, 6], dtype=int64),)

print(f"{number_of_points} points found at location {index_of_points[0][0]}-{index_of_points[0][-1]}!")

# Now same functionality in a simple function
def get_point_count(spectrum, fittedline):  
    return (np.array(spectrum) < np.array(fittedline)).sum()

get_point_count(spectrum, fittedline)

Now let's consider that, instead of having 10 points in your spectrum, you have 1 million. Code efficiency is a key thing to consider, and numpy can help there:

import time

number_of_samples = 1000000
spectrum = [1] * number_of_samples
# >>> [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...]
fittedline = [0] * number_of_samples
fittedline[2:7] = [2] * 5
# >>> [0, 0, 2, 2, 2, 2, 2, 0, 0, 0, ...]

# With numpy
start_time = time.time()
number_of_points = (np.array(spectrum) < np.array(fittedline)).sum()
numpy_time = time.time() - start_time
print("--- %s seconds ---" % (numpy_time))


# With ad hoc loop and ifs
start_time = time.time()
count=0
for i in range(0, len(spectrum)):
    if spectrum[i] < fittedline[i]:
        count += 1
    else: # If the current point is NOT below the threshold, reset the count
        count = 0
adhoc_time = time.time() - start_time
print("--- %s seconds ---" % (adhoc_time))

print("Ad hoc is {:3.1f}% slower".format(100 * (adhoc_time / numpy_time - 1)))

>>>--- 0.20999646186828613 seconds ---
>>>--- 0.28800177574157715 seconds ---
>>>Ad hoc is 37.1% slower

1. OP gives no indication that they use numpy. Using numpy is overkill if they aren't already using it and they have a small number of points. 2. OP is looking for **consecutive** points. Your algorithm doesn't do this. Consider the input `spectrum = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]; fittedline = [1, 2, 10, 10, 0, 0, 10, 8, 9, 10]`. Your code spits out `3`, when there are only two consecutive points satisfying the condition. — Pranav Hosangadi, Jul 15 '21 at 19:03
I don't see the "consecutive" requirement in qwerties's question. Regarding numpy, learning how to use widely supported, state-of-the-art libraries is not overkill in my opinion; it is the way to improve as a developer — gilgorio, Jul 15 '21 at 19:07
They even bolded that part in their question: _...on the spectrum in **a row** that are less than..._ Sure, _learning_ how to use it is perfectly valid. The decision on whether to _use_ a library depends on more things than just "I want to learn it". — Pranav Hosangadi, Jul 15 '21 at 19:09
You are right on both points. I understood the word "row" as referring to a multidimensional spectrum with different "rows". Let me retract my downvote. I still think numpy is the way to go for this kind of operation — gilgorio, Jul 15 '21 at 19:18
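
For what it's worth, the consecutive requirement can also be handled with numpy. Below is a rough sketch, assuming 1-D inputs of equal length. `find_runs_below_np` is a hypothetical helper name, and padding the boolean mask and taking `np.diff` is one common way to locate run boundaries, not the method from either answer.

```python
import numpy as np


def find_runs_below_np(spectrum, fittedline, num_points):
    """Return inclusive (start, end) index pairs for every maximal run of at
    least num_points consecutive samples where spectrum < fittedline.

    Hypothetical helper: the name and signature are assumptions."""
    below = np.asarray(spectrum) < np.asarray(fittedline)
    # Pad with False on both sides so every run has a rising and a falling edge
    padded = np.concatenate(([False], below, [False]))
    edges = np.diff(padded.astype(np.int8))
    starts = np.flatnonzero(edges == 1)   # index of the first point in each run
    ends = np.flatnonzero(edges == -1)    # index one past the last point in each run
    keep = (ends - starts) >= num_points
    return [(int(s), int(e) - 1) for s, e in zip(starts[keep], ends[keep])]


spectrum = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
fittedline = [1, 2, 10, 10, 10, 10, 10, 8, 9, 10]
print(find_runs_below_np(spectrum, fittedline, 5))  # [(2, 6)]

# The counterexample from the comments: three points below, but only two in a row
fittedline = [1, 2, 10, 10, 0, 0, 10, 8, 9, 10]
print(find_runs_below_np(spectrum, fittedline, 3))  # []
print(find_runs_below_np(spectrum, fittedline, 2))  # [(2, 3)]
```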
