
For reproducibility reasons, I am sharing the data here.

From column 2, I want to read each row and compare its value with the value of the previous row. If the current value is greater, I move on to the next row. If the current value is smaller than the previous row's value, I divide the current (smaller) value by the previous (larger) value. The following code does this:

import numpy as np
import matplotlib.pyplot as plt

protocols = {}

types = {"data_c": "data_c.csv", "data_r": "data_r.csv", "data_v": "data_v.csv"}

for protname, fname in types.items():
    col_time,col_window = np.loadtxt(fname,delimiter=',').T
    trailing_window = col_window[:-1] # "past" values at a given index
    leading_window  = col_window[1:]  # "current" values at a given index
    decreasing_inds = np.where(leading_window < trailing_window)[0]
    quotient = leading_window[decreasing_inds]/trailing_window[decreasing_inds]
    quotient_times = col_time[decreasing_inds]

    protocols[protname] = {
        "col_time": col_time,
        "col_window": col_window,
        "quotient_times": quotient_times,
        "quotient": quotient,
    }

For data_c, the quotient is a numpy.array with only one unique value, 0.7; likewise, data_r has a single unique quotient value of 0.5. However, data_v has two unique quotient values (0.5 and 0.8).
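
To make the windowing concrete, here is a minimal sketch on a made-up array. The sample values are assumptions chosen for illustration and are not taken from the shared CSV files.

```python
import numpy as np

# Made-up sample values, not taken from the shared CSV files.
col_window = np.array([10.0, 20.0, 14.0, 28.0, 56.0, 28.0])

trailing_window = col_window[:-1]  # value at row i ("past")
leading_window = col_window[1:]    # value at row i + 1 ("current")

# Positions where the current value is smaller than the previous one.
decreasing_inds = np.where(leading_window < trailing_window)[0]

# Smaller (current) value divided by larger (previous) value.
quotient = leading_window[decreasing_inds] / trailing_window[decreasing_inds]

print(decreasing_inds)  # [1 4]
print(quotient)         # [0.7 0.5]
```

The two drops (20 to 14 and 56 to 28) are the only places where a quotient is computed.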

I want to loop through the quotient values of these CSV files and categorize them using a simple if-else statement. I got help from a Stack Overflow contributor, who suggested using numpy.array_equal as follows.

import numpy as np
unique_quotient = np.unique(quotient)
unique_data_c_quotient = np.r_[ 0.7]
unique_data_r_quotient = np.r_[ 0.5]

if np.array_equal( unique_quotient, unique_data_c_quotient ):
    print('data_c')
elif np.array_equal( unique_quotient, unique_data_r_quotient ):
    print('data_r')

This works perfectly for data_c and data_r, whose quotient values are 0.7 and 0.5 respectively. In other words, it works only when the quotient has a single unique (fixed) value. It does not work when the quotient takes more than one value. For example, data_m has quotient values between 0.65 and 0.7 (i.e. 0.65 <= quotient <= 0.7), and data_v has two quotient values (0.5 and 0.8).
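
The limitation can be reproduced with a short sketch. The arrays `quotient_c` and `quotient_v` and the helper `matches_any_single_value` are hypothetical stand-ins, not part of the original code.

```python
import numpy as np

# Hypothetical quotient arrays standing in for data_c and data_v.
quotient_c = np.array([0.7, 0.7, 0.7])
quotient_v = np.array([0.5, 0.8, 0.5, 0.8])

unique_data_c_quotient = np.r_[0.7]
unique_data_r_quotient = np.r_[0.5]


def matches_any_single_value(quotient):
    """True if the unique quotients equal one of the single-value references."""
    unique_quotient = np.unique(quotient)
    return (np.array_equal(unique_quotient, unique_data_c_quotient)
            or np.array_equal(unique_quotient, unique_data_r_quotient))


print(np.unique(quotient_v))                 # [0.5 0.8]
print(matches_any_single_value(quotient_c))  # True
print(matches_any_single_value(quotient_v))  # False
```

Because np.unique returns two values for the data_v-like array, it can never equal a one-element reference array.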

How can we solve this issue using numpy arrays?

1 Answer


If you consistently have unique quotients and consistently have unique quotient bounds, then I would recommend the following:

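# Name matches its use below; the data_v values come from the question.
unique_data_m_bounds   = np.r_[0.65, 0.7]
unique_data_v_quotient = np.r_[0.5, 0.8]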

uq = unique_quotient
uq_min,uq_max = uq.min(),uq.max()

def is_uq_bounded_by(unique_data_bounds):
  ud_min,ud_max = unique_data_bounds.min(), unique_data_bounds.max()
  left_bounded  = ud_min <= uq_min <= ud_max
  right_bounded = ud_min <= uq_max <= ud_max
  bounded = left_bounded & right_bounded
  return bounded

label = 'ERROR -- DATA UNCLASSIFIED'
if len(uq) > 2:
  if is_uq_bounded_by( unique_data_m_bounds ):
    label = 'data_m'
elif 0 < len(uq) <= 2:
  if np.array_equal( uq, unique_data_v_quotient):
    label = 'data_v'
  elif np.array_equal( uq, unique_data_c_quotient):
    label = 'data_c'
  elif np.array_equal( uq, unique_data_r_quotient):
    label = 'data_r'
print(label)

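Note that this method becomes unreliable when the quotient values of different data sets begin to overlap, because the same set of quotients could then match more than one label.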
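
One way to make the overlap visible is to collect every compatible label instead of stopping at the first match. The sketch below is a variation on the approach above, not the same code: the function name `candidate_labels`, the dictionaries, and the tolerance `ATOL` are all assumptions introduced for illustration. It also swaps np.array_equal for np.allclose, since computed floating-point quotients may not match a literal exactly.

```python
import numpy as np

# Reference values taken from the question; ATOL is an assumed tolerance.
EXACT_SETS = {
    "data_c": np.array([0.7]),
    "data_r": np.array([0.5]),
    "data_v": np.array([0.5, 0.8]),
}
RANGES = {"data_m": (0.65, 0.7)}
ATOL = 1e-9


def candidate_labels(quotient):
    """Return every label whose reference is compatible with the quotients."""
    uq = np.unique(quotient)
    if uq.size == 0:
        return []
    labels = []
    # Fixed-value data sets: same number of unique values, equal within ATOL.
    for name, ref in EXACT_SETS.items():
        if uq.shape == ref.shape and np.allclose(uq, ref, rtol=0.0, atol=ATOL):
            labels.append(name)
    # Range-valued data sets: all unique values inside [lo, hi], within ATOL.
    for name, (lo, hi) in RANGES.items():
        if uq.min() >= lo - ATOL and uq.max() <= hi + ATOL:
            labels.append(name)
    return labels


print(candidate_labels([0.5, 0.8, 0.5]))     # ['data_v']
print(candidate_labels([0.66, 0.68, 0.69]))  # ['data_m']
print(candidate_labels([0.7, 0.7]))          # ['data_c', 'data_m']
```

A result with more than one label, as in the last call, signals exactly the overlapping case that cannot be resolved from the quotients alone. Note that np.unique does not merge values that differ only by rounding error, so noisy data may need rounding before this check.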

jyalim