I have 3 sensors logging data on objects at time intervals. I have it stored in a pandas dataframe that looks like this:

DataFrame

Each value in the sensor time columns is a list of the form [start time, stop time]. The index is the object.

I want to add a column or create a new dataframe that captures all windows in time where an object was tracked by at least one sensor. For instance, the first object is tracked from [0,12] and [14,20]. I know how to do it with a series of loops, but I am wondering if there is a more efficient way to do this?

Jacob Shirley

2 Answers

Let's start by creating the dataframe, df:

import pandas as pd

lst = [[1, [0, 5], [5, 12], [14, 20]],
       [2, [5, 30], [5, 30], [5, 35]],
       [3, [30, 40], [45, 55], [1, 20]]]

df = pd.DataFrame(lst, columns=['Object', 'Sensor1Time', 'Sensor2Time', 'Sensor3Time'])
print(df)

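Next, we define a function called overlap, which merges two intervals into one when they overlap and otherwise returns both intervals unchanged:

def overlap(list1, list2):
    # The intervals overlap (or touch) when the smaller stop time is not
    # earlier than the larger start time.
    if min(list1[1], list2[1]) >= max(list1[0], list2[0]):
        # Merged: a single [start, stop] list.
        result = [min(list1[0], list2[0]), max(list1[1], list2[1])]
    else:
        # Disjoint: a tuple holding both original intervals.
        result = list1, list2
    return result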

Let's call this function on two lists to see how it works:

overlap([0, 4],[2, 8])

Output: [0, 8]

In general, two intervals are overlapping if:

min([upperBoundOfA, upperBoundOfB]) >= max([lowerBoundOfA, lowerBoundOfB])

If this is the case, the union of those intervals is:

(min([lowerBoundOfA, lowerBoundOfB]), max([upperBoundOfA, upperBoundOfB]))

(Taken from the question "Python - Removing overlapping lists".)
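The condition and the union formula above can be tried out on a few intervals. This is a minimal sketch; the helper names intervals_overlap and union_of are made up for illustration and do not appear in the answer:

```python
def intervals_overlap(a, b):
    """Return True when closed intervals a and b share at least one point."""
    return min(a[1], b[1]) >= max(a[0], b[0])


def union_of(a, b):
    """Union of two intervals that are already known to overlap."""
    return [min(a[0], b[0]), max(a[1], b[1])]


print(intervals_overlap([0, 4], [2, 8]))     # True: they share [2, 4]
print(intervals_overlap([0, 5], [5, 12]))    # True: they touch at 5
print(intervals_overlap([0, 12], [14, 20]))  # False: gap between 12 and 14
print(union_of([0, 4], [2, 8]))              # [0, 8]
```

Because the comparison uses >=, two intervals that only share an endpoint count as overlapping and are merged into one window.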

Now let's define another function, overlap_results, which combines the three intervals of a row into a list of disjoint windows:

def overlap_results(row):
    results = []  # disjoint windows found so far, ordered by start time
    # Visiting the intervals by start time means each one only needs to be
    # compared with the most recent window.
    for interval in sorted(row):
        merged = overlap(results[-1], interval) if results else None
        if isinstance(merged, list):  # overlap() returned one merged interval
            results[-1] = merged
        else:  # first interval, or disjoint from the last window
            results.append(list(interval))
    return results

Finally, apply this function to every row of the dataframe, df, and store the results in a new column:

df['SensorTimeFinal'] = df[['Sensor1Time','Sensor2Time','Sensor3Time']].apply(overlap_results, axis=1)
df
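The question also mentions creating a new dataframe instead of adding a column. One possible shape is one row per (object, window). The sketch below is plain Python and assumes each object's windows are available as a list of [start, stop] pairs; the dictionary windows_by_object and the helper to_long_records are hypothetical names used only for illustration:

```python
# Hypothetical merged windows per object, consistent with the example data.
windows_by_object = {
    1: [[0, 12], [14, 20]],
    2: [[5, 35]],
    3: [[1, 20], [30, 40], [45, 55]],
}


def to_long_records(windows):
    """Flatten {object: [[start, stop], ...]} into (object, start, stop) rows."""
    return [
        (obj, start, stop)
        for obj, obj_windows in windows.items()
        for start, stop in obj_windows
    ]


records = to_long_records(windows_by_object)
for record in records:
    print(record)
```

The resulting tuples could then be passed to pd.DataFrame(records, columns=['Object', 'Start', 'Stop']), where the column names are again only a suggestion.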

Shirin Yavari

A possible solution:

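def get_list(l1, l2, l3):
    # Sort the three intervals by start time.
    l = sorted([l1, l2, l3], key=lambda x: x[0])

    extent = []
    # Copy the interval so the lists stored in the dataframe are not modified.
    cur_extent = list(l[0])

    for interval in [l[1], l[2]]:
        # With >=, intervals that merely touch are kept as separate windows.
        if interval[0] >= cur_extent[1]:
            extent.append(cur_extent)
            cur_extent = list(interval)
        else:
            cur_extent[1] = max(cur_extent[1], interval[1])

    extent.append(cur_extent)

    return extent

# Use .iloc for positional access: columns 1 to 3 hold the sensor intervals.
df['Total'] = df.apply(lambda x: get_list(x.iloc[1], x.iloc[2], x.iloc[3]), axis=1)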

Output:

   Object Sensor1 Time Sensor2 Time Sensor3 Time  \
0       1      [0, 10]      [5, 12]     [14, 20]   
1       2      [5, 30]      [5, 30]      [5, 35]   
2       3     [30, 40]     [45, 55]      [1, 20]   

                           Total  
0            [[0, 12], [14, 20]]  
1                      [[5, 35]]  
2  [[1, 20], [30, 40], [45, 55]]  
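The same sort-then-sweep idea extends to any number of sensors. The following is a sketch rather than the answer's own code: merge_intervals is a hypothetical name, and unlike get_list it merges intervals that only touch (it compares with <=), so adjust the comparison if touching windows should stay separate.

```python
def merge_intervals(intervals):
    """Merge overlapping or touching closed intervals into a new sorted list."""
    merged = []
    for start, stop in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # The interval starts inside (or at the end of) the last window.
            merged[-1][1] = max(merged[-1][1], stop)
        else:
            # Build a fresh list so the input intervals are never mutated.
            merged.append([start, stop])
    return merged


print(merge_intervals([[0, 10], [5, 12], [14, 20]]))   # [[0, 12], [14, 20]]
print(merge_intervals([[30, 40], [45, 55], [1, 20]]))  # [[1, 20], [30, 40], [45, 55]]
print(merge_intervals([[0, 5], [5, 12]]))              # [[0, 12]]
```

Applied per row, this could look like df[sensor_cols].apply(lambda row: merge_intervals(list(row)), axis=1), where sensor_cols is an assumed list of the sensor column names.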
PaulS