
I have a very long NumPy array of 3-tuples:

array([('Session A', 'mov1', 1932), ('Session A', 'mov1', 1934),
       ('Session A', 'mov1', 1936), ..., ('Session B', 'mov99', 5306),
       ('Session B', 'mov99', 5308), ('Session B', 'mov99', 5310)], dtype=object)

The first and second values of each tuple come from a small set:

first_values = {'Session A', 'Session B'}
second_values = {'mov1', 'mov2', 'mov3', ... , 'mov100'}

The third value, however, can be any positive integer.
I'm looking for a nice Pythonic way to split the original array into separate arrays of tuples where:

  1. All tuples have the same values for the 1st & 2nd elements.
  2. The difference between the 3rd elements of any two consecutive tuples is no greater than a given value delta.

So for example:

delta = 5
data = [('Session A', 'mov1', 1000), ('Session A', 'mov1', 1001), ('Session A', 'mov1', 1003), ('Session A', 'mov1', 1007), ('Session A', 'mov1', 1010), ('Session A', 'mov1', 1050), ('Session A', 'mov1', 1052), ('Session A', 'mov2', 1002), ('Session A', 'mov2', 1004)]

*magical python function*

result = [
    [('Session A', 'mov1', 1000), ('Session A', 'mov1', 1001), ('Session A', 'mov1', 1003),
     ('Session A', 'mov1', 1007), ('Session A', 'mov1', 1010)],
    [('Session A', 'mov1', 1050), ('Session A', 'mov1', 1052)],
    [('Session A', 'mov2', 1002), ('Session A', 'mov2', 1004)]
]

I found this answer but it's not exactly what I need. Any suggestions?

Jon Nir

1 Answer


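You can do this with itertools.groupby: group the data by the first two elements of each tuple, then loop over each group and start a new list whenever the third element exceeds the previous tuple's third element by more than delta. Note that groupby only groups consecutive items, so this assumes the data is already ordered by the first two elements, as in your example. This can be implemented as follows: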

import itertools

delta = 5
data = [
    ('Session A', 'mov1', 1000), ('Session A', 'mov1', 1001),
    ('Session A', 'mov1', 1003), ('Session A', 'mov1', 1007),
    ('Session A', 'mov1', 1010), ('Session A', 'mov1', 1050),
    ('Session A', 'mov1', 1052), ('Session A', 'mov2', 1002),
    ('Session A', 'mov2', 1004)
]

result = []
# groupby only merges consecutive items, so data must already be ordered by (x[0], x[1]).
for _, group in itertools.groupby(data, key=lambda x: (x[0], x[1])):
    work = []    # current run of tuples
    prev = None  # third value of the previous tuple in this group
    for elem in group:
        # Close the current run when the gap to the previous tuple exceeds delta.
        if prev is not None and elem[2] - prev > delta:
            result.append(work)
            work = []
        work.append(elem)
        prev = elem[2]
    result.append(work)  # flush the last run of this group
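
If the input is not guaranteed to be ordered, the same idea can be wrapped in a function that sorts first. This is only a sketch: the name split_runs is made up for illustration, and converting each record with tuple() is an assumption meant to cover rows coming from a NumPy object array.

```python
import itertools
from operator import itemgetter


def split_runs(records, delta):
    """Split records into runs that share (first, second) and have gaps <= delta.

    `split_runs` is a hypothetical helper name, not part of any library.
    """
    # Sort so equal keys are adjacent and third values ascend within each key;
    # itertools.groupby only merges consecutive items.
    ordered = sorted((tuple(r) for r in records), key=itemgetter(0, 1, 2))
    runs = []
    for _, group in itertools.groupby(ordered, key=itemgetter(0, 1)):
        run = []
        for record in group:
            # Start a new run when the gap to the previous record exceeds delta.
            if run and record[2] - run[-1][2] > delta:
                runs.append(run)
                run = []
            run.append(record)
        runs.append(run)  # flush the last run of this group
    return runs
```

For the sample data above, split_runs(data, delta=5) returns the same three lists as the expected result in the question. Keep in mind that the sort is lexicographic on the strings, so 'mov10' is ordered before 'mov2'; the grouping is unaffected, only the order of the output lists.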
wwii
Ken