Splitting np array of tuples by last value, but only if the rest of the tuple matches

Question

I have a VERY long numpy array of 3-tuples:

array([('Session A', 'mov1', 1932), ('Session A', 'mov1', 1934),
       ('Session A', 'mov1', 1936), ..., ('Session B', 'mov99', 5306),
       ('Session B', 'mov99', 5308), ('Session B', 'mov99', 5310)], dtype=object)

Each tuple's first & second values are from a small set:

first_values = {'Session A', 'Session B'}
second_values = {'mov1', 'mov2', 'mov3', ..., 'mov100'}

But the third value can be any positive integer.
I'm looking for a nice Pythonic way to split the original array into separate arrays of tuples where:

All tuples have the same first and second values.
The difference between the third values of any two consecutive tuples is no greater than a given value delta.

So for example:

delta = 5
data = [('Session A', 'mov1', 1000), ('Session A', 'mov1', 1001), ('Session A', 'mov1', 1003), ('Session A', 'mov1', 1007), ('Session A', 'mov1', 1010), ('Session A', 'mov1', 1050), ('Session A', 'mov1', 1052), ('Session A', 'mov2', 1002), ('Session A', 'mov2', 1004)]

*magical python function*

result = [
    [('Session A', 'mov1', 1000), ('Session A', 'mov1', 1001), ('Session A', 'mov1', 1003),
     ('Session A', 'mov1', 1007), ('Session A', 'mov1', 1010)],
    [('Session A', 'mov1', 1050), ('Session A', 'mov1', 1052)],
    [('Session A', 'mov2', 1002), ('Session A', 'mov2', 1004)]
]

I found this answer but it's not exactly what I need. Any suggestions?

yeah, because it's fake data. I can cast it into an array if it makes you feel better, wouldn't change the required functionality — Jon Nir, Mar 14 '20 at 15:47
But `np.array(data)` is not the same as the start either. It's a 2d array string dtype. — hpaulj, Mar 14 '20 at 16:00
What makes a solution 'nice pythonic'? If the python code runs, is that enough? — hpaulj, Mar 14 '20 at 16:03

score 3 · Accepted Answer · edited Mar 14 '20 at 16:40

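You can achieve what you want by using itertools.groupby to group the data by the first two elements of each tuple, and then looping over each group to start a new list whenever the third element increases by more than delta. Note that groupby only groups adjacent items, so this assumes that tuples sharing the same first two values are already contiguous and ordered by their third value, as in the example. This can be implemented as follows: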

import itertools

delta = 5
data = [
    ('Session A', 'mov1', 1000), ('Session A', 'mov1', 1001),
    ('Session A', 'mov1', 1003), ('Session A', 'mov1', 1007),
    ('Session A', 'mov1', 1010), ('Session A', 'mov1', 1050),
    ('Session A', 'mov1', 1052), ('Session A', 'mov2', 1002),
    ('Session A', 'mov2', 1004)
]

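result = []
# groupby only merges adjacent items, so equal keys must already be contiguous
for key, group in itertools.groupby(data, key=lambda x: (x[0], x[1])):
    work = []    # current run of tuples within this group
    prev = None  # third value of the previous tuple
    for elem in group:
        # start a new run when the gap to the previous value exceeds delta
        if prev is not None and elem[2] - prev > delta:
            result.append(work)
            work = []
        work.append(elem)
        prev = elem[2]
    result.append(work)  # flush the last run of this group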
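
If the real data is not guaranteed to arrive with matching sessions and movies next to each other, the same idea can be wrapped in a function that sorts first. This is only a sketch under a few assumptions: the name split_by_key_and_gap is made up for illustration, each element is a tuple whose third value is an integer, and reordering the input is acceptable. Sorting is lexicographic on the strings, so 'mov10' would come before 'mov2' and the order of the groups in the output may differ from the input order.

```python
import itertools


def split_by_key_and_gap(data, delta):
    """Split (a, b, n) tuples into runs that share (a, b) and whose
    consecutive n values differ by at most delta.

    Hypothetical helper for illustration, not part of any library.
    """
    # Assumption: sorting is acceptable. groupby only merges adjacent
    # items, so equal (a, b) keys must be contiguous before grouping.
    ordered = sorted(data, key=lambda t: (t[0], t[1], t[2]))

    result = []
    for _, group in itertools.groupby(ordered, key=lambda t: (t[0], t[1])):
        run = []
        for elem in group:
            # Close the current run when the gap to its last value exceeds delta.
            if run and elem[2] - run[-1][2] > delta:
                result.append(run)
                run = []
            run.append(elem)
        result.append(run)  # flush the last run of this group
    return result


data = [
    ('Session A', 'mov2', 1004), ('Session A', 'mov1', 1050),
    ('Session A', 'mov1', 1000), ('Session A', 'mov1', 1001),
    ('Session A', 'mov1', 1003), ('Session A', 'mov1', 1007),
    ('Session A', 'mov1', 1010), ('Session A', 'mov1', 1052),
    ('Session A', 'mov2', 1002),
]
result = split_by_key_and_gap(data, delta=5)
```

With the shuffled input above, result should contain the same three lists as in the question. The sort costs O(n log n), so if the array is already ordered you can skip it and use the loop as written earlier.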
