Collect elements from a list of lists based on the first elements of each group

Question

I have a list

mainlist = [['a','online',20],
            ['a','online',22],
            ['a','offline',26],
            ['a','online',28],
            ['a','offline',31],
            ['a','online',32],
            ['a','online',33],
            ['a','offline',34]]

For each run of consecutive 'online' entries, I want the minimum of the 3rd element, with the 3rd element of the next 'offline' entry appended as a 4th element. This should repeat until the end of the list.

Final output should be

outputlist = [['a', 'online', 20, 26], ['a', 'online', 28, 31], ['a', 'online', 32, 34]]

I tried the code below but it didn't help me:

from itertools import product

for a, b in product(mainlist,mainlist):
    if a[1] == 'online':
        minvalue=min(a, key=lambda x:x[2])
    if b[1] == 'offline' and b[2] >=minvalue[2]:
        maxvalue=min(b, key=lambda x:x[2])

score 2 · Answer 1 · answered Jul 11 '19 at 12:21

It seems like you're looking for consecutive streaks of 'online'.

Just iterate over the list from start to finish, remember where each 'online' streak started, and at the next 'offline' add that streak to the result:

mainlist = [['a', 'online', 20], ['a', 'online', 22], ['a', 'offline', 26], ['a', 'online', 28], ['a', 'offline', 31], ['a', 'online', 32], ['a', 'online', 33], ['a', 'offline', 34]]

output = []
first_online = -1
for item, status, num in mainlist:
    if status == 'online':
        if first_online == -1:
            first_online = num
    elif status == 'offline':
        output.append([item, 'online', first_online, num])
        first_online = -1

print(output)

score 1 · Answer 2 · answered Jul 11 '19 at 12:19

This is one approach using iter

Ex:

mainlist=iter([['a','online',20],['a','online',22],['a','offline',26],['a','online',28],['a','offline',31],['a','online',32],['a','online',33],['a','offline',34]])

result = []
for i in mainlist:
    if i[1] == 'online':
        result.append(i)
        while True:
            i = next(mainlist)
            if i[1] == "offline":
                result[-1].append(i[-1])
                break

Output:

[['a', 'online', 20, 26], ['a', 'online', 28, 31], ['a', 'online', 32, 34]]
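
Two things to watch with the loop above: it appends to the inner lists of the original data in place, and a bare next() raises StopIteration if the data ends with an 'online' streak that never goes offline. A minimal sketch of a variant that avoids both, assuming unmatched rows should simply be dropped (the function name pair_online_offline is made up for this example):

```python
def pair_online_offline(rows):
    """Pair the first 'online' row of each streak with the next 'offline' value."""
    it = iter(rows)
    result = []
    for row in it:
        if row[1] != 'online':
            continue  # 'offline' row with no open streak: skip it
        # Keep consuming the same iterator until an 'offline' row turns up;
        # the None default stops next() from raising StopIteration.
        closing = next((r for r in it if r[1] == 'offline'), None)
        if closing is None:
            break  # trailing 'online' streak that never went offline
        result.append([*row, closing[2]])  # build a new list, input rows stay untouched
    return result


mainlist = [['a', 'online', 20], ['a', 'online', 22], ['a', 'offline', 26],
            ['a', 'online', 28], ['a', 'offline', 31], ['a', 'online', 32],
            ['a', 'online', 33], ['a', 'offline', 34]]

print(pair_online_offline(mainlist))
# [['a', 'online', 20, 26], ['a', 'online', 28, 31], ['a', 'online', 32, 34]]
```

Like the original, this takes the first row of each streak, so it relies on the data being sorted.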

score 1 · Accepted Answer · answered Jul 11 '19 at 14:23

We can use itertools.groupby to group consecutive lists that have the same 2nd element ('online' or 'offline'), with the help of operator.itemgetter, and then collect the necessary output lists:

from itertools import groupby
from operator import itemgetter

mainlist = [['a', 'online', 20],
            ['a', 'online', 22],
            ['a', 'offline', 26],
            ['a', 'online', 28],
            ['a', 'offline', 31],
            ['a', 'online', 32],
            ['a', 'online', 33],
            ['a', 'offline', 34]]
result = []
for key, group in groupby(mainlist, key=itemgetter(1)):
    if key == 'online':
        output = min(group, key=itemgetter(2)).copy()
        # or `output = next(group).copy()` if data is always sorted
    else:
        next_offline = next(group)
        output.append(next_offline[2])
        result.append(output)
print(result)
# [['a', 'online', 20, 26], ['a', 'online', 28, 31], ['a', 'online', 32, 34]]

I find this version better than the other ones presented here as the code is not deeply nested and doesn't use "flag" variables.
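
If it helps to see what groupby actually hands to the loop, here is a small sketch that just lists the runs it produces for the sample data (the variable name runs is only for illustration):

```python
from itertools import groupby
from operator import itemgetter

mainlist = [['a', 'online', 20], ['a', 'online', 22], ['a', 'offline', 26],
            ['a', 'online', 28], ['a', 'offline', 31], ['a', 'online', 32],
            ['a', 'online', 33], ['a', 'offline', 34]]

# Each group is a run of consecutive rows that share the same status.
runs = [(status, [row[2] for row in group])
        for status, group in groupby(mainlist, key=itemgetter(1))]
print(runs)
# [('online', [20, 22]), ('offline', [26]), ('online', [28]), ('offline', [31]), ('online', [32, 33]), ('offline', [34])]
```

Note that groupby only merges neighbouring rows, so the 'online' runs stay separate as long as an 'offline' row sits between them.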

Further improvements:

As Guido van Rossum said: "Tuples are for heterogeneous data, lists are for homogeneous data." But right now your lists keep heterogeneous data. I suggest using namedtuple, which makes it easier to distinguish between the fields. I'm going to use the typed version from the typing module, but you are free to use the one from collections. For example, it could look like this:

from typing import NamedTuple


class Record(NamedTuple):
    process: str
    status: str
    time: int


class FullRecord(NamedTuple):
    process: str
    status: str
    start: int
    end: int

We can get the list of Records from your list of lists easily by using itertools.starmap:

from itertools import starmap

records = list(starmap(Record, mainlist))
# [Record(process='a', status='online', time=20),
#  Record(process='a', status='online', time=22),
#  Record(process='a', status='offline', time=26),
#  Record(process='a', status='online', time=28),
#  Record(process='a', status='offline', time=31),
#  Record(process='a', status='online', time=32),
#  Record(process='a', status='online', time=33),
#  Record(process='a', status='offline', time=34)]

Then let's wrap the first code example in a generator function and adjust parts of it to reflect the changes in the input data:

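from operator import attrgetter


def collect_times(values):
    # attrgetter('status') is used instead of Record.status.fget, which fails
    # on Python 3.8+ where namedtuple fields are no longer property objects
    for key, group in groupby(values, key=attrgetter('status')):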
        if key == 'online':
            min_online_record = next(group)
        else:
            next_offline_record = next(group)
            yield FullRecord(process=min_online_record.process,
                             status='online',
                             start=min_online_record.time,
                             end=next_offline_record.time)


result = list(collect_times(records))
# [FullRecord(process='a', status='online', start=20, end=26),
#  FullRecord(process='a', status='online', start=28, end=31),
#  FullRecord(process='a', status='online', start=32, end=34)]

That's it. Now the code is more self-explanatory than before: we can see which field goes where, and fields are referenced by name, not by index.

Note that since your data is sorted, I write min_online_record = next(group). If that is not always the case, you should write min_online_record = min(group, key=operator.attrgetter('time')) instead.
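
For the unsorted case, a rough self-contained sketch could look like the following. The name collect_times_unsorted and the sample records are made up for the example, and the None guard for data that starts with an 'offline' record is my own assumption about how such input should be treated:

```python
from itertools import groupby
from operator import attrgetter
from typing import NamedTuple


class Record(NamedTuple):
    process: str
    status: str
    time: int


class FullRecord(NamedTuple):
    process: str
    status: str
    start: int
    end: int


def collect_times_unsorted(values):
    earliest_online = None
    for key, group in groupby(values, key=attrgetter('status')):
        if key == 'online':
            # min() scans the whole streak, so order inside it does not matter
            earliest_online = min(group, key=attrgetter('time'))
        elif earliest_online is not None:
            # assumption: an 'offline' run with no preceding 'online' is skipped
            first_offline = next(group)
            yield FullRecord(process=earliest_online.process,
                             status='online',
                             start=earliest_online.time,
                             end=first_offline.time)
            earliest_online = None


# Hypothetical sample where the first 'online' streak is out of order.
records = [Record('a', 'online', 22), Record('a', 'online', 20),
           Record('a', 'offline', 26), Record('a', 'online', 28),
           Record('a', 'offline', 31)]

result = list(collect_times_unsorted(records))
print(result)
# [FullRecord(process='a', status='online', start=20, end=26),
#  FullRecord(process='a', status='online', start=28, end=31)]
```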

Also, if you are interested, note that there is duplication of fields in Record and FullRecord. You could circumvent that by inheriting from a parent class with two fields process and status, but inheriting from a namedtuple is not really pretty. So, if you do that, use dataclass instead.
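
As a rough idea of what the dataclass version might look like (BaseRecord is a name invented for this sketch, and frozen=True is just one way to keep the records immutable like tuples):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BaseRecord:
    # fields shared by both record types (hypothetical parent class)
    process: str
    status: str


@dataclass(frozen=True)
class Record(BaseRecord):
    time: int


@dataclass(frozen=True)
class FullRecord(BaseRecord):
    start: int
    end: int


record = Record('a', 'online', 20)
full = FullRecord(process=record.process, status=record.status,
                  start=record.time, end=26)
print(full)
# FullRecord(process='a', status='online', start=20, end=26)
```

The inherited fields come first in the generated __init__, so Record still accepts (process, status, time) positionally and starmap(Record, mainlist) keeps working.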
