We can use itertools.groupby to group consecutive lists that have the same second element ('online' or 'offline'), with the help of operator.itemgetter, and then collect the necessary output lists:
from itertools import groupby
from operator import itemgetter
mainlist = [['a', 'online', 20],
            ['a', 'online', 22],
            ['a', 'offline', 26],
            ['a', 'online', 28],
            ['a', 'offline', 31],
            ['a', 'online', 32],
            ['a', 'online', 33],
            ['a', 'offline', 34]]
result = []
for key, group in groupby(mainlist, key=itemgetter(1)):
    if key == 'online':
        output = min(group, key=itemgetter(2)).copy()
        # or `output = next(group).copy()` if data is always sorted
    else:
        next_offline = next(group)
        output.append(next_offline[2])
        result.append(output)
print(result)
# [['a', 'online', 20, 26], ['a', 'online', 28, 31], ['a', 'online', 32, 34]]
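To see why this works, it may help to look at what groupby actually yields. The following is a minimal sketch on a shortened copy of the data (the names rows and runs are just for illustration): each run of adjacent rows with the same status becomes one group.

```python
from itertools import groupby
from operator import itemgetter

rows = [['a', 'online', 20],
        ['a', 'online', 22],
        ['a', 'offline', 26],
        ['a', 'online', 28]]

# groupby only merges *adjacent* rows with equal keys, so 'online'
# appears twice here rather than once.
runs = [(key, [row[2] for row in group])
        for key, group in groupby(rows, key=itemgetter(1))]
print(runs)
# [('online', [20, 22]), ('offline', [26]), ('online', [28])]
```

This is also why the loop above can simply take the smallest time of each 'online' run and the first time of the 'offline' run that follows it.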
I find this version better than the other ones presented here as the code is not deeply nested and doesn't use "flag" variables.
Further improvements:
As Guido van Rossum said: "Tuples are for heterogeneous data, list are for homogeneous data." Right now, though, your lists hold heterogeneous data. I suggest using a namedtuple, which makes it easier to distinguish between the fields. I'm going to use the typed version from the typing module, but you are free to use the one from collections. For example, it could look like this:
from typing import NamedTuple
class Record(NamedTuple):
    process: str
    status: str
    time: int

class FullRecord(NamedTuple):
    process: str
    status: str
    start: int
    end: int
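If you prefer the untyped version from collections, a rough equivalent could look like the sketch below. The field names are the same as above; only the way the classes are declared differs.

```python
from collections import namedtuple

# Same fields as the typed classes, but without type annotations.
Record = namedtuple('Record', ['process', 'status', 'time'])
FullRecord = namedtuple('FullRecord', ['process', 'status', 'start', 'end'])

record = Record('a', 'online', 20)
print(record.time)
# 20
print(record)
# Record(process='a', status='online', time=20)
```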
We can easily get a list of Record objects from your list of lists by using itertools.starmap:
from itertools import starmap
records = list(starmap(Record, mainlist))
# [Record(process='a', status='online', time=20),
# Record(process='a', status='online', time=22),
# Record(process='a', status='offline', time=26),
# Record(process='a', status='online', time=28),
# Record(process='a', status='offline', time=31),
# Record(process='a', status='online', time=32),
# Record(process='a', status='online', time=33),
# Record(process='a', status='offline', time=34)]
Then let's wrap the first code example in a generator function and adjust it to reflect the changes in the input data:
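from operator import attrgetter

def collect_times(values):
    # attrgetter('status') reads the field by name; Record.status.fget fails
    # on Python 3.8+, where namedtuple fields are no longer property objects
    for key, group in groupby(values, key=attrgetter('status')):
        if key == 'online':
            min_online_record = next(group)
        else:
            next_offline_record = next(group)
            yield FullRecord(process=min_online_record.process,
                             status='online',
                             start=min_online_record.time,
                             end=next_offline_record.time)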
result = list(collect_times(records))
# [FullRecord(process='a', status='online', start=20, end=26),
# FullRecord(process='a', status='online', start=28, end=31),
# FullRecord(process='a', status='online', start=32, end=34)]
That's it. The code is now more self-explanatory than before: we can see which field goes where, and the fields are referenced by name, not by index.
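Note that because your data is sorted, I wrote min_online_record = next(group). If that is not always the case, write min_online_record = min(group, key=attrgetter('time')) instead, with attrgetter imported from operator (Record.time.fget fails on Python 3.8+, where namedtuple fields are no longer property objects).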
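For completeness, here is a sketch of that unsorted variant. The function name collect_times_unsorted and the guard for a leading 'offline' run are my own additions, not something your data requires; like the code above, it takes the first record of each 'offline' run.

```python
from itertools import groupby
from operator import attrgetter
from typing import NamedTuple


class Record(NamedTuple):
    process: str
    status: str
    time: int


class FullRecord(NamedTuple):
    process: str
    status: str
    start: int
    end: int


def collect_times_unsorted(values):
    # Hypothetical variant: does not assume an 'online' run is sorted by time.
    earliest_online = None
    for key, group in groupby(values, key=attrgetter('status')):
        if key == 'online':
            earliest_online = min(group, key=attrgetter('time'))
        elif earliest_online is not None:
            # An 'offline' run with no 'online' run before it is skipped.
            first_offline = next(group)
            yield FullRecord(process=earliest_online.process,
                             status='online',
                             start=earliest_online.time,
                             end=first_offline.time)


records = [Record('a', 'offline', 18),
           Record('a', 'online', 22),
           Record('a', 'online', 20),
           Record('a', 'offline', 26)]
print(list(collect_times_unsorted(records)))
# [FullRecord(process='a', status='online', start=20, end=26)]
```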
Also, if you are interested, note that the fields process and status are duplicated in Record and FullRecord. You could avoid that by inheriting from a parent class with those two fields, but inheriting from a namedtuple is not really pretty. So, if you do that, use a dataclass instead.
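A minimal sketch of what that could look like, assuming a parent class that I call BaseRecord here (the name is only for illustration):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BaseRecord:
    # Fields shared by both record types.
    process: str
    status: str


@dataclass(frozen=True)
class Record(BaseRecord):
    time: int


@dataclass(frozen=True)
class FullRecord(BaseRecord):
    start: int
    end: int


record = Record('a', 'online', 20)
full = FullRecord(process=record.process, status=record.status,
                  start=record.time, end=26)
print(full)
# FullRecord(process='a', status='online', start=20, end=26)
```

Keep in mind that dataclass instances are not tuples, so they can't be unpacked or indexed like the namedtuple versions. frozen=True only makes them immutable.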