
I am grouping string elements within a list by the first word and by the last word in the string, using groupby from itertools. The process seems to work fine for the last word; however, it doesn't do the same for the first one.

from itertools import groupby

model_eval_cols = ['MAD model meFuelFlowStar', 'MedAD model meFuelFlowStar', 'MAD model rpmStar', 'MedAD model rpmStar']

for k, v in groupby(model_eval_cols, key=lambda x: x.split(' ')[2]):
    print(k, list(v))

The above code outputs:

meFuelFlowStar ['MAD model meFuelFlowStar', 'MedAD model meFuelFlowStar']
rpmStar ['MAD model rpmStar', 'MedAD model rpmStar']

However if I try to get the strings grouped by the first word:

for k, v in groupby(model_eval_cols, key=lambda x: x.split(' ')[0]):
    print(k, list(v))

it doesn't seem to work, because each key appears in more than one group:

MAD ['MAD model meFuelFlowStar']
MedAD ['MedAD model meFuelFlowStar']
MAD ['MAD model rpmStar']
MedAD ['MedAD model rpmStar']

This surprises me, as the keys are the same ("MAD" and "MedAD" each occur twice).

closlas
The documentation of [`groupby`](https://docs.python.org/3/library/itertools.html#itertools.groupby) says: "Generally, the iterable needs to already be sorted on the same key function." – Matthias Oct 02 '18 at 08:25

1 Answer


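groupby assumes that elements belonging to the same group appear consecutively, which is guaranteed when the list is sorted by the same key. That is why grouping by the last word appeared to work: in your list, strings with the same last word happen to be adjacent, whereas "MAD" and "MedAD" alternate. Right at the beginning of groupby's documentation it says: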

Make an iterator that returns consecutive keys and groups from the iterable. The key is a function computing a key value for each element. If not specified or is None, key defaults to an identity function and returns the element unchanged. Generally, the iterable needs to already be sorted on the same key function.
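Applied to the list in the question, the two key functions produce the key sequences below. This is a small sketch; `last_word_keys` and `first_word_keys` are illustrative names only. With the last word, equal keys are already adjacent, so each key forms a single run; with the first word, "MAD" and "MedAD" alternate, so groupby starts a new group at every element:

```python
from itertools import groupby

model_eval_cols = ['MAD model meFuelFlowStar', 'MedAD model meFuelFlowStar',
                   'MAD model rpmStar', 'MedAD model rpmStar']

# Keys computed by the two key functions used in the question
last_word_keys = [x.split(' ')[2] for x in model_eval_cols]
first_word_keys = [x.split(' ')[0] for x in model_eval_cols]

print(last_word_keys)   # ['meFuelFlowStar', 'meFuelFlowStar', 'rpmStar', 'rpmStar']
print(first_word_keys)  # ['MAD', 'MedAD', 'MAD', 'MedAD']

# groupby only merges *adjacent* equal keys, so list the runs it would produce
last_word_runs = [k for k, _ in groupby(last_word_keys)]
first_word_runs = [k for k, _ in groupby(first_word_keys)]

print(last_word_runs)   # ['meFuelFlowStar', 'rpmStar']
print(first_word_runs)  # ['MAD', 'MedAD', 'MAD', 'MedAD']
```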

Your list is not sorted by the first word, so sort it (using the same key) before passing it to groupby. Define a key function to be used by both sorted and groupby:

def first_word(sentence):
    return sentence.split()[0]

And then:

for k, v in groupby(sorted(model_eval_cols, key=first_word), key=first_word):
    print(k, list(v))
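For completeness, here is a self-contained sketch that applies the sort-then-group approach to the list from the question, plus a variant that avoids sorting by collecting the strings into a dictionary instead of using groupby. `group_by_key` is a hypothetical helper name for illustration, not part of itertools:

```python
from collections import defaultdict
from itertools import groupby

model_eval_cols = ['MAD model meFuelFlowStar', 'MedAD model meFuelFlowStar',
                   'MAD model rpmStar', 'MedAD model rpmStar']


def first_word(sentence):
    """Return the first whitespace-separated word of a string."""
    return sentence.split()[0]


# Option 1: sort by the same key first, then let groupby merge the now-adjacent runs
grouped = {k: list(v)
           for k, v in groupby(sorted(model_eval_cols, key=first_word), key=first_word)}
for k, v in grouped.items():
    print(k, v)
# MAD ['MAD model meFuelFlowStar', 'MAD model rpmStar']
# MedAD ['MedAD model meFuelFlowStar', 'MedAD model rpmStar']


# Option 2 (hypothetical helper, no sorting): append each item to a list keyed by key(item)
def group_by_key(items, key):
    groups = defaultdict(list)
    for item in items:
        groups[key(item)].append(item)
    return dict(groups)


print(group_by_key(model_eval_cols, first_word))
```

Because sorted is stable, option 1 keeps the original relative order inside each group, at O(n log n) cost for the sort. Option 2 is a single O(n) pass and orders the groups by the first appearance of each key.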
Giacomo Alzetta