
I've read several questions about groupby(), but I don't understand what's wrong with my example:

import itertools

students = [{'name': 'Paul',    'mail': '@gmail.com'},
            {'name': 'Tom',     'mail': '@yahoo.com'},
            {'name': 'Jim',     'mail': 'gmail.com'},
            {'name': 'Jules',   'mail': '@something.com'},
            {'name': 'Gregory', 'mail': '@gmail.com'},
            {'name': 'Kathrin', 'mail': '@something.com'}]

key_func = lambda student: student['mail']

for key, group in itertools.groupby(students, key=key_func):
    print(key)
    print(list(group))

This prints each student separately. Why don't I get only three groups: @gmail.com, @yahoo.com and @something.com?

2 Answers


For starters, Jim's mail is gmail.com while the others are @gmail.com, which is why they are treated as separate groups.

groupby also expects the data to be pre-sorted by the same key function, which explains why you get @gmail.com and @something.com twice.

From the docs:

... Generally, the iterable needs to already be sorted on the same key function. ...
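To see what that requirement means for the question's data, here is a small sketch that only collects the keys groupby produces when the list is left unsorted. A new group starts every time the key differs from the previous element's key, so none of the six students are merged.

```python
import itertools

students = [{'name': 'Paul',    'mail': '@gmail.com'},
            {'name': 'Tom',     'mail': '@yahoo.com'},
            {'name': 'Jim',     'mail': 'gmail.com'},
            {'name': 'Jules',   'mail': '@something.com'},
            {'name': 'Gregory', 'mail': '@gmail.com'},
            {'name': 'Kathrin', 'mail': '@something.com'}]

# groupby only merges *neighbouring* elements whose keys are equal
unsorted_keys = [key for key, _ in itertools.groupby(students, key=lambda s: s['mail'])]
print(unsorted_keys)
# ['@gmail.com', '@yahoo.com', 'gmail.com', '@something.com', '@gmail.com', '@something.com']
```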

import itertools

students = [{'name': 'Paul', 'mail': '@gmail.com'}, {'name': 'Tom', 'mail': '@yahoo.com'},
            {'name': 'Jim', 'mail': 'gmail.com'}, {'name': 'Jules', 'mail': '@something.com'},
            {'name': 'Gregory', 'mail': '@gmail.com'}, {'name': 'Kathrin', 'mail': '@something.com'}]

key_func = lambda student: student['mail']

# sort by the same key function we later use with groupby
students.sort(key=key_func)

for key, group in itertools.groupby(students, key=key_func):
    print(key)
    print(list(group))

#  @gmail.com
#  [{'name': 'Paul', 'mail': '@gmail.com'}, {'name': 'Gregory', 'mail': '@gmail.com'}]
#  @something.com
#  [{'name': 'Jules', 'mail': '@something.com'}, {'name': 'Kathrin', 'mail': '@something.com'}]
#  @yahoo.com
#  [{'name': 'Tom', 'mail': '@yahoo.com'}]
#  gmail.com
#  [{'name': 'Jim', 'mail': 'gmail.com'}]

After fixing both the sorting and the gmail.com/@gmail.com mismatch, we get the expected output:

import itertools

students = [{'name': 'Paul', 'mail': '@gmail.com'}, {'name': 'Tom', 'mail': '@yahoo.com'},
            {'name': 'Jim', 'mail': '@gmail.com'}, {'name': 'Jules', 'mail': '@something.com'},
            {'name': 'Gregory', 'mail': '@gmail.com'}, {'name': 'Kathrin', 'mail': '@something.com'}]

key_func = lambda student: student['mail']

students.sort(key=key_func)

for key, group in itertools.groupby(students, key=key_func):
    print(key)
    print(list(group))

#  @gmail.com
#  [{'name': 'Paul', 'mail': '@gmail.com'}, {'name': 'Jim', 'mail': '@gmail.com'}, {'name': 'Gregory', 'mail': '@gmail.com'}]
#  @something.com
#  [{'name': 'Jules', 'mail': '@something.com'}, {'name': 'Kathrin', 'mail': '@something.com'}]
#  @yahoo.com
#  [{'name': 'Tom', 'mail': '@yahoo.com'}]
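If editing the data is not an option, a possible alternative is to normalize the address inside the key function so that gmail.com and @gmail.com compare equal. This is only a sketch: the helper name domain_key and the rule "prepend @ when it is missing" are assumptions for illustration, not something from the question. The same key is used for both sorting and grouping, and each group is collected into a dict of names.

```python
import itertools

students = [{'name': 'Paul',    'mail': '@gmail.com'},
            {'name': 'Tom',     'mail': '@yahoo.com'},
            {'name': 'Jim',     'mail': 'gmail.com'},
            {'name': 'Jules',   'mail': '@something.com'},
            {'name': 'Gregory', 'mail': '@gmail.com'},
            {'name': 'Kathrin', 'mail': '@something.com'}]

def domain_key(student):
    # Hypothetical normalization: 'gmail.com' and '@gmail.com' both become '@gmail.com'
    return '@' + student['mail'].lstrip('@')

# Sort and group with the *same* normalized key
grouped = {key: [s['name'] for s in group]
           for key, group in itertools.groupby(sorted(students, key=domain_key), key=domain_key)}
print(grouped)
# {'@gmail.com': ['Paul', 'Jim', 'Gregory'], '@something.com': ['Jules', 'Kathrin'], '@yahoo.com': ['Tom']}
```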
DeepSpace
  • OK, I've read the documentation and saw that the sequence must be sorted, but I got lost because a dictionary cannot be sorted, and I ended up writing messy code. – May 06 '18 at 10:41
  • @kviatek It is about ordering the *list* of dictionaries, not the dictionaries themselves. – DeepSpace May 06 '18 at 10:42
  • I'm trying to edit the question into shape to be a more suitable duplicate target; would you mind if I change Jim's email from `gmail.com` to `@gmail.com` like the others? It's not really relevant to the question I think. – Aran-Fey May 06 '18 at 10:54
  • @Aran_Fey Yes, of course. I've seen that you've already done it, but I'm responding anyway. @DeepSpace Yes, I know, but as I said, I got lost among all the objects and ended up trying to sort the dictionaries, which obviously was not what needed to be done. Now everything is clear to me. – May 06 '18 at 11:06
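DeepSpace's point in the comments can be shown in isolation: you never sort the dictionaries themselves, you sort the list that contains them and tell sorted() which value to compare. This minimal sketch uses operator.itemgetter('mail') in place of the lambda (the two are equivalent here) on a shortened, illustrative version of the question's data, and also shows that sorting without a key fails because Python 3 cannot compare two dicts with '<'.

```python
from operator import itemgetter

students = [{'name': 'Paul', 'mail': '@gmail.com'},
            {'name': 'Tom', 'mail': '@yahoo.com'},
            {'name': 'Gregory', 'mail': '@gmail.com'}]

# Sort the *list*; each dict is compared only through its 'mail' value
by_mail = sorted(students, key=itemgetter('mail'))
print([s['name'] for s in by_mail])
# ['Paul', 'Gregory', 'Tom']

# Without a key, sorted() would have to compare dicts directly, which raises TypeError
try:
    sorted(students)
    dict_sort_failed = False
except TypeError:
    dict_sort_failed = True
print(dict_sort_failed)
# True
```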

itertools.groupby only groups consecutive items that share a key, so it depends on the order of the data. Your list is not sorted.

So if you have ["gmail.com", "something.com", "gmail.com"], itertools.groupby will create three groups. This is different from the groupby in some functional languages (or in Python's pandas, for that matter).
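Here is a small sketch of that three-group case with plain strings, next to the same call on a sorted copy. No key function is needed because the elements themselves are compared.

```python
import itertools

domains = ["gmail.com", "something.com", "gmail.com"]

# Unsorted: the two "gmail.com" entries are not neighbours, so they form separate groups
unsorted_groups = [(key, len(list(group))) for key, group in itertools.groupby(domains)]
print(unsorted_groups)
# [('gmail.com', 1), ('something.com', 1), ('gmail.com', 1)]

# Sorted: equal values become adjacent and are grouped together
sorted_groups = [(key, len(list(group))) for key, group in itertools.groupby(sorted(domains))]
print(sorted_groups)
# [('gmail.com', 2), ('something.com', 1)]
```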

You need to sort the list (not the dicts) by the same key first.

import itertools

students = [{'name': 'Paul', 'mail': '@gmail.com'}, {'name': 'Tom',    'mail': '@yahoo.com'},
            {'name': 'Jim', 'mail': 'gmail.com'}, {'name': 'Jules', 'mail': '@something.com'},
            {'name': 'Gregory', 'mail': '@gmail.com'}, {'name': 'Kathrin', 'mail': '@something.com'}]

key_func = lambda student: student['mail']

# sort with the same key that groupby uses, then group
for key, group in itertools.groupby(sorted(students, key=key_func), key=key_func):
    print(key)
    print(list(group))

# @gmail.com
# [{'name': 'Paul', 'mail': '@gmail.com'}, {'name': 'Gregory', 'mail': '@gmail.com'}]
# @something.com
# [{'name': 'Jules', 'mail': '@something.com'}, {'name': 'Kathrin', 'mail': '@something.com'}]
# @yahoo.com
# [{'name': 'Tom', 'mail': '@yahoo.com'}]
# gmail.com
# [{'name': 'Jim', 'mail': 'gmail.com'}]
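If you want the pandas-like behaviour mentioned above, where all equal keys end up together regardless of their order, a plain dict of lists does that without any sorting. Note this is a different technique from itertools.groupby (a single pass with collections.defaultdict); here is a sketch on the question's original data, collecting only names for brevity.

```python
from collections import defaultdict

students = [{'name': 'Paul',    'mail': '@gmail.com'},
            {'name': 'Tom',     'mail': '@yahoo.com'},
            {'name': 'Jim',     'mail': 'gmail.com'},
            {'name': 'Jules',   'mail': '@something.com'},
            {'name': 'Gregory', 'mail': '@gmail.com'},
            {'name': 'Kathrin', 'mail': '@something.com'}]

groups = defaultdict(list)
for student in students:
    # Each student is appended to the list for its mail value, wherever it appears in the input
    groups[student['mail']].append(student['name'])

print(dict(groups))
# {'@gmail.com': ['Paul', 'Gregory'], '@yahoo.com': ['Tom'], 'gmail.com': ['Jim'], '@something.com': ['Jules', 'Kathrin']}
```

Keys appear in the order they are first seen, and the gmail.com/@gmail.com mismatch still produces two separate groups, since the keys are compared exactly as written.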
The Unfun Cat
  • I'm not the downvoter, but this has nothing to do with the fact that dicts are unorderable; OP is grouping a *list* of dictionaries. You can see in my answer why it doesn't work as they expect. – DeepSpace May 06 '18 at 10:30
  • Thank you DeepSpace. Fixed. – The Unfun Cat May 06 '18 at 10:31