I have a DataFrame:

dat = pd.DataFrame({
    'key1' : [ 1,   1,   2,   2,   3,   3,   3,   3,   4,   4],
    'key2' : ['a', 'b', 'a', 'c', 'b', 'c', 'd', 'e', 'c', 'e'],
    'value' : [1,   2,   3,   4,   5,   6,   7,   8,   9,   10]
})

I can use `list` to aggregate a single column per group:

dat.groupby('key1')['key2'].apply(list)
## key1
## 1          [a, b]
## 2          [a, c]
## 3    [b, c, d, e]
## 4          [c, e]
## Name: key2, dtype: object

What if I wanted to obtain an aggregate grouped by key1, where each row is a dict of { key2 : value } pairs? My expected output is:

## key1
## 1          {a : 1, b : 2}
## 2          {a : 3, c : 4}
## 3    {b : 5, c : 6, d : 7, e : 8}
## 4          {c : 9, e : 10}

How can this be achieved in pandas?

One solution could be to create two lists using the approach above and then combine them into a dict, but maybe there is a better way?

Tim

1 Answer

Based on your update, you're looking for groupby + apply.

dat.groupby('key1')[['key2', 'value']].apply(lambda x: dict(x.values))

key1
1                    {'a': 1, 'b': 2}
2                    {'a': 3, 'c': 4}
3    {'b': 5, 'c': 6, 'd': 7, 'e': 8}
4                   {'c': 9, 'e': 10}
dtype: object
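
For comparison, here is a minimal sketch of the "two columns zipped together" idea from the question, assuming the same `dat` frame. The name `result` is just for illustration. It pairs `key2` with `value` inside each group instead of relying on the column order of `x.values`:

```python
import pandas as pd

dat = pd.DataFrame({
    'key1': [1, 1, 2, 2, 3, 3, 3, 3, 4, 4],
    'key2': ['a', 'b', 'a', 'c', 'b', 'c', 'd', 'e', 'c', 'e'],
    'value': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
})

# Select only the two columns needed, then build one dict per group
# by pairing each key2 entry with the value on the same row.
result = dat.groupby('key1')[['key2', 'value']].apply(
    lambda g: dict(zip(g['key2'], g['value']))
)

print(result.loc[3])
```

Naming the columns explicitly may be easier to read if the frame later gains more columns or a different column order.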
cs95
  • Thanks! Could you comment on why `.apply(lambda x : dict(x))` does not work? I was trying to reach your solution step by step, but the error stopped me from finding it by myself... – Tim May 28 '18 at 07:28
  • @Tim `.apply(lambda x : x)` is like saying 1 = 1, it does nothing - the output is identical to the input. You need to aggregate it somehow, that's what `dict` does there. – cs95 May 28 '18 at 07:30
  • Hard to think of an alternative – Bharath M Shetty May 28 '18 at 07:38
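
On the `dict(x)` question raised in the comments, a small sketch may help. It assumes a hypothetical two-row frame named `group`, standing in for one group of `dat`. Calling `dict` on a DataFrame uses the column labels as keys, whereas calling it on `.values` treats each two-element row as a key/value pair:

```python
import pandas as pd

# Hypothetical stand-in for a single group (key1 == 1) of the frame above.
group = pd.DataFrame({'key2': ['a', 'b'], 'value': [1, 2]})

# dict() on a DataFrame maps each column label to its column (a Series).
by_column = dict(group)

# dict() on the underlying 2-D array consumes each row as a (key, value) pair.
by_row = dict(group.values)

print(list(by_column))
print(by_row)
```

This is likely why the `.values` step matters in the answer, though the exact error seen in the comment is not shown in the thread.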