Pandas DataFrame.groupby() to dictionary with multiple columns for value

Question

type(Table)
pandas.core.frame.DataFrame

Table
Column1 Column2 Column3
======= ======= =======
0       23      1
1       5       2
1       2       3
1       19      5
2       56      1
2       22      2
3       2       4
3       14      5
4       59      1
5       44      1
5       1       2
5       87      3

For anyone familiar with pandas, how would I build a multivalue dictionary with the .groupby() method?

I would like an output to resemble this format:

{
    0: [(23, 1)],
    1: [(5, 2), (2, 3), (19, 5)],
    # etc...
}

where Col1 values are represented as keys and the corresponding Col2 and Col3 are tuples packed into an array for each Col1 key.

My syntax works for pooling only one column into the .groupby():

Table.groupby('Column1')['Column2'].apply(list).to_dict()
# Result as expected
{
    0: [23], 
    1: [5, 2, 19], 
    2: [56, 22], 
    3: [2, 14], 
    4: [59], 
    5: [44, 1, 87]
}

However, selecting multiple columns this way returns the column names as the values:

Table.groupby('Column1')[('Column2', 'Column3')].apply(list).to_dict()
# Result has the column names as the list values
{
    0: ['Column2', 'Column3'],
    1: ['Column2', 'Column3'],
    2: ['Column2', 'Column3'],
    3: ['Column2', 'Column3'],
    4: ['Column2', 'Column3'],
    5: ['Column2', 'Column3']
 }

How would I return a list of tuples in the value array?

score 29 · Accepted Answer · answered Feb 27 '18 at 20:23

Customize the function you use in apply so it returns a list of lists for each group:

df.groupby('Column1')[['Column2', 'Column3']].apply(lambda g: g.values.tolist()).to_dict()
# {0: [[23, 1]], 
#  1: [[5, 2], [2, 3], [19, 5]], 
#  2: [[56, 1], [22, 2]], 
#  3: [[2, 4], [14, 5]], 
#  4: [[59, 1]], 
#  5: [[44, 1], [1, 2], [87, 3]]}

If you need a list of tuples explicitly, use list(map(tuple, ...)) to convert:

df.groupby('Column1')[['Column2', 'Column3']].apply(lambda g: list(map(tuple, g.values.tolist()))).to_dict()
# {0: [(23, 1)], 
#  1: [(5, 2), (2, 3), (19, 5)], 
#  2: [(56, 1), (22, 2)], 
#  3: [(2, 4), (14, 5)], 
#  4: [(59, 1)], 
#  5: [(44, 1), (1, 2), (87, 3)]}
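
As background on why the attempt in the question returned column names: iterating over a DataFrame yields its column labels, so calling `list` on a group's sub-frame produces the labels rather than the rows. A minimal sketch, using a small made-up frame (the name `group` is only for illustration):

```python
import pandas as pd

# Small made-up frame standing in for one group's sub-frame.
group = pd.DataFrame({"Column2": [5, 2, 19], "Column3": [2, 3, 5]})

# Iterating a DataFrame walks its column labels, not its rows.
labels = list(group)

# Converting the underlying values gives one inner list per row instead.
rows = group.values.tolist()

print(labels)
print(rows)
```

This is why the lambda above goes through `g.values.tolist()` instead of passing `list` directly.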

This is great, so the `apply` method is basically a `map` and `reduce` bundled into one? — Micks Ketches, Feb 27 '18 at 21:00
`apply` method is close to `map`, both simulate for loops. The `reduce` effect in this example is more due to `groupby`. Semantically, `apply` invokes the lambda function for each group. — Psidom, Feb 27 '18 at 21:03
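
To make the "invokes the lambda for each group" point concrete, here is a rough sketch of the same computation written as an explicit loop over the groups. It assumes a frame shaped like the first few rows of the question's table and is meant as an illustration of the semantics, not of how pandas implements `apply` internally:

```python
import pandas as pd

# First few rows of the table from the question, rebuilt for illustration.
df = pd.DataFrame({
    "Column1": [0, 1, 1, 1, 2, 2],
    "Column2": [23, 5, 2, 19, 56, 22],
    "Column3": [1, 2, 3, 5, 1, 2],
})

result = {}
# groupby yields (key, sub-frame) pairs; the loop body plays the role of the lambda.
for key, group in df.groupby("Column1"):
    rows = group[["Column2", "Column3"]].values.tolist()
    result[key] = list(map(tuple, rows))

print(result)
```

The grouping step is what collects rows under each key; the per-group function only decides what each group is turned into.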

score 8 · Answer 2 · jpp · 2018-02-27T20:35:23.913

One way is to create a new tup column and then create the dictionary.

df['tup'] = list(zip(df['Column2'], df['Column3']))
df.groupby('Column1')['tup'].apply(list).to_dict()

# {0: [(23, 1)],
#  1: [(5, 2), (2, 3), (19, 5)],
#  2: [(56, 1), (22, 2)],
#  3: [(2, 4), (14, 5)],
#  4: [(59, 1)],
#  5: [(44, 1), (1, 2), (87, 3)]}

@Psidom's solution was faster in the timing below, but if performance isn't an issue, use whichever makes more sense to you:

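import pandas as pd

# Repeat the frame to get a larger input for timing.
df = pd.concat([df]*10000)

def jp(df):
    # Build a column of (Column2, Column3) tuples, then collect it per group.
    df['tup'] = list(zip(df['Column2'], df['Column3']))
    return df.groupby('Column1')['tup'].apply(list).to_dict()

def psi(df):
    # Convert each group's rows to tuples inside apply.
    return df.groupby('Column1')[['Column2', 'Column3']].apply(lambda g: list(map(tuple, g.values.tolist()))).to_dict()

# %timeit is an IPython magic; run these in IPython or Jupyter.
%timeit jp(df)   # 110ms
%timeit psi(df)  # 80ms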

edited Feb 27 '18 at 20:35

answered Feb 27 '18 at 20:28


Would it be possible to modify this to have dicts instead of tuples: {0: {23: 1}, 1: {5: 2, 2: 3, 19: 5}, 2: {56: 1, 22: 2} } ? – user1298416 May 20 '20 at 15:31

@user1298416, take the output `dct` and use a comprehension: `{k: dict(v) for k, v in dct.items()}`. `dict` takes a list of tuples directly. – jpp May 20 '20 at 16:03
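
A short sketch of the comprehension from the comment above, assuming `dct` holds the first few entries of the list-of-tuples output shown earlier (the name `nested` is made up for illustration):

```python
# Output of the groupby step, truncated to three keys for brevity.
dct = {0: [(23, 1)], 1: [(5, 2), (2, 3), (19, 5)], 2: [(56, 1), (22, 2)]}

# dict() accepts an iterable of (key, value) pairs, so each list converts directly.
nested = {k: dict(v) for k, v in dct.items()}

print(nested)
```

One caveat: if a Column2 value repeats within a group, the later pair overwrites the earlier one, since dictionary keys are unique.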

score 2 · Answer 3 · answered Feb 27 '18 at 20:39

I'd rather use defaultdict:

from collections import defaultdict

d = defaultdict(list)

for row in df.values.tolist():
    d[row[0]].append(tuple(row[1:]))

dict(d)

{0: [(23, 1)],
 1: [(5, 2), (2, 3), (19, 5)],
 2: [(56, 1), (22, 2)],
 3: [(2, 4), (14, 5)],
 4: [(59, 1)],
 5: [(44, 1), (1, 2), (87, 3)]}
