I have sample snippet that works as expected:
import pandas as pd
df = pd.DataFrame(data={'label': ['a', 'b', 'b', 'c'], 'wave': [1, 2, 3, 4], 'y': [0,0,0,0]})
df['new'] = df.groupby(['label'])[['wave']].transform(tuple)
The result is:
label wave y new
0 a 1 0 (1,)
1 b 2 0 (2, 3)
2 b 3 0 (2, 3)
3 c 4 0 (4,)
It works analagously, if instead of tuple
in transform I give set, frozenset, dict
, but if I give list
I got completly unexpected result:
df['new'] = df.groupby(['label'])[['wave']].transform(list)
label wave y new
0 a 1 0 1
1 b 2 0 2
2 b 3 0 3
3 c 4 0 4
There is a workaround to get expected result:
df['new'] = df.groupby(['label'])[['wave']].transform(tuple)['wave'].apply(list)
label wave y new
0 a 1 0 [1]
1 b 2 0 [2, 3]
2 b 3 0 [2, 3]
3 c 4 0 [4]
I thought about mutability/immutability (list/tuple) but for set/frozenset it is consistent.
The question is why it works in this way?