I don't understand which functions are acceptable for groupby
+ transform
operations. Often, I end up just guessing, testing, reverting until something works, but I feel there should be a systematic way of determining whether a solution will work.
Here's a minimal example. First let's use groupby
+ apply
with set
:
df = pd.DataFrame({'a': [1,2,3,1,2,3,3], 'b':[1,2,3,1,2,3,3], 'type':[1,0,1,0,1,0,1]})
g = df.groupby(['a', 'b'])['type'].apply(set)
print(g)
a b
1 1 {0, 1}
2 2 {0, 1}
3 3 {0, 1}
This works fine, but I want the resulting set
calculated groupwise in a new column of the original dataframe. So I try and use transform
:
df['g'] = df.groupby(['a', 'b'])['type'].transform(set)
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
---> 23 df['g'] = df.groupby(['a', 'b'])['type'].transform(set)
TypeError: int() argument must be a string, a bytes-like object or a number, not 'set'
This is the error I see in Pandas v0.19.0. In v0.23.0, I see TypeError: 'set' type is unordered
. Of course, I can map a specifically defined index to achieve my result:
g = df.groupby(['a', 'b'])['type'].apply(set)
df['g'] = df.set_index(['a', 'b']).index.map(g.get)
print(df)
a b type g
0 1 1 1 {0, 1}
1 2 2 0 {0, 1}
2 3 3 1 {0, 1}
3 1 1 0 {0, 1}
4 2 2 1 {0, 1}
5 3 3 0 {0, 1}
6 3 3 1 {0, 1}
But I thought the benefit of transform
was to avoid such an explicit mapping. Where did I go wrong?