
For:

df = pd.DataFrame({'a': (1,1,2,3,3), 'b':(20,21,30,40,41)})

Why does only this work:

df['b_new'] = df.a.map(df.groupby('a').b.nth(-1))

but not:

df['b_new'] = df.a.map(df.groupby('a').nth(-1))
...
TypeError: 'DataFrame' object is not callable

although both:

df.groupby('a').b.nth(-1)

a
1    21
2    30
3    41
Name: b, dtype: int64

and

df.groupby('a').nth(-1)

    b
a    
1  21
2  30
3  41

do deliver quite similar results.

(see also: https://stackoverflow.com/a/47924467/7450524)

cs95

2 Answers


If you want to understand why my answer works, then this is why.

Consider -

df.groupby('a').nth(-1)

    b
a    
1  21
2  30
3  41

nth is applied to each column of each group, resulting in a dataframe. In your case, there is only one column.
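A minimal sketch of this, assuming a hypothetical extra column 'c' that is not in the question and is added only to show that nth keeps every non-grouping column, so the result has to be a DataFrame. The row index of the result differs between pandas versions, so the sketch only relies on the type and the column values:

```python
import pandas as pd

# Same data as the question, plus a hypothetical column 'c' for illustration
df = pd.DataFrame({'a': (1, 1, 2, 3, 3),
                   'b': (20, 21, 30, 40, 41),
                   'c': (5, 6, 7, 8, 9)})

# nth on the whole GroupBy object works on every column, so the result is a DataFrame
last_rows = df.groupby('a').nth(-1)
print(type(last_rows).__name__)  # DataFrame
print(last_rows[['b', 'c']])
```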

However, in this case -

df.groupby('a').b.nth(-1)

a
1    21
2    30
3    41
Name: b, dtype: int64

nth is applied only to b, so the result is a series.
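A minimal sketch of the single-column case with the question's data. Selecting b before calling nth gives a SeriesGroupBy, so the result is a Series named b:

```python
import pandas as pd

df = pd.DataFrame({'a': (1, 1, 2, 3, 3), 'b': (20, 21, 30, 40, 41)})

# Selecting column 'b' first gives a SeriesGroupBy, so nth returns a Series
last_b = df.groupby('a').b.nth(-1)
print(type(last_b).__name__)  # Series
print(last_b.name)            # b
```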

Now, take a look at the docs for map, in particular, what you can pass to it -

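arg : function, dict, or Series

A callable, dict, or pd.Series object.

You cannot pass a DataFrame! What map does is use the index of the passed Series as a lookup table: each value in the Series you call map on is looked up in that index and replaced with the corresponding value. In the pandas version used here, anything that is not a dict or a Series is treated as a function and called on each value, which is why passing a DataFrame raises TypeError: 'DataFrame' object is not callable.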
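A minimal sketch of that lookup. It builds the per-group lookup Series with last() instead of nth(-1); that substitution is my assumption for portability, since in pandas 2.0 and later nth behaves as a filter that keeps each row's original index rather than the group key, which would break the index lookup. For this data last() gives the same values (it skips missing values, and there are none here):

```python
import pandas as pd

df = pd.DataFrame({'a': (1, 1, 2, 3, 3), 'b': (20, 21, 30, 40, 41)})

# Lookup table: index = group key 'a', values = last 'b' in that group
lookup = df.groupby('a')['b'].last()

# map looks up each value of df.a in lookup's index and returns the matching value
df['b_new'] = df.a.map(lookup)
print(df)

# A dict with the same key -> value pairs gives the same result
same = df.a.map(lookup.to_dict()).tolist() == df['b_new'].tolist()
print(same)  # True
```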

cs95

There is a difference: if you don't specify a column, nth returns a DataFrame:

print (df.groupby('a').nth(-1))
    b
a    
1  21
2  30
3  41

If you do specify the column, it returns a Series:

print (df.groupby('a').b.nth(-1))
a
1    21
2    30
3    41
Name: b, dtype: int64

The error means that map works with a Series, not with a DataFrame, even if that DataFrame has only one column.
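If you already have the one-column DataFrame, you can turn it into a Series before calling map, either by selecting the column or with squeeze(axis=1). A minimal sketch; it uses groupby(...).last() instead of nth(-1) as an assumption for portability, because in pandas 2.0 and later nth no longer indexes its result by the group key:

```python
import pandas as pd

df = pd.DataFrame({'a': (1, 1, 2, 3, 3), 'b': (20, 21, 30, 40, 41)})

# One-column DataFrame indexed by the group key 'a'
last_df = df.groupby('a').last()

# Either select the column or squeeze the single column into a Series
as_series = last_df['b']
squeezed = last_df.squeeze(axis=1)

df['b_new'] = df.a.map(as_series)
print(df['b_new'].tolist())  # [21, 21, 30, 41, 41]
print(squeezed.equals(as_series))  # True
```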

jezrael