I'd like to find, not the single largest index, but the n largest indexes for a pandas groupby.agg operation.
My data frame looks like so:
>>> dat = [{"date": datetime.date(2021,1,1), "category": "cats", "var": 1},
{"date": datetime.date(2021,1,1), "category": "dogs", "var": 0},
{"date": datetime.date(2021,2,1), "category": "cats", "var": -1},
{"date": datetime.date(2021,2,1), "category": "dogs", "var": 3},
]
>>> df = pd.DataFrame(dat)
>>> df.groupby(["date", "category"]).agg({'var': ['sum']})
var
sum
date category
2021-01-01 cats 1
dogs 0
2021-02-01 cats -1
dogs 3
Get the index where the value of var is largest:
>>> df.groupby(["date", "category"]).agg({'var': ['sum']}).idxmax()
var sum (2021-02-01, dogs)
dtype: object
Confirmed, that is correct. I can see that the index, (2021-02-01, dogs)
where the value of var is a a max.
So now, I'd like to see not just the largest index, but the nlargest indexes. Here, n=2.
However, the obvious solution to this, nlargest(2, variable_name)
seems to fail:
df.groupby(["date", "category"]).agg({'var': ['sum']}).nlargest(2, 'sum')
fails, in what seems to be a multindex failure:
pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
KeyError: 'sum'
How can I get the idxmax
for the n largest indexes? I assume there is a multiindex I can pass to nlargest
or a level
, but I don't see it.