Let's say I have a data table with 3 columns:
Category  Color   Date
triangle  red     2017-10-10
square    yellow  2017-11-10
triangle  blue    2017-02-10
circle    yellow  2017-07-10
circle    red     2017-09-10
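To make the example reproducible, the table can be built as a DataFrame roughly like this. This is a minimal sketch: the variable name df matches the snippet later in the post, and parsing Date with pd.to_datetime is an assumption (the column could equally be stored as ISO-formatted strings, which sort the same way).

```python
import pandas as pd

# Sample data copied from the table above.
df = pd.DataFrame({
    "Category": ["triangle", "square", "triangle", "circle", "circle"],
    "Color": ["red", "yellow", "blue", "yellow", "red"],
    # Assumption: dates are parsed into datetime64 values rather than kept as strings.
    "Date": pd.to_datetime([
        "2017-10-10", "2017-11-10", "2017-02-10", "2017-07-10", "2017-09-10",
    ]),
})

print(df)
```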
I want to find the row with the earliest date for each category. So my desired output is:
Category  Color   Date
square    yellow  2017-11-10
triangle  blue    2017-02-10
circle    yellow  2017-07-10
I've looked through a couple of posts about how to do this:
Finding the min date in a Pandas DF row and create new Column
Pandas groupby category, rating, get top value from each category?
With Pandas in Python, select the highest value row for each group
and more.
A popular method is the groupby method:
df.groupby('Category').first().reset_index()
But if I use this method, then it'll group by Category, but it'll keep both records for triangle since it has two different colors.
Is there a better and more efficient way to do this?
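For reference, here is a sketch of two approaches that might produce the desired output on the sample data. It is not claimed to be the most efficient option, and it assumes the Date column holds datetimes (or ISO-formatted strings) and that ties within a category do not matter. The names earliest and earliest_alt are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "Category": ["triangle", "square", "triangle", "circle", "circle"],
    "Color": ["red", "yellow", "blue", "yellow", "red"],
    "Date": pd.to_datetime([
        "2017-10-10", "2017-11-10", "2017-02-10", "2017-07-10", "2017-09-10",
    ]),
})

# Option 1: idxmin gives, per category, the index label of the row with the
# smallest Date; .loc then pulls those full rows (Color included).
earliest = df.loc[df.groupby("Category")["Date"].idxmin()].reset_index(drop=True)

# Option 2: sort by Date, then keep only the first row seen for each category.
earliest_alt = df.sort_values("Date").drop_duplicates("Category")

print(earliest)
print(earliest_alt)
```

Both options keep the Color that belongs to the earliest row, whereas first() on an unsorted frame returns the first row in original order (the first non-null value per column), not the earliest by date.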