Is it possible to return all of the data, not just the columns we are grouping by?
For example, I have a dataframe with 5 columns: one of them contains `distance`, another is `timestamp`, and the last important one is `name`. I grouped the dataframe by `timestamp`, and the aggregation function I applied is `min` on `distance`. As a result I get a correctly grouped dataframe with `timestamp` and `distance`. How can I add the `name` column to it? If I group by `name` as well, then `timestamp` becomes duplicated, and it has to stay unique. As a final result I need a dataframe like this:
timestamp | name | distance |
---|---|---|
2020-03-03 15:30:235 | Billy | 123 |
2020-03-03 15:30:435 | Johny | 111 |
But instead I get this:
timestamp | distance |
---|---|
2020-03-03 15:30:235 | 123 |
2020-03-03 15:30:435 | 111 |
The whole table has more than 700k rows, so joining it back on `distance` produces a number of rows that my PC can't even handle.
Here is my `groupby`, which gives me the 2nd table:
grouped_df = df1.groupby('timestamp')['distance'].min()
Here is what I tried in order to get `name` into the table:
grouped_df.merge(df1, how='left', left_on=['timestamp', 'distance'],
                 right_on=['timestamp', 'distance'])
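
For illustration, one possible way to keep `name` without joining back is to look up the row that holds the minimum `distance` in each `timestamp` group and select those rows from the original dataframe. This is only a sketch under assumptions: the sample rows below are made up (including the extra names `Anna` and `Mark`), `df1` is assumed to have a unique index, and when several rows tie for the minimum only the first one is kept.

```python
import pandas as pd

# Hypothetical sample data; the real df1 has 5 columns and 700k+ rows.
df1 = pd.DataFrame({
    "timestamp": ["2020-03-03 15:30:235", "2020-03-03 15:30:235",
                  "2020-03-03 15:30:435", "2020-03-03 15:30:435"],
    "name": ["Billy", "Anna", "Johny", "Mark"],
    "distance": [123, 456, 111, 300],
})

# For each timestamp, idxmin gives the index label of the row
# with the smallest distance (first occurrence on ties).
min_idx = df1.groupby("timestamp")["distance"].idxmin()

# Selecting those labels keeps all original columns,
# with exactly one row per timestamp.
result = df1.loc[min_idx, ["timestamp", "name", "distance"]].reset_index(drop=True)
print(result)
```

Because the rows are picked by index label rather than merged on values, the result cannot grow beyond one row per distinct `timestamp`.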