pandas groupby where you get the max of one column and the min of another column

Question

I have a dataframe as follows:

user    num1    num2
a       1       1
a       2       2
a       3       3
b       4       4
b       5       5

I want a dataframe which has the minimum from num1 for each user, and the maximum of num2 for each user.

The output should be like:

user    num1    num2
a       1       3
b       4       5

I know that if I wanted the max of both columns I could just do:

a.groupby('user')['num1', 'num2'].max()

Is there some equivalent without having to do something like:

series_1 = a.groupby('user')['num1'].min() 
series_2 = a.groupby('user')['num2'].max()

# converting from series to df so I can do a join on user
df_1 = pd.DataFrame(np.array([series_1]).transpose(), index=series_1.index, columns=['num1']) 
df_2 = pd.DataFrame(np.array([series_2]).transpose(), index=series_2.index, columns=['num2'])

df_1.join(df_2)

jezrael · Accepted Answer · 2017-06-06T06:56:34.277

27

Use groupby + agg by dict, so then is necessary order columns by subset or reindex_axis. Last add reset_index for convert index to column if necessary.

df = a.groupby('user').agg({'num1':'min', 'num2':'max'})[['num1','num2']].reset_index()
print (df)
  user  num1  num2
0    a     1     3
1    b     4     5

What is same as:

df = a.groupby('user').agg({'num1':'min', 'num2':'max'})
                      .reindex_axis(['num1','num2'], axis=1)
                      .reset_index()
print (df)
  user  num1  num2
0    a     1     3
1    b     4     5

edited Jun 06 '17 at 06:56

answered Jun 06 '17 at 06:51

jezrael

822,522
95
1,334
1,252

4

I think `a.groupby('user').agg({'num1': min, 'num2': max}).reset_index()` is enough (without the `[['num1', 'num2']]`), as `.agg()` should already filter by the specified columns in the dictionary. – Peque Nov 21 '17 at 13:06
1

@Peque - You are right, I test it with `df = df.groupby('user').agg({'num2':'min', 'num1':'max'}).reset_index()` – jezrael Nov 21 '17 at 13:13

score 3 · Answer 2 · answered May 25 '21 at 06:41

3

I would like to add to @jezrael's answer if anyone wants to get the first and last values for specific columns, it can be done in the same way as:

df.groupby(['user']).agg({'num1':'min', 'num2':'max', 'num3':'first', 'num4':'last', 'num5':'sum'})

answered May 25 '21 at 06:41

Mihir Thakur

94
1
8

pandas groupby where you get the max of one column and the min of another column

2 Answers2

Linked

Related