Pandas groupby issue after melt bug?

Question

Python version 3.8.12
Pandas 1.4.1

Given the following dataframe:

import pandas as pd
import numpy as np

df = pd.DataFrame({
    'id': [1000] * 4,
    'date': ['2022-01-01'] * 4,
    'ts': pd.date_range('2022-01-01', freq='5min', periods=4),  # 5-minute steps, matching the output below ('5M' would mean 5 months)
    'A': np.random.randint(1, 6, size=4),
    'B': np.random.rand(4)
})

That looks like this:

	id	date	ts	A	B
0	1000	2022-01-01	2022-01-01 00:00:00	4	0.98019
1	1000	2022-01-01	2022-01-01 00:05:00	3	0.82021
2	1000	2022-01-01	2022-01-01 00:10:00	4	0.549684
3	1000	2022-01-01	2022-01-01 00:15:00	5	0.0818311

I unpivoted columns A and B into long format with pandas melt:

melted = df.melt(
    id_vars=['id', 'date', 'ts'],
    value_vars=['A', 'B'],
    var_name='label',
    value_name='value',
    ignore_index=True
)

That looks like this:

	id	date	ts	label	value
0	1000	2022-01-01	2022-01-01 00:00:00	A	4
1	1000	2022-01-01	2022-01-01 00:05:00	A	3
2	1000	2022-01-01	2022-01-01 00:10:00	A	4
3	1000	2022-01-01	2022-01-01 00:15:00	A	5
4	1000	2022-01-01	2022-01-01 00:00:00	B	0.98019
5	1000	2022-01-01	2022-01-01 00:05:00	B	0.82021
6	1000	2022-01-01	2022-01-01 00:10:00	B	0.549684
7	1000	2022-01-01	2022-01-01 00:15:00	B	0.0818311

Then I group by id and date and try to select the first group:

melted.groupby(['id', 'date']).first()

That gives me this:

                        ts label  value
id   date                              
1000 2022-01-01 2022-01-01     A    4.0

But I would expect this output instead:

                                 ts  A         B
id   date                                       
1000 2022-01-01 2022-01-01 00:00:00  4  0.980190
     2022-01-01 2022-01-01 00:05:00  3  0.820210
     2022-01-01 2022-01-01 00:10:00  4  0.549684
     2022-01-01 2022-01-01 00:15:00  5  0.081831

What am I not getting? Or is this a bug? Also, why is the ts column converted to a date?

`Also why the ts columns is converted to a date?` - it is not converted to a `date`; when the time is `00:00:00` it is simply not displayed, so `2022-01-01 00:00:00` is shown as `2022-01-01` - jezrael, May 18 '22 at 09:31
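To illustrate the comment above, here is a minimal sketch (the variable names are my own, not from the question) that checks the dtype of a timestamp column holding only a midnight value. It should still be a datetime column with a time component of zero, even if the printed form shows only the date.

```python
import pandas as pd

# Same kind of timestamps as in the question, at 5-minute steps.
ts = pd.Series(pd.date_range('2022-01-01', freq='5min', periods=4))

# Keep only the first value, which falls exactly on midnight.
midnight_only = ts.iloc[:1]

# The dtype is still a datetime dtype; only the display drops the time part.
print(midnight_only.dtype)
print(midnight_only)

# With non-midnight values present, the time part is shown again.
print(ts)
```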

score 1 · Answer 1 · edited Mar 30 '23 at 04:30

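I thought first() would return the first group, but it is an aggregation: it returns the first non-null value of each column within every group, as stated in the pandas documentation for the groupby aggregation functions. That is why the result has a single row per (id, date) pair.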

To select the first group, I needed to use the get_group function and pass it the key of that group.
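A minimal sketch of the difference, rebuilding the frame from the question. The fixed random seed, the '5min' frequency and the variable names are my own assumptions for a reproducible example.

```python
import numpy as np
import pandas as pd

# Rebuild the question's frame (seeded generator is an assumption, for reproducibility).
rng = np.random.default_rng(0)
df = pd.DataFrame({
    'id': [1000] * 4,
    'date': ['2022-01-01'] * 4,
    'ts': pd.date_range('2022-01-01', freq='5min', periods=4),
    'A': rng.integers(1, 6, size=4),
    'B': rng.random(4),
})
melted = df.melt(id_vars=['id', 'date', 'ts'], value_vars=['A', 'B'],
                 var_name='label', value_name='value')

grouped = melted.groupby(['id', 'date'])

# first() aggregates: one row per group, holding the first non-null value of each column.
first_values = grouped.first()
print(first_values)

# get_group() returns all rows that belong to the group with the given key.
first_key = next(iter(grouped.groups))  # here: (1000, '2022-01-01')
first_group = grouped.get_group(first_key)
print(first_group)
```

Since every row in this example shares the same id and date, there is only one group, so get_group returns all eight melted rows.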

edited Mar 30 '23 at 04:30

user16217248


answered May 18 '22 at 23:43

Serg

