Inconsistent output for pandas groupby-resample with missing values in first time bin

Question

I am seeing inconsistent output from pandas groupby-resample.

Take this dataframe, in which category A has samples on the first and second day and category B has a sample only on the second day:

import pandas as pd

df1 = pd.DataFrame(index=pd.DatetimeIndex(
    ['2022-1-1 1:00','2022-1-2 1:00','2022-1-2 1:00']),
    data={'category':['A','A','B']})

# Output:
#                    category
#2022-01-01 01:00:00        A
#2022-01-02 01:00:00        A
#2022-01-02 01:00:00        B

When I groupby-resample I get a Series with multiindex on category and time:

res1 = df1.groupby('category').resample('1D').size()

# Output:
#category            
#A         2022-01-01    1
#          2022-01-02    1
#B         2022-01-02    1
#dtype: int64

But if I add one more data point so that B also has a sample on day 1, the return value is a DataFrame with a single-level index on category and one column per time bin:

df2 = pd.DataFrame(index=pd.DatetimeIndex(
    ['2022-1-1 1:00','2022-1-2 1:00','2022-1-2 1:00','2022-1-1 1:00']),
    data={'category':['A','A','B','B']})

res2 = df2.groupby('category').resample('1D').size()

# Output:
#          2022-01-01  2022-01-02
# category                        
# A                  1           1
# B                  1           1

Is this expected behavior? I reproduced it in pandas 1.4.2 and was unable to find an existing bug report.

score 0 · Answer 1 · answered Apr 21 '22 at 20:03

I submitted bug report 46826 to pandas.

Jen

score 0 · Answer 2 · answered Jan 04 '23 at 18:52

The result should be a Series with a MultiIndex in both cases. A bug caused df.groupby(...).resample(...).size() to return a wide DataFrame when all groups had the same index. This has been fixed on the master branch. Thank you for opening the issue.
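
Until a fixed release is available, one way to get a consistent shape is to normalize the result yourself. The sketch below is only an illustration: to_long is a hypothetical helper name, not part of pandas, and the wide frame is a hand-built copy of the output shown in the question. It also shows an alternative grouping with pd.Grouper, which I would expect to return a Series with a MultiIndex for this data.

```python
import pandas as pd


def to_long(result):
    """Hypothetical helper: return a Series indexed by (category, time bin).

    A wide DataFrame (the buggy shape) is stacked into long form;
    a Series (the intended shape) is returned unchanged.
    """
    if isinstance(result, pd.DataFrame):
        return result.stack()
    return result


# Hand-built copy of the wide output shown in the question for df2.
wide = pd.DataFrame(
    [[1, 1], [1, 1]],
    index=pd.Index(['A', 'B'], name='category'),
    columns=pd.DatetimeIndex(['2022-01-01', '2022-01-02']),
)
long_form = to_long(wide)

# Alternative: group by category and by a daily pd.Grouper on the index.
df2 = pd.DataFrame(
    index=pd.DatetimeIndex(
        ['2022-1-1 1:00', '2022-1-2 1:00', '2022-1-2 1:00', '2022-1-1 1:00']),
    data={'category': ['A', 'A', 'B', 'B']})
alt = df2.groupby(['category', pd.Grouper(freq='1D')]).size()
```

On an affected version, the call from the question could be wrapped as to_long(df2.groupby('category').resample('1D').size()). Note that the pd.Grouper variant lists only the (category, day) pairs that actually occur, whereas resample also emits empty bins between a group's first and last timestamps, so the two are not interchangeable when a group has gaps.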
