
I have a dataframe with irregular timestamps (at second resolution) that spans multiple days. I would like to create a new column that buckets these entries into minute buckets, with an increasing counter in a separate column. All values that fall within the same minute should get the same counter value, the counter should increase with each new minute of the day, and on a new day it should start from 1 again.

                     Value    Counter
2020-01-01 10:00:00     7.        1
2020-01-01 10:00:05    45.        1
2020-01-01 10:00:10    25.        1
2020-01-01 10:02:00    85.        2
2020-01-02 07:00:00    51.        1
2020-01-02 10:00:00    52.        2
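
For reference, the input frame behind this example can be reconstructed like this (a minimal sketch; the Value column and the timestamps are taken from the sample above, and the timestamps are assumed to form the index):

import pandas as pd

# minimal sketch of the example data: irregular second-level timestamps as the index
idx = pd.to_datetime([
    "2020-01-01 10:00:00", "2020-01-01 10:00:05", "2020-01-01 10:00:10",
    "2020-01-01 10:02:00", "2020-01-02 07:00:00", "2020-01-02 10:00:00",
])
df = pd.DataFrame({"Value": [7.0, 45.0, 25.0, 85.0, 51.0, 52.0]}, index=idx)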

I thought about something like this:

df['Counter'] = df.groupby([df.index.dt.day, df.index.dt.minute]).count()

Which does not seem to work.
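
(One reason this fails: .dt is a Series accessor, so on a DatetimeIndex the components are reached directly as df.index.day and df.index.minute, and groupby(...).count() returns one aggregated row per group rather than a value aligned to each row.)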

ThatQuantDude

2 Answers


I hope this will help you.

# Reset the index to make the timestamp a column (reset_index names it 'index')
df.reset_index(inplace=True)

# Group the entries into minute buckets and number the rows within each bucket
df['Counter'] = df.groupby(pd.Grouper(key='index', freq='Min')).cumcount() + 1

The output I got:

                index  Value  Counter
0 2020-01-01 10:00:00      7        1
1 2020-01-01 10:00:05     45        2
2 2020-01-01 10:00:10     25        3
3 2020-01-01 10:02:00     85        1
4 2020-01-02 07:00:00     51        1
5 2020-01-02 10:00:00     52        1

I am not sure what exactly you want to achieve.

Dejene T.
  • Would this give me the same counter value for all values within the same minute slice? – ThatQuantDude Jun 29 '23 at 13:28
  • @ThatQuantDude I just added the output of the above code. Check it again. – Dejene T. Jun 29 '23 at 14:36
  • Index 0, 1, 2 should all have 1 as Counter; they are all within the same minute. Then index 3 should be 2, and then it starts again for the next day. The individual minutes per day should have individual counters, but not the entries within the same minute. Hope that makes it clearer. – ThatQuantDude Jun 29 '23 at 14:48
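
In other words, cumcount() numbers the rows inside each minute bucket (hence 1, 2, 3 for the first three rows above), whereas the question asks for each minute bucket to receive a single number that resets each day.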

You can try .groupby inside .groupby + "ngroup":

# make sure the index is a DatetimeIndex
df.index = pd.to_datetime(df.index)

# group by calendar date, then number the (hour, minute) groups within each day, starting from 1
df["Counter"] = df.groupby(df.index.date, group_keys=False).apply(
    lambda x: x.groupby([x.index.hour, x.index.minute]).transform("ngroup") + 1
)
print(df)

Prints:

                     Value  Counter
2020-01-01 10:00:00      7        1
2020-01-01 10:00:05     45        1
2020-01-01 10:00:10     25        1
2020-01-01 10:02:00     85        2
2020-01-02 07:00:00     51        1
2020-01-02 10:00:00     52        2
Andrej Kesely
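
As a footnote, the same per-day minute numbering can also be written without the nested apply; this is only a sketch of an equivalent approach and not part of either answer (flooring to the minute with floor("min") is used here as the definition of a minute bucket):

import pandas as pd

# floor each timestamp to its minute, then number the distinct minutes within each calendar day
minutes = pd.Series(df.index.floor("min"), index=df.index)
df["Counter"] = minutes.groupby(df.index.date).transform(lambda m: pd.factorize(m)[0] + 1)

pd.factorize numbers the minute buckets in order of appearance, so this assumes the timestamps are sorted within each day, as they are in the example.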