
I'm writing code with a DataFrame in Python. I successfully grouped the timestamps by second, as shown below, but I don't know how to group them by minute.

                          Price
timestamp   
2018-06-01 00:00:00.155449  13530.909091
2018-06-01 00:00:01.155449  13530.909091
2018-06-01 00:00:02.155451  13530.909091
2018-06-01 00:00:03.155452  13530.909091
2018-06-01 00:00:04.155453  13530.909091
... ...
2018-06-01 23:59:55.735402  13285.000000
2018-06-01 23:59:56.894110  13285.000000
2018-06-01 23:59:57.894110  13285.000000
2018-06-01 23:59:58.894110  13285.000000
2018-06-01 23:59:59.894110  13285.000000

I use the groupby method like this: sell_price = sell.groupby('timestamp').price.mean()

How can I group these timestamps by minute?

My expected result:

timestamp                      price
    2018-06-01 00:01:00.155449  13530.909091
    2018-06-01 00:02:00.155449  13530.909091
    2018-06-01 00:03:00.155451  13530.909091
    2018-06-01 00:04:00.155452  13530.909091
    2018-06-01 00:05:00.155453  13530.909091
    ... ...
    2018-06-01 23:55:00.735402  13285.000000
    2018-06-01 23:56:00.894110  13285.000000
    2018-06-01 23:57:00.894110  13285.000000
    2018-06-01 23:58:00.894110  13285.000000
    2018-06-01 23:59:00.894110  13285.000000
  • Can you post what you have done so far for the seconds group by? – stark Nov 05 '21 at 00:59
  • I use the groupby method (I edited my post). –  Nov 05 '21 at 01:01
  • Is your timestamp field a `string` or a `datetime` object? If it's datetime, maybe you can try doing a groupby on `timestamp.minute`? – stark Nov 05 '21 at 01:09
  • Oh, Python says my timestamps are string objects. Is there a way to convert the timestamps to datetime objects when reading the CSV file? –  Nov 05 '21 at 01:17
  • https://stackoverflow.com/questions/41783003/how-do-i-convert-timestamp-to-datetime-date-in-pandas-dataframe, Just apply this after reading the dataframe? – stark Nov 05 '21 at 01:19
  • `df.groupby(pd.Grouper(freq='1T')).mean()` or `df.resample('1T').mean()`? – Quang Hoang Nov 05 '21 at 01:53
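A minimal sketch of the approaches suggested in the comments, assuming the CSV has been read so that `timestamp` is a string column and `price` a float column. The frame below is hypothetical synthetic data (one row per second for two hours), not the asker's file. It converts the strings with `pd.to_datetime` and then groups per minute with `pd.Grouper`. Note that grouping on `.dt.minute` alone only yields the values 0 to 59, so the same minute of different hours collapses into one group; flooring each timestamp to its minute keeps them apart.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the CSV: timestamps stored as strings, one per second.
times = pd.date_range("2018-06-01 00:00:00.155449", periods=7200, freq="s")
sell = pd.DataFrame({
    "timestamp": times.astype(str),  # string column, as it would be after read_csv
    "price": np.linspace(13530.909091, 13285.0, len(times)),
})

# Convert the string column to datetime64 values.
sell["timestamp"] = pd.to_datetime(sell["timestamp"])

# Group into one-minute bins; each bin is labelled with the start of its minute.
per_minute = sell.groupby(pd.Grouper(key="timestamp", freq="1min"))["price"].mean()

# Equivalent: floor every timestamp to its minute and group on that.
per_minute_floor = sell.groupby(sell["timestamp"].dt.floor("min"))["price"].mean()

# Pitfall: .dt.minute is only 0-59, so 00:05 and 01:05 land in the same group.
per_minute_of_hour = sell.groupby(sell["timestamp"].dt.minute)["price"].mean()

print(len(per_minute), len(per_minute_floor), len(per_minute_of_hour))  # 120 120 60
```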

1 Answer


Pandas has several time-series functions that are easiest to use when the timestamps are the index. What you are trying to do is downsample your data.
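If the timestamps live in an ordinary column (as they would after reading a CSV), there are two ways to get them into a form `resample` accepts. A minimal sketch on a small hypothetical frame: either move them into a `DatetimeIndex` with `set_index`, or leave them in place and pass the column name through `resample`'s `on` parameter. Both produce the same per-minute means.

```python
import pandas as pd

# Hypothetical frame with the timestamps in a regular column rather than the index.
df = pd.DataFrame({
    "timestamp": pd.date_range("2018-06-01", periods=180, freq="s"),
    "price": range(180),
})

# Option 1: move the timestamps into a DatetimeIndex, then resample.
by_index = df.set_index("timestamp").resample("1min")["price"].mean()

# Option 2: keep them in a column and tell resample which column to use.
by_column = df.resample("1min", on="timestamp")["price"].mean()

print(by_index.tolist())  # [29.5, 89.5, 149.5]
```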

Downsampling means resampling a time-series dataset to a coarser time frame, for example from minutes to hours or from days to years. The result has fewer rows, and the values falling into each new time bin can be aggregated with mean(), min(), max(), sum(), etc. (see "Pandas resample() tricks you should know for manipulating time-series data").
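To illustrate the aggregation step, here is a sketch on hypothetical data (the prices are simply 0 to 179, one per second). `agg` computes several of the aggregates mentioned above in one pass, and for price data `ohlc()` gives the open/high/low/close of each bin.

```python
import numpy as np
import pandas as pd

# Hypothetical per-second prices covering three minutes.
idx = pd.date_range("2018-06-01", periods=180, freq="s")
s = pd.Series(np.arange(180, dtype=float), index=idx, name="price")

# One row per minute, with several aggregates computed at once.
summary = s.resample("1min").agg(["mean", "min", "max", "sum"])

# Open/high/low/close per minute, a common summary for price series.
candles = s.resample("1min").ohlc()

print(summary)
print(candles)
```

Each minute here covers 60 consecutive values, so for example the first row of `summary` has mean 29.5 and sum 1770.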

Here is the code to perform the task.

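import pandas as pd
import numpy as np

# Set a random seed to allow reproducibility (seed() must be called, not assigned to)
np.random.seed(42)

# Create array with timestamps: one per second for the whole day (86400 rows)
dates = pd.date_range(start="2018-06-01 00:00:00",
                      end="2018-06-01 23:59:59",
                      freq="s")

# Create array with random prices
prices = np.random.uniform(low=13285, high=13530.9, size=len(dates))

# Create the DataFrame with the timestamps as its index
df = pd.DataFrame(data=prices, index=dates)

# Resample to every minute and average each bin
# ("1min" is the same rule as "1T"; newer pandas versions deprecate the "T" alias)
df.resample("1min").mean()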

And here is the output (the prices are random, so your exact values will differ):

2018-06-01 00:00:00  13421.290908
2018-06-01 00:01:00  13414.707903
2018-06-01 00:02:00  13394.962477
2018-06-01 00:03:00  13413.036905
2018-06-01 00:04:00  13412.717874
                          ...
2018-06-01 23:55:00  13412.137577
2018-06-01 23:56:00  13409.450838
2018-06-01 23:57:00  13411.499249
2018-06-01 23:58:00  13398.442782
2018-06-01 23:59:00  13412.034963

[1440 rows x 1 columns]

The original data contains 86400 rows (one per second), whereas the downsampled data contains only 1440 (one per minute).
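By default, `resample` labels each one-minute bin with its left edge, so the first row above is 00:00:00. If you would rather label each minute by its end (closer to the question's expected output, whose first row is at 00:01:00), the `label` parameter shifts the labels without changing the bins or their means. A minimal sketch on a shortened version of the synthetic data above, using NumPy's seeded `default_rng` generator instead of the legacy global seed:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)  # seeded generator for reproducible prices
idx = pd.date_range("2018-06-01 00:00:00", periods=180, freq="s")
df = pd.DataFrame({"price": rng.uniform(13285, 13530.9, len(idx))}, index=idx)

left = df.resample("1min").mean()                  # default: bins labelled by their start
right = df.resample("1min", label="right").mean()  # same bins, labelled by their end

print(left.index[0], right.index[0])  # 2018-06-01 00:00:00 2018-06-01 00:01:00
```

Note that neither option reproduces the sub-second parts (e.g. `.155449`) shown in the question's expected output; the bin labels are always whole minutes.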