how can I resample pandas dataframe by day on period time?

Question

i have a dataframe like this:

df.head()
Out[2]: 
         price   sale_date 
0  477,000,000  1396/10/30 
1  608,700,000  1396/10/30 
2  580,000,000  1396/10/03 
3  350,000,000  1396/10/03 
4  328,000,000  1396/03/18

that it has out of bounds datetime
so then i follow below to make them as period time

df['sale_date']=df['sale_date'].str.replace('/','').astype(int)

def conv(x):
    return pd.Period(year=x // 10000,
                     month=x // 100 % 100,
                     day=x % 100, freq='D')
 
df['sale_date'] = df['sale_date'].str.replace('/','').astype(int).apply(conv)

now i want to resample them by day like below:

df.resample(freq='d', on='sale_date').sum()

but it gives me this error:

resample() got an unexpected keyword argument 'freq'

[`pandas.DataFrame.resample`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.resample.html) has no parameter `freq`, it's `rule` instead. Also, how do wanna aggregate by day if you don't have any time available? I mean, logically, the result would be their own values (per row). — Cainã Max Couto-Silva, Dec 11 '20 at 03:16
thanks for your answer. I edit my dataframe. now i have some rows that have same day. so now resampling on day is correct..@CainãMaxCouto-Silva — saeedzali, Dec 11 '20 at 06:13
it gives me this error now : Resampling from level= or on= selection with a PeriodIndex is not currently supported, use .set_index(...) to explicitly set index @CainãMaxCouto-Silva — saeedzali, Dec 11 '20 at 06:17
@saeedzali - Try `df.set_index('sale_date').resample('D')['price'].sum()`, but for me not working. — jezrael, Dec 11 '20 at 06:17
then i set index for that like below: df.set_index('sale_date') and then again type df.resample(rule='d').sum() and it gives me this error: Unable to allocate 38.9 GiB for an array with shape (5221441787,) and data type int64 @CainãMaxCouto-Silva — saeedzali, Dec 11 '20 at 06:19
@saeedzali - Can you check answer and try `df = df.groupby('sale_date')['price'].sum().reset_index()`? — jezrael, Dec 11 '20 at 06:22
@saeedzali, yeah, it seems not implemented yet, and we need to `set_index`. But I've also tried now, and it didn't work for me as well. Probably using `period[D]` with `resample` is not implemented either. — Cainã Max Couto-Silva, Dec 11 '20 at 06:22

jezrael · Accepted Answer · 2020-12-11T06:37:20.777

1

It seems here not working resample and Grouper with Periods for me in pandas 1.1.3 (I guess bug):

df['sale_date']=df['sale_date'].str.replace('/','').astype(int)
df['price'] = df['price'].str.replace(',','').astype(int)

def conv(x):
    return pd.Period(year=x // 10000,
                     month=x // 100 % 100,
                     day=x % 100, freq='D')
 
df['sale_date'] = df['sale_date'].apply(conv)

# df = df.set_index('sale_date').resample('D')['price'].sum()
#OutOfBoundsDatetime: Out of bounds nanosecond timestamp: 1396-03-18 00:00:00

# df = df.set_index('sale_date').groupby(pd.Grouper(freq='D'))['price'].sum()
#OutOfBoundsDatetime: Out of bounds nanosecond timestamp: 1396-03-18 00:00:00

Possible solution is aggregate by sum, so if duplicated sale_date then price values are summed:

df = df.groupby('sale_date')['price'].sum().reset_index()
print (df)
    sale_date      price
0  1396-03-18  328000000
1  1396-10-03  580000000
2  1396-10-30  477000000
3  1396-11-25  608700000
4  1396-12-05  350000000

EDIT: It is possible by Series.reindex with period_range:

s = df.groupby('sale_date')['price'].sum()
rng = pd.period_range(s.index.min(), s.index.max(), name='sale_date')
df = s.reindex(rng, fill_value=0).reset_index()
print (df)
      sale_date      price
0    1396-03-18  328000000
1    1396-03-19          0
2    1396-03-20          0
3    1396-03-21          0
4    1396-03-22          0
..          ...        ...
258  1396-12-01          0
259  1396-12-02          0
260  1396-12-03          0
261  1396-12-04          0
262  1396-12-05  350000000

[263 rows x 2 columns]

edited Dec 11 '20 at 06:37

answered Dec 11 '20 at 06:13

jezrael

822,522
95
1,334
1,252

I am sorry to take your time. how can i fill gaps datetime in my dataframe – saeedzali Dec 11 '20 at 06:32
Is it possible to fill values by mean value of next and previous day? thanks for your helps @jezrael – saeedzali Dec 11 '20 at 06:46
@saeedzali - Do you think each 3 rows after `df = s.reindex(rng, fill_value=0).reset_index()`? – jezrael Dec 11 '20 at 06:47
@saeedzali - Maybe the best it should be new question, I think. – jezrael Dec 11 '20 at 06:53
i think to fill gaps after df = df.groupby('sale_date')['price'].sum().reset_index() @jezrael – saeedzali Dec 11 '20 at 06:55
@saeedzali - hmmm, not sure if understand, because if e.g. 3 or more 0 like in your sample how should working it? – jezrael Dec 11 '20 at 06:57
Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/225812/discussion-between-saeedzali-and-jezrael). – saeedzali Dec 11 '20 at 06:59
1

no i dont have zero values in my dataframe. i have some gaps when that day is holiday (for example) and i want to fill them by mean of next and previous day @jezrael – saeedzali Dec 11 '20 at 07:01
@saeedzali - hmmm, so is possible post new question with sample data and expected output? e.g. if use dates like `['1396/10/30', '1396/11/01', '1396/11/03', '1396/11/05'` in column `sale_date` how should looks expected output? – jezrael Dec 11 '20 at 07:06
@saeedzali - E.g use `df = pd.DataFrame({'price': ['477,000,000', '608,700,000', '580,000,000', '350,000,000'], 'sale_date': ['1396/10/30', '1396/11/01', '1396/11/03', '1396/11/07']})` – jezrael Dec 11 '20 at 07:10
thanks for your helps i post it here https://stackoverflow.com/questions/65247639/how-can-i-fill-gaps-by-mean-in-period-datetime-column-in-pandas-dataframe @jezrael – saeedzali Dec 11 '20 at 07:55
hi jezrael can you vote me up i have a question but I cannot ask it@jezrael – saeedzali Jan 04 '21 at 20:31

how can I resample pandas dataframe by day on period time?

1 Answers1

Linked