
I have a DataFrame with timestamp values. I have figured out how to use the DataFrame resample() method and apply last() or mean() to the result. I am doing it as follows:

print(type(df.timestamp))
print(type(df.timestamp[0]))
df=df.set_index('timestamp')
df_1=df.resample('60S').last()
df_2=df.resample('60S').mean()

<class 'pandas.core.series.Series'>
<class 'pandas._libs.tslibs.timestamps.Timestamp'>

Here, df_1 holds the last value in each resampling period and df_2 holds the mean of all values in each resampling period.
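A minimal sketch of that behaviour on made-up sample data (the column name val, the 30-second spacing and the values are assumptions, not the asker's data). Both aggregations treat zeros as ordinary values, so a period that ends in 0 gets 0 from last(), and zeros pull the mean down:

```python
import pandas as pd

# Hypothetical sample data: one reading every 30 seconds, some of them zero
df = pd.DataFrame(
    {"val": [1, 2, 3, 0, 0, 4]},
    index=pd.date_range("2020-01-01", periods=6, freq="30s", name="timestamp"),
)

df_1 = df.resample("60s").last()  # last value in each 60 s period, zeros included
df_2 = df.resample("60s").mean()  # mean of all values in each period, zeros included
print(df_1)
print(df_2)
```

In the second period (values 3 and 0), last() returns 0 and mean() returns 1.5.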

The problem is that my data contains many zero values. I want df_1 to contain the last non-zero value in each period, and df_2 to contain the mean of only the non-zero values. I have not been able to find a way to do this in the resampling documentation.
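Stated per period, the requirement amounts to two custom aggregation functions. The sketch below writes them out explicitly; last_nonzero and mean_nonzero are hypothetical helper names and the sample data is made up. Python-level functions like these run once per period, so they are likely slower than built-in aggregations on large data, but they pin down exactly what result is wanted:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, same shape as in the question
df = pd.DataFrame(
    {"val": [1, 2, 3, 0, 0, 4]},
    index=pd.date_range("2020-01-01", periods=6, freq="30s", name="timestamp"),
)

def last_nonzero(s):
    """Last non-zero value of one period, or NaN if the period has none."""
    nz = s[s != 0]
    return nz.iloc[-1] if len(nz) else np.nan

def mean_nonzero(s):
    """Mean of the non-zero values of one period (NaN if there are none)."""
    return s[s != 0].mean()

# Each callable is applied to the slice of "val" falling into one 60 s period
resampled = df["val"].resample("60s")
result = pd.DataFrame({
    "last": resampled.agg(last_nonzero),
    "mean": resampled.agg(mean_nonzero),
})
print(result)
```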

Please suggest an appropriate way to achieve this.

lonstud

1 Answer


Zeros can be replaced with np.nan before aggregating. mean() and last() skip NaN values, so they then operate only on the non-zero values.

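import numpy as np
import pandas as pd

df = pd.DataFrame({
    'timestamp': pd.date_range('2020-01-01', periods=6, freq='30s'),
    'val': [1, 2, 3, 0, 0, 4]
})
df = df.set_index('timestamp')
# Turn zeros into NaN so that mean() and last() ignore them
df['val'] = df['val'].replace(0, np.nan)
df = df.resample('60s').agg(['mean', 'last'])
print(df)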

Output

                     val
                    mean last
timestamp
2020-01-01 00:00:00  1.5  2.0
2020-01-01 00:01:00  3.0  3.0
2020-01-01 00:02:00  4.0  4.0
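The same idea extends to a DataFrame with several value columns. A sketch assuming a second, hypothetical column named other: df.where(df != 0) returns a copy in which every zero is replaced by NaN, so the original df keeps its zeros, and a single resample call then aggregates all columns:

```python
import pandas as pd

# Hypothetical data with two value columns; zeros mean "no reading"
df = pd.DataFrame(
    {
        "val": [1, 2, 3, 0, 0, 4],
        "other": [0, 5, 0, 0, 7, 8],
    },
    index=pd.date_range("2020-01-01", periods=6, freq="30s", name="timestamp"),
)

# where() keeps entries where the condition holds and puts NaN elsewhere;
# it returns a new frame, so df itself is not modified
nonzero = df.where(df != 0)
result = nonzero.resample("60s").agg(["mean", "last"])
print(result)
```

A period whose values are all zero (the second period of other here) comes out as NaN rather than 0; if 0 is preferred for such periods, result.fillna(0) restores it.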
Utsav