First, create a DataFrame with a regular index. This is the DataFrame that I want to resample using the index of df1:

import numpy as np
import pandas as pd

df0 = pd.DataFrame(index=pd.date_range(start='2018-10-31 00:17:24', periods=50, freq='1s'))

I didn't know how to create a DataFrame with an irregular index, so I created a second DataFrame, df1, whose index I want to use to resample df0:

df1 = pd.DataFrame(index=pd.date_range(start='2018-10-31 00:17:24', periods=50,freq='20s'))

For a minimal reproducible example, create a column with values between 0 and 1:

df0['dat'] = np.random.rand(len(df0))

I want to flag the rows where the dat column has a value of at least 0.5:

df0['target'] = 0
df0.loc[(df0['dat'] >= 0.5), 'target'] = 1

I then want to reindex df0 using the index of df1, but each row of the target column should hold the sum of the values that lie in that window.

What I have tried is:

new_index     = df1.index 
df_new        = df0.reindex(df0.index.union(new_index)).interpolate(method='linear').reindex(new_index).sum() 

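But the trailing sum() ruins the result: it collapses everything into a single total per column instead of giving one sum per window.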

jokerp

1 Answer

IIUC, try:

df_new=df0.reindex(df1.index.union(df0.index)).interpolate(method='linear').reset_index()

Finally, use pd.Grouper() with groupby():

out=df_new.groupby(pd.Grouper(key='index',freq='1 min')).sum()
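
For the regular-frequency case in the question, a minimal sketch of one way to get per-window sums directly is shown below. It rests on an assumption about the intended result (each window runs from one df1 timestamp up to, but not including, the next), and it is not necessarily what the code above produces. It uses resample with origin='start' (available in pandas 1.1 and later) so the 20-second bins begin at the first timestamp of df0 rather than at multiples of 20 seconds from midnight. The seeded generator and the names window_sums and out are only for illustration.

```python
import numpy as np
import pandas as pd

# Rebuild the example data from the question (seeded only so the sketch is repeatable).
rng = np.random.default_rng(0)
df0 = pd.DataFrame(index=pd.date_range(start='2018-10-31 00:17:24', periods=50, freq='1s'))
df1 = pd.DataFrame(index=pd.date_range(start='2018-10-31 00:17:24', periods=50, freq='20s'))
df0['dat'] = rng.random(len(df0))
df0['target'] = (df0['dat'] >= 0.5).astype(int)

# Sum 'target' in 20-second bins anchored at the first timestamp of df0,
# so the bin labels line up with the entries of df1.index.
window_sums = df0['target'].resample('20s', origin='start').sum()

# Put the sums on df1's index; windows that contain no rows of df0 get 0.
out = window_sums.reindex(df1.index, fill_value=0)
```

Because df0 only covers 50 seconds, only the first three windows can be non-zero here; the rest of df1's index is filled with 0.
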
Anurag Dabas
  • Thanks! What happens, though, if the index of df1 is irregular? Is there a way to do the second part of your answer without a specific freq='1min'? – jokerp Jun 21 '21 at 09:58
  • @jokerp You can adjust the frequency according to your needs. Since the dates are irregular, you have to provide something on which they can be grouped. You can group by seconds with `df_new.groupby(df_new['index'].dt.second).sum()` and by minutes with `df_new.groupby(df_new['index'].dt.minute).sum()` – Anurag Dabas Jun 21 '21 at 10:58
  • I am sorry, but your answer does not seem to work! When I use the first line of code you wrote, I get a new dataframe where the target column has the same values as before, with the index of the second dataframe. However, I want to have the index of the second dataframe, with the column values filled in as the sums. – jokerp Jun 21 '21 at 11:27
  • @jokerp The first line is taken from your code. It's the last line of your code, but the difference is that you chained the `sum()` method and I used the `reset_index()` method. By the way, I updated the answer and removed the second reindex() call. Check it now :) – Anurag Dabas Jun 21 '21 at 12:11
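
For the irregular-index case raised in the comments, one possible approach that needs no fixed freq is to assign each row of df0 to the most recent timestamp in the target index and sum per group. The sketch below is an illustration under stated assumptions, not the method from the answer: the edges index is made up, it must be sorted, each window is taken to run from one edge up to (but not including) the next, and the last window is open-ended.

```python
import numpy as np
import pandas as pd

# Example data as in the question (seeded only so the sketch is repeatable).
rng = np.random.default_rng(0)
df0 = pd.DataFrame(index=pd.date_range(start='2018-10-31 00:17:24', periods=50, freq='1s'))
df0['dat'] = rng.random(len(df0))
df0['target'] = (df0['dat'] >= 0.5).astype(int)

# Hypothetical irregular, sorted window starts standing in for df1.index.
edges = pd.DatetimeIndex([
    '2018-10-31 00:17:24',
    '2018-10-31 00:17:31',
    '2018-10-31 00:17:50',
    '2018-10-31 00:18:05',
])

# For every row of df0, find the position of the last edge <= its timestamp.
pos = np.searchsorted(edges.values, df0.index.values, side='right') - 1

# Rows that fall before the first edge belong to no window and are dropped.
mask = pos >= 0
sums = df0.loc[mask, 'target'].groupby(pos[mask]).sum()

# Relabel the groups with their edge timestamps; empty windows get 0.
sums.index = edges[sums.index.to_numpy()]
out = sums.reindex(edges, fill_value=0)
```

The result has one row per entry of edges, so replacing edges with df1.index should give the sums on df1's index, provided that index is sorted.
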