
I have a df of minutely prices and want to establish whether any minutes are missing across a 5-year period. A price is only stamped when there is a transaction, so some minutes are missing.

There are 4 entities in a separate column, and I would like to know which entity is missing each minute as well as when it was missing.

My first inclination is to resample and sum NaNs. What is the best way of doing this?
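The resample idea can be sketched as follows. This is a minimal illustration, not the asker's code: it assumes the data is in long format with hypothetical columns `date`, `entity` and `price`, and counts trades per entity per minute instead of summing NaNs, so an empty minute shows up as a count of 0.

```python
import pandas as pd

# Hypothetical long-format data: one row per trade, columns 'date', 'entity', 'price'.
df_price = pd.DataFrame({
    "date": pd.to_datetime([
        "2021-01-01 00:00", "2021-01-01 00:01", "2021-01-01 00:03",
        "2021-01-01 00:00", "2021-01-01 00:02", "2021-01-01 00:03",
    ]),
    "entity": ["A", "A", "A", "B", "B", "B"],
    "price": [1.0, 1.1, 1.2, 5.0, 5.1, 5.2],
})

# Count trades per entity per minute; resample inserts empty minutes with a count of 0.
counts = (
    df_price.sort_values("date")
    .set_index("date")
    .groupby("entity")["price"]
    .resample("min")
    .count()
)

# Minutes with zero trades are the missing ones, indexed by (entity, date).
missing = counts[counts == 0]
print(missing)
```

One caveat: each entity is resampled only between its own first and last trade, so minutes missing at the very start or end of the 5-year period are not reported this way.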

kedoink
  • What would the missing value be? 0? NaN? Just use np.argwhere – Derek Eden Oct 31 '21 at 01:18
  • The missing value would be NaN, good point. I have edited the question to include the fact that the df is only stamped when there is a transaction, so some minutes are missing from the df – kedoink Oct 31 '21 at 02:11
  • If you have a list of times, just make another list of the full set of times and check which of those are in your original list; the ones that aren't are the missing times – Derek Eden Nov 01 '21 at 01:09
  • Thanks Derek, that is what I ended up doing. I was hoping there might be a faster way, but see the answer below and please let me know if I should edit it in any way :) – kedoink Nov 01 '21 at 01:10
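The "full set of times minus the times you have" suggestion can be sketched per entity with `DatetimeIndex.difference`. The column names `date` and `entity` and the start/end bounds are hypothetical stand-ins for the real 5-year range.

```python
import pandas as pd

# Hypothetical long-format data: one row per trade.
df_price = pd.DataFrame({
    "date": pd.to_datetime([
        "2021-01-01 00:00", "2021-01-01 00:01", "2021-01-01 00:03",
        "2021-01-01 00:01", "2021-01-01 00:02", "2021-01-01 00:03",
    ]),
    "entity": ["A", "A", "A", "B", "B", "B"],
    "price": [1.0, 1.1, 1.2, 5.0, 5.1, 5.2],
})

# Full set of minutes for the whole period (hypothetical bounds).
full_range = pd.date_range(start="2021-01-01 00:00", end="2021-01-01 00:03", freq="min")

# For each entity, the minutes in the full range that never appear in its data.
missing = {
    entity: full_range.difference(pd.DatetimeIndex(group["date"]))
    for entity, group in df_price.groupby("entity")
}
for entity, minutes in missing.items():
    print(entity, list(minutes))
```

Because the reference range is fixed for the whole period, this also catches minutes missing before an entity's first trade (here B at 00:00).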

1 Answer


Until there is a better answer, here is how I have dealt with this, based on the question "Merge with the nearest minute using pandas".

Write out the answer from that question, with the addition of printing all NaN values.

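```python
import pandas as pd

# Full minute grid; replace the 'yyyy/mm/dd' placeholders with your start and end dates
df_time = pd.DataFrame({'date': pd.date_range(start='yyyy/mm/dd', end='yyyy/mm/dd', freq='1min')})
df_time.info()
```

Comparing the row count from df_time.info() with a simple division (the length of the period in minutes, plus one because both endpoints are included) will confirm you have the right data size.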
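The size check can be written out explicitly. The dates below are hypothetical values standing in for the 'yyyy/mm/dd' placeholders; `pd.date_range` includes both endpoints, hence the `+ 1`.

```python
import pandas as pd

# Hypothetical 5-year bounds standing in for the 'yyyy/mm/dd' placeholders.
start = pd.Timestamp("2016/01/01")
end = pd.Timestamp("2021/01/01")

df_time = pd.DataFrame({"date": pd.date_range(start=start, end=end, freq="min")})

# Number of whole minutes in the period, plus one for the inclusive end point.
expected_rows = (end - start) // pd.Timedelta(minutes=1) + 1
print(len(df_time), expected_rows)
```

For these bounds the period is 1827 days (2016 and 2020 are leap years), so both numbers come out as 1827 × 1440 + 1 = 2,630,881.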

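```python
# how='left' keeps every minute from df_time; the default inner join would drop the missing minutes
df_combined = pd.merge(df_time, df_price, on='date', how='left')
# Print only the rows that contain a missing value
print(df_combined[df_combined.isna().any(axis=1)])
```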
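Merging on the date alone cannot tell you which of the 4 entities is missing a minute if the prices are stored in long format. A possible extension, sketched under the assumption of hypothetical `date`/`entity`/`price` columns with timestamps already on whole minutes, is to build every (minute, entity) pair with a cross join and then left-merge with `indicator=True`:

```python
import pandas as pd

# Hypothetical long-format data: one row per trade, timestamps on whole minutes.
df_price = pd.DataFrame({
    "date": pd.to_datetime([
        "2021-01-01 00:00", "2021-01-01 00:01", "2021-01-01 00:03",
        "2021-01-01 00:00", "2021-01-01 00:02", "2021-01-01 00:03",
    ]),
    "entity": ["A", "A", "A", "B", "B", "B"],
    "price": [1.0, 1.1, 1.2, 5.0, 5.1, 5.2],
})

df_time = pd.DataFrame({"date": pd.date_range("2021-01-01 00:00", "2021-01-01 00:03", freq="min")})
entities = pd.DataFrame({"entity": df_price["entity"].unique()})

# Every (minute, entity) pair that should exist (how='cross' needs pandas >= 1.2).
grid = df_time.merge(entities, how="cross")

# Left merge keeps all grid rows; '_merge' is 'left_only' where no trade matched.
df_combined = grid.merge(df_price, on=["date", "entity"], how="left", indicator=True)

missing = df_combined.loc[df_combined["_merge"] == "left_only", ["entity", "date"]]
print(missing)
```

The indicator column is used instead of checking price for NaN so that a genuine NaN price in the source data is not mistaken for a missing minute. If the raw timestamps carry seconds, flooring them first with `df_price["date"].dt.floor("min")` would be needed for the join to match.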

I then wanted each missing minute to carry the price of the previous minute, since no transaction of significant difference occurred in it. I did this with df_combined = df_combined.ffill().
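If the combined frame is in long format (one row per minute per entity), a plain `ffill()` can copy one entity's price into another entity's gap, because it fills from whatever row comes before. A sketch of filling within each entity instead, using hypothetical column names and data:

```python
import numpy as np
import pandas as pd

# Hypothetical combined frame after the left merge: NaN where no trade occurred.
df_combined = pd.DataFrame({
    "date": pd.to_datetime(
        ["2021-01-01 00:00"] * 2 + ["2021-01-01 00:01"] * 2 + ["2021-01-01 00:02"] * 2
    ),
    "entity": ["A", "B"] * 3,
    "price": [1.0, 5.0, np.nan, 5.1, 1.2, np.nan],
})

# Order each entity's rows by time so the forward fill follows the clock.
df_combined = df_combined.sort_values(["entity", "date"])

# Forward-fill within each entity only, so prices never leak across entities.
df_combined["price"] = df_combined.groupby("entity")["price"].ffill()
print(df_combined)
```

Here A at 00:01 takes A's earlier 1.0 and B at 00:02 takes B's earlier 5.1, whereas a plain `ffill()` on the date-ordered frame would have given B at 00:02 the value 1.2 from entity A.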

Dharman
kedoink