0

I have a dataframe with data from two forecast cycles, one starting at 00 UTC and going up to 168 hours forecast (Forecast (valid time column) and another cycle starting at 12 UTC and also going up to 168 hours forecast. Based on dataframe that I have below, I would like to create a column called cycle that corresponds to the Date-Time column of which forecast cycle the data refers. For example:

Date-Time             Cycle
2020-07-16 00:00:00   00
2020-07-16 00:00:00   12

How can I do this?

My dataframe looks like this:

enter image description here

Array file

1 Answers1

1

IIUC, you can use pandas.Series.where with pandas.Series.ffill :

import numpy

df = pd.read_csv("df.csv", sep=";", index_col=0, usecols=[0,1,2])
​
df['Date-Time'] = pd.to_datetime(df['Date-Time'])
​
#is it the start of the cycle ?
m = df["Forecast (valid time)"].eq(0)

df["Cycle"] = df["Date-Time"].dt.hour.where(m).ffill()

Output :

print(df.groupby("Cycle").head(5))

             Date-Time  Forecast (valid time)  Cycle
0  2020-07-16 00:00:00                    0.0      0
1  2020-07-16 03:00:00                    3.0      0
2  2020-07-16 06:00:00                    6.0      0
3  2020-07-16 09:00:00                    9.0      0
4  2020-07-16 12:00:00                   12.0      0
57 2020-07-16 12:00:00                    0.0     12
58 2020-07-16 15:00:00                    3.0     12
59 2020-07-16 18:00:00                    6.0     12
60 2020-07-16 21:00:00                    9.0     12
61 2020-07-17 00:00:00                   12.0     12
Timeless
  • 22,580
  • 4
  • 12
  • 30