
I am trying to resample a time series to get annual maximum values for different time steps (e.g., 3h, 6h, etc.). The original series has an hourly resolution. I first converted the date column to pandas datetime format, used that column as the index, and resampled it. The final output should be the years and the corresponding maximum values at the desired time step. However, I am getting a list of NaN values. I am also not sure how to incorporate a date range into my code. Here is my code so far for a 3H time step:

```python
import pandas as pd

df = pd.read_csv('data.txt', delimiter=";")
# keep only the timestamp and rainfall columns
datin = pd.DataFrame(df[['yyyymmddhh', 'rainfall']])
datin["yyyymmddhh"] = pd.to_datetime(datin["yyyymmddhh"], format="%Y%M%d%H")
# 3-hourly totals, then the maximum per year
datin.set_index("yyyymmddhh").resample("3H").sum().resample("Y").max()
```

A sample of `data.txt`:

```
stn_n;yyyymmddhh;rainfall
xyz;1980123123;-
xyz;1981010100;0.0
xyz;1981010101;0.0
xyz;1981010102;0.0
xyz;1981010103;0.0
xyz;1981010104;0.0
xyz;1981010105;0.0
xyz;1981010106;0.0
xyz;1981010107;0.0
xyz;1981010108;0.0
xyz;1981010109;0.4
xyz;1981010110;0.6
xyz;1981010111;0.1
xyz;1981010112;0.1
xyz;1981010113;0.0
xyz;1981010114;0.1
xyz;1981010115;0.6
```
Dawar
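For reference, here is a minimal sketch of the whole pipeline with the fixes suggested in the comments below applied: `%m` (month) instead of `%M` (minutes), `-` read as NaN, and explicit dtypes. It assumes the column names and `;` separator from the question, and it embeds the sample rows above via `io.StringIO` so it runs standalone. The helper `annual_max` is a hypothetical name introduced for illustration. The annual maximum is taken with a `groupby` on the year instead of a second `resample`, which avoids the `"Y"`/`"YE"` alias change in newer pandas versions.

```python
import io

import pandas as pd

# Sample rows from the question; in practice read "data.txt" instead.
RAW = """stn_n;yyyymmddhh;rainfall
xyz;1980123123;-
xyz;1981010100;0.0
xyz;1981010101;0.0
xyz;1981010102;0.0
xyz;1981010103;0.0
xyz;1981010104;0.0
xyz;1981010105;0.0
xyz;1981010106;0.0
xyz;1981010107;0.0
xyz;1981010108;0.0
xyz;1981010109;0.4
xyz;1981010110;0.6
xyz;1981010111;0.1
xyz;1981010112;0.1
xyz;1981010113;0.0
xyz;1981010114;0.1
xyz;1981010115;0.6
"""


def annual_max(df: pd.DataFrame, step: str) -> pd.Series:
    """Hypothetical helper: annual maximum of rainfall summed over `step` blocks."""
    # Sum hourly values into fixed, clock-aligned blocks (00-02h, 03-05h, ... for "3h").
    accumulated = df["rainfall"].resample(step).sum()
    # Keep the largest block total within each calendar year.
    return accumulated.groupby(accumulated.index.year).max()


df = pd.read_csv(
    io.StringIO(RAW),  # replace with "data.txt" for the real file
    sep=";",
    usecols=["yyyymmddhh", "rainfall"],
    na_values="-",  # the dash in the first row becomes NaN
    dtype={"yyyymmddhh": str, "rainfall": "float32"},
)
# %m is the month; %M would be parsed as minutes.
df["yyyymmddhh"] = pd.to_datetime(df["yyyymmddhh"], format="%Y%m%d%H")
df = df.set_index("yyyymmddhh")  # set_index returns a new frame, so reassign it

# One Series of annual maxima per time step.
results = {step: annual_max(df, step) for step in ["3h", "6h"]}
print(results["3h"])
```

This uses fixed, non-overlapping blocks, as the question's `resample` call does. If a moving 3-hour window is wanted instead, `df["rainfall"].rolling("3h").sum()` could replace the `resample` step; which definition is appropriate depends on how the annual maxima are meant to be computed.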
  • Replace `format="%Y%M%d%H"` with `format="%Y%m%d%H"`. See [here](https://docs.python.org/3/library/datetime.html#strftime-and-strptime-behavior) for format codes. – AlexK Apr 30 '21 at 06:03
  • Many thanks for the link. I tried it, but I got output like this: `yyyymmddhh rainfall 1981-12-31 8.01.13.1` – Dawar Apr 30 '21 at 06:11
  • Not sure how you are getting those 8 and 13 numbers. You should also pass `-` to the `na_values=` parameter in `pd.read_csv()` so that pandas recognizes the dash in the first row as NaN. And define the data type of each column with the `dtype=` parameter (e.g., `dtype={'yyyymmddhh': str, 'rainfall': 'float32'}`). [Documentation](https://pandas.pydata.org/docs/reference/api/pandas.read_csv.html) – AlexK Apr 30 '21 at 06:17
  • How do I incorporate a date range into my resampling? I tried `df['1981-12-31': '2000-12-31']`, but it returned error messages. – Dawar Apr 30 '21 at 06:21
  • Are you trying to run the code on part of your data? You can add a mask: `df[(df['yyyymmddhh'] >= pd.Timestamp('1981-12-31')) & (df['yyyymmddhh'] <= pd.Timestamp('2000-12-31'))]` – AlexK Apr 30 '21 at 06:26
  • Yes, I am trying to discard the first and the last year. Many thanks! This worked smoothly. – Dawar Apr 30 '21 at 06:36
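Below is a sketch of two ways to restrict the data to a date range before resampling, run on a made-up hourly series (the values are illustrative only, not the question's data). The boolean mask follows AlexK's comment. The `.loc` slice is an alternative that requires the timestamps to be the index. A likely reason the `df['1981-12-31': '2000-12-31']` attempt failed is that `set_index` in the question's code returns a new frame that is never assigned, so the frame still has a default integer index.

```python
import pandas as pd

# Made-up hourly series (illustrative only); the real data come from data.txt.
idx = pd.date_range("1980-12-31 23:00", "2001-01-01 00:00", freq="h")
df = pd.DataFrame({"yyyymmddhh": idx, "rainfall": 0.1})

# Option 1: boolean mask on the datetime column, as in the comments.
# pd.Timestamp("2000-12-31") means midnight, so 01:00-23:00 on that day are excluded.
mask = (df["yyyymmddhh"] >= pd.Timestamp("1981-01-01")) & (
    df["yyyymmddhh"] <= pd.Timestamp("2000-12-31")
)
masked = df[mask]

# Option 2: label slicing on a DatetimeIndex. set_index returns a new frame,
# so the result must be assigned. Partial date strings cover whole periods:
# "2000" as the end bound runs through 2000-12-31 23:00.
indexed = df.set_index("yyyymmddhh")
sliced = indexed.loc["1981":"2000"]

print(masked["yyyymmddhh"].min(), masked["yyyymmddhh"].max())
print(sliced.index.min(), sliced.index.max())
```

The filtered frame can then go through the same resampling as before, e.g. `sliced["rainfall"].resample("3h").sum()`, so the incomplete first and last years no longer contribute to the annual maxima.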

0 Answers