Pandas resampling method

Question

I am trying to resample a time series to get annual maximum values for different time steps (e.g., 3h, 6h, etc.). The original series is at an hourly resolution. I first converted the date column to pandas datetime format, used that column as the index, and resampled it. The final output should be the years and the corresponding maximum values at the desired time step. However, I am getting a list of NaN. I am also not sure how to incorporate a date range in my code. Here is my code so far for a 3H time step, followed by a sample of the data:

import pandas as pd
df = pd.read_csv('data.txt', delimiter = ";")
df = pd.DataFrame(df[['yyyymmddhh', 'rainfall']])
df["yyyymmddhh"] = pd.to_datetime(df["yyyymmddhh"], format="%Y%M%d%H")
df.set_index("yyyymmddhh").resample("3H").sum().resample("Y").max()

stn_n;yyyymmddhh;rainfall
xyz;1980123123;-
xyz;1981010100;0.0
xyz;1981010101;0.0
xyz;1981010102;0.0
xyz;1981010103;0.0
xyz;1981010104;0.0
xyz;1981010105;0.0
xyz;1981010106;0.0
xyz;1981010107;0.0
xyz;1981010108;0.0
xyz;1981010109;0.4
xyz;1981010110;0.6
xyz;1981010111;0.1
xyz;1981010112;0.1
xyz;1981010113;0.0
xyz;1981010114;0.1
xyz;1981010115;0.6

Replace `format="%Y%M%d%H"` with `format="%Y%m%d%H"`. See [here](https://docs.python.org/3/library/datetime.html#strftime-and-strptime-behavior) for format codes. — AlexK, Apr 30 '21 at 06:03
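To see why the uppercase `%M` matters, here is a small sketch using only the standard library's `datetime.strptime`, with the first timestamp from the sample data. `%M` means minutes, so the month digits are read as minutes and the month silently falls back to January.

```python
from datetime import datetime

stamp = "1980123123"  # first timestamp in the sample data (yyyymmddhh)

# %M is "minute": the digits "12" become minutes and the month defaults to 1
wrong = datetime.strptime(stamp, "%Y%M%d%H")

# %m is "month": the digits "12" are read as December
right = datetime.strptime(stamp, "%Y%m%d%H")

print(wrong)  # 1980-01-31 23:12:00
print(right)  # 1980-12-31 23:00:00
```

No error is raised in the wrong case, which is why the mistake only shows up later as odd resampling results.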
Many thanks for the link. I tried it, but I got output like this: yyyymmddhh rainfall 1981-12-31 8.01.13.1 — Dawar, Apr 30 '21 at 06:11
Not sure how you are getting those 8 and 13 numbers. You should also pass `-` to the `na_values=` parameter in `pd.read_csv()` so that pandas reads the dash in the first row as NaN. And define data types for each column with the `dtype=` parameter (e.g., `dtype={'yyyymmddhh': str, 'rainfall': 'float32'}`). [Documentation](https://pandas.pydata.org/docs/reference/api/pandas.read_csv.html) — AlexK, Apr 30 '21 at 06:17
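The reading step suggested here might look like the following sketch. It assumes the file layout shown in the question; an in-memory `io.StringIO` with a few copied rows stands in for `data.txt` so the snippet runs on its own, and `usecols` is an illustrative choice for dropping the station column.

```python
import io
import pandas as pd

# A few rows copied from the question; real code would pass 'data.txt' instead
sample = io.StringIO(
    "stn_n;yyyymmddhh;rainfall\n"
    "xyz;1980123123;-\n"
    "xyz;1981010100;0.0\n"
    "xyz;1981010109;0.4\n"
)

df = pd.read_csv(
    sample,
    sep=";",
    usecols=["yyyymmddhh", "rainfall"],  # skip the station name column
    na_values="-",                       # read the dash as a missing value
    dtype={"yyyymmddhh": str, "rainfall": "float32"},
)

# Lowercase %m (month), not %M (minute)
df["yyyymmddhh"] = pd.to_datetime(df["yyyymmddhh"], format="%Y%m%d%H")

print(df.dtypes)
print(df["rainfall"].isna().sum())  # number of missing rainfall values
```

Reading the timestamp column as `str` keeps the digits intact until `pd.to_datetime` parses them with the explicit format.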
How do I incorporate a range in my resampling algorithm? I tried df['1981-12-31': '2000-12-31'], but it returned error messages — Dawar, Apr 30 '21 at 06:21
Are you trying to run the code on part of your data? You can add a mask: `df[(df['yyyymmddhh'] >= pd.Timestamp('1981-12-31')) & (df['yyyymmddhh'] <= pd.Timestamp('2000-12-31'))]` — AlexK, Apr 30 '21 at 06:26
Yes, I am trying to discard the initial and the last year. Many thanks! This worked smoothly — Dawar, Apr 30 '21 at 06:36
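Putting the comments together, the whole task could be sketched roughly as below. This is not the asker's final code: the hourly series is made-up stand-in data, `annual_max` is a hypothetical helper, and the yearly step uses `groupby` on the index year rather than a second `resample("Y")`, so the result is indexed by plain year numbers. Once the datetime column is the index, a label slice such as `.loc["1982":"1983"]` does the same job as the mask.

```python
import pandas as pd

# Hypothetical hourly rainfall series standing in for the parsed file
idx = pd.date_range("1981-01-01 00:00", "1984-12-31 23:00", freq="h")
rain = pd.Series(0.0, index=idx, name="rainfall")
rain.loc[pd.Timestamp("1982-06-01 09:00")] = 0.4
rain.loc[pd.Timestamp("1982-06-01 10:00")] = 0.6
rain.loc[pd.Timestamp("1982-06-01 11:00")] = 0.1
rain.loc[pd.Timestamp("1983-03-05 04:00")] = 2.0

# Discard the first and last year; both ends of the slice are inclusive
rain = rain.loc["1982":"1983"]


def annual_max(series, step):
    """Sum rainfall over fixed windows of length `step`, then take the max per year."""
    totals = series.resample(step).sum()
    return totals.groupby(totals.index.year).max()


# One column per time step, one row per year
result = pd.DataFrame({step: annual_max(rain, step) for step in ["3h", "6h"]})
result.index.name = "year"
print(result)
```

Note that `resample` uses fixed, non-overlapping windows aligned to midnight. If moving windows are what is wanted, a time-based `rolling("3h").sum()` on the hourly series would be the alternative to look at.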
