Pandas dataframe.resample and mean gives higher values with increasing sample width

Question

I have a time series of rain intensity (in µm/s), which I resample to 1-minute intervals. The data already has a 1-minute time step, but there can be data outages due to quality checks or basic equipment failure. The resample ensures that I have a consistent, equidistant time series to loop over, which is the fastest approach I have found so far.

In theory I can choose another time step for the calculation, say 5 minutes. I have found that a longer time step gives larger dimensions for a rainwater basin, which seemed odd to me. As far as I can tell, this is because the resampled series systematically adds up to more precipitation, and more precipitation means a larger basin.

Why does resample give this odd result? Is it because the same original time step can be counted in more than one resampled time step?

File is uploaded here

import pandas
import numpy
from matplotlib import pyplot as plt

data1 = pandas.read_csv("rain_1min.txt", sep=";", parse_dates=["time"], index_col="time")

test = list(range(1,121))
sums = []
for timestep in test:
    # Bins with no data (outages) have a NaN mean; count them as zero rain.
    data_rs = data1["rain"].resample(f"{timestep}Min").mean().fillna(0.0)
    sums.append(numpy.nansum(data_rs))

fig, ax = plt.subplots(figsize=[8,4], dpi=100)
ax.plot(test, sums)
ax.set_xlabel("Rule = x Min")
ax.set_ylabel("Sum of mean()")
plt.show()
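
On the question of whether resample can count the same time step in more than one bin: a small check on synthetic data suggests it does not. This is only a sketch; the 17-point series below is made up for illustration and is not the uploaded file.

```python
import pandas as pd

# Hypothetical 1-minute series with 17 samples (values 0..16).
idx = pd.date_range("2022-01-01", periods=17, freq="1min")
series = pd.Series(range(17), index=idx, dtype="float64")

# Number of original samples that land in each 5-minute bin.
counts = series.resample("5min").count()

# If any sample were counted in two bins, this would exceed len(series).
total_binned = int(counts.sum())

# The grand total is likewise unchanged by binning.
sum_binned = float(series.resample("5min").sum().sum())

print(counts.tolist(), total_binned, sum_binned)
```

Each sample falls into exactly one bin, so the per-bin counts add up to the length of the original series and the per-bin sums add up to the original total.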

I just realised that the sum of the means of course goes down, because there are fewer points to sum. The amount of water is proportional to the sum × time step, which increases, not the sums alone... — karga, Oct 07 '22 at 16:12
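
To illustrate the point about sum × time step, here is a minimal sketch on synthetic data (a constant 1 µm/s for one hour, not the uploaded file). The helper `depth_um` and the two-minute outage are assumptions made for the example. It also shows one way the product can grow with the bin width when there are gaps: a wider bin that contains a gap still gets the mean of its remaining samples, and that mean is then applied to the whole bin length.

```python
import pandas as pd

# Hypothetical data: constant intensity of 1 µm/s for 60 minutes.
idx = pd.date_range("2022-01-01", periods=60, freq="1min")
rain = pd.Series(1.0, index=idx)

# Simulate an outage by dropping two consecutive minutes.
gappy = rain.drop(idx[[2, 3]])

def depth_um(series, minutes):
    """Rain depth in µm: sum of bin means (µm/s) times bin length (s)."""
    means = series.resample(f"{minutes}min").mean().fillna(0.0)
    return float(means.sum() * minutes * 60)

# Without gaps the depth does not depend on the bin width.
full_1 = depth_um(rain, 1)
full_5 = depth_um(rain, 5)

# With gaps, 1-minute bins count the outage as zero rain, while the
# 5-minute bin containing the outage keeps the mean of its 3 samples.
gappy_1 = depth_um(gappy, 1)
gappy_5 = depth_um(gappy, 5)

print(full_1, full_5, gappy_1, gappy_5)
```

In this made-up case the gap-free series gives the same depth for both bin widths, while the series with an outage gives a larger depth at 5 minutes than at 1 minute. Whether this explains the effect in the real data depends on how many gaps the file contains.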


0 Answers