Aggregation of time-series data on multiple columns

Question

                     rand_val  new_val           copy_time
2020-10-15 00:00:00         7       26 2020-10-15 00:00:00
2020-10-15 00:00:10         8       29 2020-10-15 00:00:10
2020-10-15 00:00:20         1       53 2020-10-15 00:00:20
2020-10-15 00:03:50         6       69 2020-10-15 00:03:50
2020-10-15 00:04:00         3       19 2020-10-15 00:04:00

I am using the resample method to downsample the time series. I have found that I cannot access specific columns by name when applying a function to the resampled data.

Say I want to apply an operation that refers to a column by name:

df.resample("1min").apply(lambda x: sum(x.rand_val) if len(x)>1 else 0)

I get an error:

AttributeError: 'Series' object has no attribute 'rand_val'

This would work if I had used groupby on some other variable, so I guess resample does not behave the same way. Any ideas?

try `df.resample('1min',on='copy_time').apply(...)` – Joe Ferndz, Dec 23 '20 at 18:22

score 1 · Accepted Answer · answered Dec 23 '20 at 18:34

That's a good question! When you use groupby on certain columns, each chunk of data is handed to the function as a pandas DataFrame, so you can access a column the way you normally would. With resample, however, apply works column by column, so the function receives a Series.
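For comparison with the groupby behaviour described above, here is a minimal sketch (assuming pandas and the question's sample data) that groups on pd.Grouper(freq="1min"). This bins the DatetimeIndex into one-minute intervals like resample("1min"), but groupby(...).apply passes each bin as a DataFrame, so columns can be used by name. The helper rand_times_new is hypothetical, just a per-bin metric that needs two columns.

```python
import pandas as pd

# Rebuild the question's sample data: a DatetimeIndex plus three columns.
idx = pd.to_datetime([
    "2020-10-15 00:00:00", "2020-10-15 00:00:10", "2020-10-15 00:00:20",
    "2020-10-15 00:03:50", "2020-10-15 00:04:00",
])
df = pd.DataFrame(
    {"rand_val": [7, 8, 1, 6, 3], "new_val": [26, 29, 53, 69, 19], "copy_time": idx},
    index=idx,
)


def rand_times_new(chunk: pd.DataFrame) -> int:
    # Hypothetical per-bin metric that accesses two columns by name.
    # Bins with one row or fewer (including empty bins) return 0, as in the question.
    if len(chunk) > 1:
        return int((chunk["rand_val"] * chunk["new_val"]).sum())
    return 0


# pd.Grouper(freq="1min") bins the index into 1-minute intervals, and
# groupby(...).apply hands each bin to the function as a DataFrame.
out = df.groupby(pd.Grouper(freq="1min")).apply(rand_times_new)
print(out)
```

For the 00:00 bin this gives 7*26 + 8*29 + 1*53 = 467. The 00:03 and 00:04 bins hold a single row each and give 0.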

One way to compute this for rand_val alone is to select that column first, so the function receives that Series directly:

df.resample("1min")['rand_val'].apply(lambda x: sum(x) if len(x)>1 else 0)
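A self-contained sketch of that approach on the question's sample data. It is shown next to a vectorized variant, which is my own alternative and not part of the answer: take the per-bin sum and count, then set the result to 0 wherever a bin has one row or fewer.

```python
import pandas as pd

idx = pd.to_datetime([
    "2020-10-15 00:00:00", "2020-10-15 00:00:10", "2020-10-15 00:00:20",
    "2020-10-15 00:03:50", "2020-10-15 00:04:00",
])
df = pd.DataFrame({"rand_val": [7, 8, 1, 6, 3], "new_val": [26, 29, 53, 69, 19]}, index=idx)

# Select the column first: each lambda call receives one bin of rand_val as a Series.
via_apply = df.resample("1min")["rand_val"].apply(lambda x: x.sum() if len(x) > 1 else 0)

# Vectorized variant: per-bin sum, replaced by 0 wherever the bin has at most one row.
r = df.resample("1min")["rand_val"]
via_vectorized = r.sum().where(r.count() > 1, 0)

print(via_apply)
print(via_vectorized)
```

Both give 16 for the 00:00 bin (7 + 8 + 1) and 0 for every other minute. The vectorized form avoids calling a Python function once per bin.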

I'm assuming your index is already a DatetimeIndex. If it is not, convert it with pd.to_datetime first:

df.index=pd.to_datetime(df.index)
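A small sketch of what that conversion fixes. It assumes (hypothetically) that the timestamps were loaded as plain strings, for example from a CSV file. In that case resample raises a TypeError on the string index, and works once the index has been converted with pd.to_datetime.

```python
import pandas as pd

# Hypothetical input: timestamps stored as strings, so the index is not a DatetimeIndex.
df = pd.DataFrame(
    {"rand_val": [7, 8, 1, 6, 3]},
    index=["2020-10-15 00:00:00", "2020-10-15 00:00:10", "2020-10-15 00:00:20",
           "2020-10-15 00:03:50", "2020-10-15 00:04:00"],
)

raised_before = False
try:
    df.resample("1min")
except TypeError as err:
    # resample only accepts a DatetimeIndex, TimedeltaIndex or PeriodIndex.
    raised_before = True
    print("before conversion:", err)

# Convert the index as in the answer, then resample again.
df.index = pd.to_datetime(df.index)
totals = df.resample("1min")["rand_val"].sum()
print(totals)
```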

Joe Ferndz · Answer 2 · 2020-12-23T18:31:26.353

With on='copy_time', I got the following output:

a = df.resample('1min', on='copy_time').apply(lambda x: sum(x.rand_val) if len(x) > 1 else 0)
print(a)

resample needs an object with a datetime-like index, and I didn't see one in your example. Passing on='copy_time' gives it that datetime column to resample on.

The DataFrame I used:

             org_time  rand_val  new_val           copy_time
0 2020-10-15 00:00:00         7       26 2020-10-15 00:00:00
1 2020-10-15 00:00:10         8       29 2020-10-15 00:00:10
2 2020-10-15 00:00:20         1       53 2020-10-15 00:00:20
3 2020-10-15 00:03:50         6       69 2020-10-15 00:03:50
4 2020-10-15 00:04:00         3       19 2020-10-15 00:04:00

Output:

copy_time
2020-10-15 00:00:00    16
2020-10-15 00:01:00     0
2020-10-15 00:02:00     0
2020-10-15 00:03:00     0
2020-10-15 00:04:00     0
Freq: T, dtype: int64
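A minimal sketch of the on= route on a frame shaped like the one above (default integer index, timestamps in copy_time), next to the equivalent of moving copy_time into the index first. To keep the comparison simple it selects rand_val and uses the resampler's sum() aggregation instead of the lambda.

```python
import pandas as pd

times = pd.to_datetime([
    "2020-10-15 00:00:00", "2020-10-15 00:00:10", "2020-10-15 00:00:20",
    "2020-10-15 00:03:50", "2020-10-15 00:04:00",
])
# Default RangeIndex; the timestamps live only in the copy_time column.
df = pd.DataFrame(
    {"rand_val": [7, 8, 1, 6, 3], "new_val": [26, 29, 53, 69, 19], "copy_time": times}
)

# on= tells resample which datetime-like column to bin by instead of the index.
by_column = df.resample("1min", on="copy_time")["rand_val"].sum()

# Equivalent: make copy_time the index, then resample on the index (the default).
by_index = df.set_index("copy_time").resample("1min")["rand_val"].sum()

print(by_column)
print(by_index.equals(by_column))
```

Per minute this gives 16, 0, 0, 6 and 3. Empty minutes sum to 0. When the index is already a DatetimeIndex, on= is not needed.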

My index is already a datetime index, and by default resample downsamples on the index. – Borut Flis, Dec 24 '20 at 14:07