3

I have found plenty of information related to moving averages when the data is sampled to regular intervals (ie 1 min, 5 mins, etc). However, I need a solution for a time series dataset that has irregular time intervals.

The dataset contains two columns, Timestamp and Price. Timestamp goes down to the millisecond, and there is no set interval for rows. I need to take my dataframe and add three moving average columns:

1 min 5 min 10 min

I do not want to resample the data, I want the end result to be the same number of rows but with the three columns filled as applicable. (IE, NaN until the 1/5/10 min interval for each column, respectively)

I feel like I am getting close, but cannot figure out how to pass the moving average variable to this function:

import pandas as pd
import numpy as np

# Load IBM data from CSV
df = pd.read_csv(
    "C:/Documents/Python Scripts/MA.csv", names=['Timestamp',
                                                            'Price'])
# Create three moving average signals
df['Timestamp'] = pd.to_datetime(df['Timestamp'], errors='coerce')
df.set_index('Timestamp', inplace=True)


def movingaverage(values, window):
    weights = np.repeat(1.0, window)/window
    smas = np.convolve(values, weights, 'valid')
    return smas


MA_1M = movingaverage(df, 1)
MA_5M = movingaverage(df, 5)
MA_10M = movingaverage(df, 10)

print(MA_1M)

Example Data:

Timestamp               Price
2018-10-08 04:00:00.013 152.59
2018-10-08 04:00:00.223 156.34
2018-10-08 04:01:00.000 152.73
2018-10-08 04:05:00.127 156.34
2018-10-08 04:10:00.000 152.73

Expected Output:

Timestamp               Price  MA_1M  MA_5M  MA10M
2018-10-08 04:00:00.013 152.59 N/A    N/A    N/A  
2018-10-08 04:00:00.223 156.34 N/A    N/A    N/A 
2018-10-08 04:01:00.000 154.73 154.55 N/A    N/A 
2018-10-08 04:05:00.127 155.34 155.34 155.47 N/A 
2018-10-08 04:10:00.000 153.73 153.73 154.54 154.55

At each row, the MA column takes that timestamp and looks back 1, 5 or 10 minutes and calculates the average. The thing that makes this difficult is that the rows can be generated at any millisecond. In my code above I am simply trying to get a moving average to work with a time variable. I am assuming that as long as the row counts match I can use the logic to add a column to my df.

Fungui
  • 49
  • 4
  • 1
    Could you include a sample of your dataframe, as well as the expected output for that sample dataframe? – rahlf23 Oct 09 '18 at 20:35
  • @rahlf23 I have edited my post to include the requested information. – Fungui Oct 09 '18 at 21:41
  • The example doesn't seem to match the description: `04:00:00.023` is not within five minutes of `04:05:00.127` but is included in the five-minute average nonetheless. – fuglede Oct 09 '18 at 21:56
  • Oops, I put that together by hand. I changed the milliseconds to be within 5 minutes. – Fungui Oct 09 '18 at 22:00
  • What about using a sliding window over *time* instead of over *observations*, and using `1 / diff(Timestamp)` to weight the average? https://stackoverflow.com/q/14631139/2954547 – shadowtalker Oct 11 '18 at 01:13
  • Were you able to get this done without resampling to every minute? For your last row's MA_1M, since 4:09:00~4:09:59.999 the price is 155.34, wouldn't the MA_1M be closer to that value instead of the 153.73 value at 4:10:00? – user4446237 Aug 25 '20 at 20:30

1 Answers1

1

The following works, except for the NaNs - I don't know how attached you are to those:

foo  = df.apply(lambda x: df[(df['Timestamp'] <= x['Timestamp']) & (df['timestamp']> x['timestamp'] - pd.Timedelta('5 min'))]['Price'].mean(), axis=1)
Igor Rivin
  • 4,632
  • 2
  • 23
  • 35