I have found plenty of information related to moving averages when the data is sampled to regular intervals (ie 1 min, 5 mins, etc). However, I need a solution for a time series dataset that has irregular time intervals.
The dataset contains two columns, Timestamp and Price. Timestamp goes down to the millisecond, and there is no set interval for rows. I need to take my dataframe and add three moving average columns:
1 min 5 min 10 min
I do not want to resample the data, I want the end result to be the same number of rows but with the three columns filled as applicable. (IE, NaN until the 1/5/10 min interval for each column, respectively)
I feel like I am getting close, but cannot figure out how to pass the moving average variable to this function:
import pandas as pd
import numpy as np
# Load IBM data from CSV
df = pd.read_csv(
"C:/Documents/Python Scripts/MA.csv", names=['Timestamp',
'Price'])
# Create three moving average signals
df['Timestamp'] = pd.to_datetime(df['Timestamp'], errors='coerce')
df.set_index('Timestamp', inplace=True)
def movingaverage(values, window):
weights = np.repeat(1.0, window)/window
smas = np.convolve(values, weights, 'valid')
return smas
MA_1M = movingaverage(df, 1)
MA_5M = movingaverage(df, 5)
MA_10M = movingaverage(df, 10)
print(MA_1M)
Example Data:
Timestamp Price
2018-10-08 04:00:00.013 152.59
2018-10-08 04:00:00.223 156.34
2018-10-08 04:01:00.000 152.73
2018-10-08 04:05:00.127 156.34
2018-10-08 04:10:00.000 152.73
Expected Output:
Timestamp Price MA_1M MA_5M MA10M
2018-10-08 04:00:00.013 152.59 N/A N/A N/A
2018-10-08 04:00:00.223 156.34 N/A N/A N/A
2018-10-08 04:01:00.000 154.73 154.55 N/A N/A
2018-10-08 04:05:00.127 155.34 155.34 155.47 N/A
2018-10-08 04:10:00.000 153.73 153.73 154.54 154.55
At each row, the MA column takes that timestamp and looks back 1, 5 or 10 minutes and calculates the average. The thing that makes this difficult is that the rows can be generated at any millisecond. In my code above I am simply trying to get a moving average to work with a time variable. I am assuming that as long as the row counts match I can use the logic to add a column to my df.