My answer is quite similar to lakesh's, but I will look at your problem in terms of interpolation.
First of all, a moving average, or time average, of a function is its integral over a time period divided by the length of that period.
In your case, the integral can be seen as a sum, since in general the function value is constant within each minute. However, your data has unequal time intervals, which can be seen as missing points of the function. Let me explain: for each minute x, you should have a price f(x). But for some times, say x=5, f(x) is undefined.
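Written out, the definition above looks as follows. The symbols T (window length in minutes), t (end of the window) and τ (integration variable) are my own notation, not from the question, and the second form assumes the price stays constant within each minute:

$$
\bar{f}(t) \;=\; \frac{1}{T}\int_{t-T}^{t} f(\tau)\,d\tau \;=\; \frac{1}{T}\sum_{x=t-T+1}^{t} f(x)
$$

The sum only works if f(x) exists for every minute x in the window, which is exactly what the missing points break.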
One way to get rid of such gaps in a function is interpolation: assigning values to the missing points according to some rule. The simplest algorithm is "keeping the previous value", which is essentially lakesh's idea.
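To see why "keeping the previous value" amounts to a duration-weighted mean, here is a minimal sketch. The sample minutes t and prices p are made up for illustration, and I assume the last price is held for one minute:

```matlab
% Hypothetical sample data (not from the question)
t = [1 2 5 6 10];          % minutes at which a price was recorded
p = [3 7 4 8 5];           % recorded prices

% Each price is held until the next record; the last one for 1 minute (assumption)
d = [diff(t) 1];

% Duration-weighted mean
avg_weighted = sum(p .* d) / sum(d);

% The same result via explicitly filling every minute with the previous value
filled = zeros(1, t(end));
for k = 1:numel(t)
    filled(t(k):t(k)+d(k)-1) = p(k);
end
avg_filled = mean(filled);

fprintf('weighted: %.4f, filled: %.4f\n', avg_weighted, avg_filled);
```

Both numbers agree, so for this particular interpolation you never need to build the filled series; the weights alone are enough.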
But the benefit of thinking this way is that it lets you make your data more accurate. This may not apply to the stock market, but it holds in general for quantities such as temperature or wind speed, which change smoothly over time (rather than staying constant for 2 minutes and then jumping within one second). You can use different interpolation techniques to polish the data. "Polishing" in this sense is acceptable because you have to rely on the concept of an "average" anyway. A good interpolation should bring the data closer to a model that has been proven to work for the real problem.
CODE - I set the max interval to 5 minutes to show the large difference between the methods. Which method (these or any other) is best to "predict the past" depends on your observation and experience.
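For a smoothly varying quantity, one option is MATLAB's built-in interp1 function with different methods. This is only a sketch with made-up temperature readings, and it must run in a workspace where no variable named interp1 exists, otherwise the variable hides the function:

```matlab
% Hypothetical temperature readings (not from the question)
t    = [1 2 5 6 10];       % minutes at which a reading exists
temp = [3 7 4 8 5];        % readings
tq   = t(1):t(end);        % every minute in the covered range

held   = interp1(t, temp, tq, 'previous'); % keep the previous value
linear = interp1(t, temp, tq, 'linear');   % straight line between readings
smooth = interp1(t, temp, tq, 'pchip');    % shape-preserving cubic

fprintf('previous: %.4f\n', mean(held));
fprintf('linear:   %.4f\n', mean(linear));
fprintf('pchip:    %.4f\n', mean(smooth));
```

All three methods pass through the original readings and differ only in what they assume between them, which is where your knowledge of the real process comes in.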
% reproduce your scenario
N = 20;
max_interval = 5;
time = randi(max_interval,N,1); % gap (in minutes) before each sample
time(1) = 1; % first minute
price = randi(10,N,1);
stamps = cumsum(time); % minute at which each price was recorded
figure(1)
plot(stamps, price, 'ko-', 'LineWidth', 2);
hold on

% "keeping-previous-value" interpolation
filled_prev = zeros(sum(time),1)-1; % -1 marks a missing minute
filled_prev(stamps) = price;
while ismember(-1, filled_prev)
    % copy the left neighbour into each gap; repeat until no gap is left
    filled_prev(filled_prev==-1) = filled_prev(find(filled_prev==-1)-1);
end
plot(filled_prev, 'bx--')

% "midpoint" interpolation: average of the previous (already filled)
% value and the next known value
filled_mid = zeros(sum(time),1)-1;
filled_mid(stamps) = price;
for ii = 1:length(filled_mid)
    if filled_mid(ii) == -1
        k = find(filled_mid(ii:end)>-1, 1, 'first'); % offset of next known value
        t1 = filled_mid(ii-1);
        t2 = filled_mid(ii+k-1);
        filled_mid(ii) = (t1+t2)/2;
    end
end
plot(filled_mid, 'rd--')

% "modified-midpoint" interpolation: weight the next known value by
% 1/k, where k is the number of steps from ii-1 to that value
filled_lin = zeros(sum(time),1)-1;
filled_lin(stamps) = price;
for ii = 1:length(filled_lin)
    if filled_lin(ii) == -1
        k = find(filled_lin(ii:end)>-1, 1, 'first');
        t1 = filled_lin(ii-1);
        t2 = filled_lin(ii+k-1);
        alpha = 1 / k;
        filled_lin(ii) = (1-alpha)*t1 + alpha*t2;
    end
end
plot(filled_lin, 'm^--')
hold off
legend('original data', 'keep previous', 'midpoint', 'modified midpoint')

fprintf(['"keeping-previous-value" (weighted sum) \n', ...
    ' result: %2.4f \n'], mean(filled_prev));
fprintf(['"midpoint" (linear interpolation) \n', ...
    ' result: %2.4f \n'], mean(filled_mid));
fprintf(['"modified-midpoint" (linear interpolation) \n', ...
    ' result: %2.4f \n'], mean(filled_lin));
Note: undefined points should be represented by NaN, but -1 seems easier to play with.
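If you do want NaN as the marker, a minimal sketch of the "keeping the previous value" fill could look like this. The data is made up, and note that missing points have to be tested with isnan, because a comparison such as x == NaN is always false:

```matlab
% Hypothetical sample data (not from the question)
t = [1 2 5 6 10];          % minutes at which a price was recorded
p = [3 7 4 8 5];           % recorded prices

series = nan(t(end), 1);   % NaN marks a missing minute
series(t) = p;
n_missing = sum(isnan(series));

% Fill each gap with the value of the previous minute
for ii = 2:numel(series)
    if isnan(series(ii))
        series(ii) = series(ii-1);
    end
end
avg_nan = mean(series);

fprintf('%d missing minutes filled, average: %.4f\n', n_missing, avg_nan);
```

The advantage over -1 is that NaN can never collide with a legitimate data value.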