0

I have a data set looks like this:

'2014-01-07 22:20:00'        [0.0016]
'2014-01-07 22:25:00'        [0.0013]
'2014-01-07 22:30:00'        [0.0017]
'2014-01-07 22:35:00'        [0.0020]
'2014-01-07 22:40:00'        [0.0019]
'2014-01-07 22:45:00'        [0.0022]
'2014-01-07 22:50:00'        [0.0019]
'2014-01-07 22:55:00'        [0.0019]
'2014-01-07 23:00:00'        [0.0021]
'2014-01-07 23:05:00'        [0.0021]
'2014-01-07 23:10:00'        [0.0026]

First column is the time stamp recording data everything 5 min, second column is return.

For each day, I want to calculate sum of squared 5 min bar returns. Here I define a day as from 5:00 pm - 5:00 pm. ( So date 2014-01-07 is from 2014-01-06 17:00 to 2014-01-07 17:00 ). So for each day, I would sum squared returns from 5:00 pm - 5:00 pm. Output will be something like:

'2014-01-07'        [0.046]
'2014-01-08'        [0.033]

How should I do this?

user3666197
  • 1
  • 6
  • 50
  • 92
user3845799
  • 45
  • 1
  • 5
  • do you already got something or do you start from scratch? Also is that dataset read in from a file and in table format? – Finn Jun 15 '16 at 14:28
  • I read the date from excel csv into matlab cell format. first column is string, second is number – user3845799 Jun 15 '16 at 15:15
  • are you certain that you have a vlaue every 5min (no datapoint skipped) and does the data happen to start at 5pm, or could the file be cut to be starting at 5pm? – Finn Jun 15 '16 at 16:03
  • the data is recorded every 5 mins everyday. my data file starts at YYMMDD 00:05 and ends with YYMMDD 00:00. within a day, datapoint could be skipped. – user3845799 Jun 15 '16 at 16:20

2 Answers2

0

I admit that your dates are in a cell and your values in a vector.

So for example you have:

date = {'2014-01-07 16:20:00','2014-01-07 22:25:00','2014-01-08 16:20:00','2014-01-08 22:25:00'};

value = [1 2 3 4]; 

You can find the sum for each date with:

%Creation of an index that separate each "day".

    [~,~,ind]   = unique(floor(cellfun(@datenum,date)+datenum(0,0,0,7,0,0))) %datenum(0,0,0,7,0,0) correspond to the offset

for i = 1:length(unique(ind))
    sumdate(i) = sum(number(ind==i).^2)
end

And you can find the corresponding day of each sum with

datesum  = cellstr(datestr(unique(floor(cellfun(@datenum,date)+datenum(0,0,0,7,0,0)))))
obchardon
  • 10,614
  • 1
  • 17
  • 33
0

Here is alternative solution Just defining some randome data

t1 = datetime('2016-05-31 00:00:00','InputFormat','yyyy-MM-dd HH:mm:ss ');
t2 = datetime('2016-06-05 00:00:00','InputFormat','yyyy-MM-dd HH:mm:ss ');
Samples = 288;        %because your sampling time is 5 mins            
t = (t1:1/Samples:t2).';
X = rand(1,length(t));

First we find the sample which has the given criteria (Can be anything, In your case it was 00:05:00)

 n = find(t.Hour >= 5,1,'first')
    b = n;

Find the total number of days after the given sample

totaldays = length(find(diff(t.Day)))

and square and accumulate the 'return'for each day

for i = 1:totaldays - 1
    sum_acc(i) = sum(X(b:b + (Samples - 1)).^2);
    b = b + Samples;

end

This is just for visualization of the data

Dates = datetime(datestr(bsxfun(@plus ,datenum(t(n)) , 0:totaldays - 2)),'Format','yyyy-MM-dd')
table(Dates,sum_acc.','VariableNames',{'Date' 'Sum'})

   Date        Sum  
__________    ______

2016-05-31    93.898
2016-06-01    90.164
2016-06-02    90.039
2016-06-03    91.676
Novice_Developer
  • 1,432
  • 2
  • 19
  • 33
  • 1
    I modified your code a bit to consider the fact that some 5 min records could be skipped. Your idea works well. Thanks! – user3845799 Jun 15 '16 at 22:01