0

I have a CSV file 'XPQ12.csv' of futures tick data in the following form:

20090312    30:14.0 717.25  1   E
20090312    30:15.0 718.47  1   E
20090312    30:17.0 717.25  1   E
20090312    30:32.0 718.42  1   E
20090312    30:49.0 715.32  1   E
20090312    30:58.0 717.57  1   E
20090312    31:06.0 716.65  3   E
20090312    31:12.0 718.35  2   E
20090312    31:45.0 721.14  1   E
20090312    31:52.0 719.24  1   E
20090312    32:11.0 717.02  6   E
20090312    32:29.0 717.14  1   E
20090312    32:35.0 717.34  1   E
20090312    32:55.0 717.26  1   E

(The first column is the yearmonthdate, the second column is the minute:second:tenthofsecond, the third column is the price, the fourth column is the number of contracts traded, and the fifth indicates if the trade was electronic or in a pit). In my actual data set, I may have thousands of price quotes within any given minute.

I read the file using the following code:

fid = fopen('C:\Program Files\MATLAB\R2013a\XPQ12.csv','r'); 
[c] = fscanf(fid, '%d,%d:%d.%d,%f,%d,%c')

Which outputs:

20090312
      30
      14
       0
  717.25
       1
      69
20090312
      30
      15
       0
  718.47
       3
      69
       .
       .
       .

(the 69s are the matlab representation for E I believe)

Now I want to cut this up into one minute ohlc bars, so that for each minute, I record what the first, highest, lowest, and last price was within that minute. I'd really like to know the best way to go about this.

My original idea was to store the sequence of minutes in a vector d, and while working through the data, each time the number at the end of d changed I would record the corresponding price as an open, record the previous price as a close for the last bar, and find the largest and smallest prices within each open and close.

c(2) is the first minute, so I said:

d(1)=c(2);

and then noting that I'd always be counting by 7 before getting to the next minute, I said:

Nrows = numel(textread('XPQ12.csv','%1c%*[^\n]')); % counts rows in file
for i=1:Nrows
 if mod(i-2,7)== 0; 
     d(end+1)=c(i);
 end
end

which should fill up d with all the minutes:

30
30
30
30
30
30
31
31
31
31
32
32
32
32

in the case of the example data. I'm kind of lost what to do from here, or if what I'm doing is on the right track.

siegel
  • 819
  • 2
  • 12
  • 24
  • Sorry it wasn't clear, I'm editing it some more. I would like to know the best way to find the open high low and close of the prices within each minute. – siegel Jun 14 '13 at 15:30
  • Do you have the financial Toolbox? – Oleg Jun 14 '13 at 15:39
  • I do have it, and any others I might need. – siegel Jun 14 '13 at 15:42
  • There is a function that does that: [`highlow()`](http://www.mathworks.co.uk/help/finance/highlow.html), or [`candle()`](http://www.mathworks.co.uk/help/finance/candle.html). Does that sole your issue? – Oleg Jun 14 '13 at 15:46
  • according to the documentation `highlow()` generates a plot of an object that already has the 4 data series of open high low close prices. `highlow()` will be useful after I cut up the continuous stream of tick data into open high low and close prices for each minute, but it's this process of cutting up the tick data into ohlc minute bars that I am having an issue with. – siegel Jun 14 '13 at 15:55
  • Use `texscan()` to import the data, then you can use the `accumarray()` as I showed [here](http://stackoverflow.com/questions/17032033/matlab-averaging-time-series-data-without-loops/17032630#17032630) to find the min and max per day. – Oleg Jun 14 '13 at 16:02
  • Was going to just make a comment but then ended up coding... – Hugh Nolan Jun 14 '13 at 16:28

1 Answers1

1

From where you are:

Minutes = c(2:7:end);    
MinuteValues=unique(Minutes);
Prices = c(5:7:end);
if (length(Prices)>length(Minutes))
    Prices=Prices(1:length(Minutes));
elseif (length(Prices)<length(Minutes))
    Minutes=Minutes(1:length(Prices));

OverflowValues=1+find(Minutes(2:end)==0 & Minutes(1:end-1)==59);
for v=length(OverflowValues):-1:1
    Minutes(OverflowValues(v):end)=Minutes(OverflowValues(v):end)+60;
end

Highs=zeros(1,length(MinuteValues));
Lows=zeros(1,length(MinuteValues));
First=zeros(1,length(MinuteValues));
Last=zeros(1,length(MinuteValues));
for v=1:length(MinuteValues)
    Highs(v) = max(Prices(Minutes==MinuteValues(v)));
    Lows(v) = min(Prices(Minutes==MinuteValues(v)));
    First(v) = Prices(find(Minutes==MinuteVales(v),1,'first'));
    Last(v) = Prices(find(Minutes==MinuteVales(v),1,'last'));
end

Using textread would make this easier for you, as mentioned.

(If you are lost at this stage, I wouldn't find accumarray as mentioned in the comments is the best place to start!)

By the way, this is assuming that minutes increases above 60 and you don't have hours in there somewhere. Otherwise this won't work at all.

Hugh Nolan
  • 2,508
  • 13
  • 15
  • Thanks for the code! At 60 minutes the times look like this: 59:58.0 59:59.0 00:01.0 00:02.0 – siegel Jun 14 '13 at 18:16
  • I've added a small update to convert that to 59:58.0 59:59.0 60:01.0 60:02.0 and so on so that you have unique values for minutes. – Hugh Nolan Jun 15 '13 at 12:43