1

I have a matrix with one column containing data (one sample per second) and another column with the timestamp in seconds. There is some seconds where the data doesn't change from the last one, and because of this doesn't appear on the vector. I wanted to apply a function, such as a simple mean, to time intervals (30 secs, for example). But for this I have to count with the missing seconds. What's the best way to do this?

  1. Create first a matrix with the repeated elements (I would also like to have the correct timestamps for the missing seconds included - the hardest part) and only then calculate means;

or

  1. Use a cycle (the worst way, I suppose) to calculate mean while inserting missing samples;

Thanks in advance!

ps.: OR is it possible to apply functions to the data identifying and automatically introducing (by repeating) the missing data?

XanderW
  • 13
  • 3
  • Let me get it right: when you plot points with the same data and timestamp values, they overlap. How does obtaining the mean values for then? – Eitan T Sep 02 '13 at 14:40

2 Answers2

2

You can use a combination of diff and sum to include the 'missing' entries via weighted averaging:

% time step
step = 1;

% Example data (with repeated elements)
A = [...
     1 80.6
     2 79.8
     3 40.3
     4 40.3
     5 81.9
     6 83.6
     7 83.7
     8 95.4
     9 14.8
    10 14.8
    11 14.8
    12 14.8
    13 14.8
    14 44.3];

% Example data, with the repeated elements removed
B = [...
     1 80.6
     2 79.8
     3 40.3     
     5 81.9
     6 83.6
     7 83.7
     8 95.4
     9 14.8    
    14 44.3];

% The correct value
M = mean(A(:,2))

% The correct value for the pruned data
D = diff(B(:,1));
W = [D/step; 1]; 
M = sum( W .* B(:,2))/sum(W)

Results:

M1 =
    5.027857142857141e+001
M2 =
    5.027857142857143e+001

Alternatively, you can re-create the full vector A from the abbreviated B via run length encoding. You can do this efficiently like so:

W = [diff(B(:,1))/step; 1];
idx([cumsum([true; W(W>0)])]) = true;

A_new = [ (B(1,1):step:B(end,1)).'  B(cumsum(idx(1:find(idx,1,'last')-1)),2) ];
Rody Oldenhuis
  • 37,726
  • 7
  • 50
  • 96
  • 1
    Nice trick to compute a weighted average. The only small error I see is that you weigh the last item with value 1, which works in your example, but not in the general case. You should probably replace it with the minimum time step. – Bas Swinckels Sep 02 '13 at 15:29
  • @BasSwinckels: This, of course, is true. I'll edit it in, thanks! – Rody Oldenhuis Sep 02 '13 at 15:58
  • This works supposing we will have the minimum timestep happening in the data, right? if not, declaring a variable with the sampling interval should work, right? – XanderW Sep 03 '13 at 14:12
  • @XanderW: yes, of course. As long as the sample times are evenly spaced with some omissions, my method will work (perhaps with some surgery here and there) – Rody Oldenhuis Sep 03 '13 at 14:20
  • one more thing... what If I want to apply some other kind of functions, such as weighted moving average or use the "smoothts" function? – XanderW Sep 03 '13 at 14:28
  • @XanderW: hmmm...sounds like you'd better ask a new question :) – Rody Oldenhuis Sep 03 '13 at 14:31
  • Ok, I'll leave that to another question. But still related to this one, if I want to create a vector with the missing samples and respective timestamp, how could I do it? I'm asking, because I also need to calculate the mean from certain time intervals. – XanderW Sep 03 '13 at 14:47
1

You can give each sample a weight that reflects the number of samples it actually represents. Such weight can be computed with diff:

data = [1 1; 0 2; 3 5; 4 7]; % Example data. Second column is timestamp

weights = diff([data(:,2); data(end,2)+1]); % We assume the last sample
% only represents itself
result = sum(data(:,1).*weights)/sum(weights);
Luis Mendo
  • 110,752
  • 13
  • 76
  • 147