
I have a set of data points in a vector. For example,

   [NaN, NaN, NaN, -1.5363, NaN, -1.7664, -1.7475];

These data come from code that selects 3 points within a specified range (specifically, -0.6 to 0.6). If three points from the column do not exist in this range, the range is incrementally expanded until three points are found. In the above example, the range was increased to -1.8 to 1.8. However, the data we are analyzing is erratic and has random peaks and troughs, so non-contiguous points end up being accepted into the range (element 3 is chosen to be valid, but not element 4).

What would be the best way to go about this? I already have code that incrementally increases the range to find three points. I just need to modify it so that it does not stop at any three points, but keeps increasing the range until it finds three CONTIGUOUS points. If that were done for the above example, I would just evaluate slopes to remove the 3rd element (since between 3 and 4, the slope is negative).

Thanks.

TheMcCleaver

1 Answer


Assuming your data as provided in the example is in the variable x, you can use isnan and findstr like so:

x = [NaN, NaN, NaN, -1.5363, NaN -1.7664, -1.7475, 123];
~isnan(x)

ans =

 0     0     0     1     0     1     1     1

pos = findstr(~isnan(x), [1 1 1]);

The reason for using findstr like this is that we want to find the sequence [1 1 1] within the logical array returned by ~isnan(x), and findstr returns the indices in the input array at which this sequence starts.

For your example data, this will return [], but if you change it to the data in the example I have given, it will return 6, and you can extract the contiguous region with x(pos:pos+2). You will have to be a bit careful about cases where there are more than 3 contiguous values (if there were 4, it would return [6 7]) and the cases where there is more than one contiguous region. If you don't need to do anything meaningful with these cases then just use pos(1).
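As a cross-check on the indices described above, here is a minimal sketch that finds the same window start positions with a sliding sum instead of findstr (which the MathWorks documentation lists as not recommended). The variable names valid, windowSum and firstRun are my own, and the extra value 456 is appended only to produce the "4 contiguous values" case:

```matlab
% Example data with four contiguous non-NaN values at the end
x = [NaN, NaN, NaN, -1.5363, NaN, -1.7664, -1.7475, 123, 456];

valid = double(~isnan(x));                  % 1 where x holds a number, 0 where it is NaN
windowSum = conv(valid, [1 1 1], 'valid');  % number of valid points in each window of 3
pos = find(windowSum == 3);                 % start index of every fully valid window

if ~isempty(pos)
    firstRun = x(pos(1):pos(1)+2);          % first three contiguous points
end
```

Here pos comes out as [6 7], matching the behaviour described for four contiguous values, and pos(1) picks the first window.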

If you want to extract the entirety of the first contiguous region whose length is greater than or equal to 3, you could do something like:

x = [NaN, NaN, NaN, -1.5363, NaN -1.7664, -1.7475, 123, 456, 789];

startPos = [];
stopPos = [];

pos = findstr(~isnan(x), [1 1 1]);
if ~isempty(pos)
    startPos = pos(1);
    stopPos = startPos + 2;

    % Find any cases where we have consecutive numbers in pos
    if length(pos) > 1 && any(diff(pos) == 1)
        % We have a contiguous section longer than 3 elements

        % Find the NaNs
        nans = find(isnan(x));
        % Find the first NaN after pos(1), or the index of the last element
        stopPos = nans(nans > startPos);
        if ~isempty(stopPos)
            stopPos = stopPos(1) - 1; % Don't want the NaN
        else
            stopPos = length(x);
        end
    end
end

x(startPos:stopPos)
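
An alternative sketch, not part of the approach above: the start and end of every non-NaN run can be read off diff of the validity mask, which handles runs of any length in one pass. It assumes x is a row vector, and the names valid, runStart, runEnd, runLen and region are illustrative:

```matlab
x = [NaN, NaN, NaN, -1.5363, NaN, -1.7664, -1.7475, 123, 456, 789];

valid = ~isnan(x);                 % logical mask of usable points
d = diff([0, valid, 0]);           % pad with zeros so runs touching either end are detected
runStart = find(d == 1);           % index where each run of valid points begins
runEnd = find(d == -1) - 1;        % index where each run ends
runLen = runEnd - runStart + 1;    % length of each run

k = find(runLen >= 3, 1);          % first run with at least 3 points
if isempty(k)
    region = [];                   % no contiguous run of 3 or more
else
    region = x(runStart(k):runEnd(k));
end
```

An empty region could then serve as the signal for the range-expansion loop to widen the range again, although how that fits depends on how the loop is written.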
wakjah
  • Here is an image of my loop used to increase validity boundaries for the columns: http://imgur.com/6i9Flt9 (it was long and wouldn't format well in the comments). Also, y_A is the array of points I am evaluating, and step_A is a vector showing the required number of increments for each column to give 3 points. I'm not sure I understand your implementation of findstr. Why would it return 6 in your case? And more than 3 contiguous points is fine; these are for linear regression. – TheMcCleaver Apr 02 '13 at 14:07
  • Answer edited to include more explanation and a slightly fuller solution. – wakjah Apr 02 '13 at 14:35