2

I am performing a whisker-tracking experiments. I have high-speed videos (500fps) of rats whisking against objects. In each such video I tracked the shape of the rat's snout and whiskers. Since tracking is noisy, the number of whiskers in each frame may be different (see 2 consecutive frames in attached image, notice the yellow false-positive whisker appearing in the left frame but not the right one).

See example 1: enter image description here

As an end result of tracking, I get, for each frame, a varying number of variable-length vectors; each vector corresponding to a single whisker. At this point I would like to match the whiskers between frames. I have tried using Matlab's sample align to do this, but it works only somewhat properly. Its results are attached below (in attached image showing basepoint of all whiskers over 227 frames).

See example 2: enter image description here

I would like to run some algorithm to cluster the whiskers correctly, such that each whisker is recognized as itself and separated from other over the course of many frames. In other words, I would like each slightly sinusoidal trajectory in the second image to be recognized as one trajectory. Whatever sorting algorithm I use should take into account that whiskers may disappear and reappear between consecutive frames. Unfortunately, I'm all out of ideas...

Any help?

Once again, keep in mind that for each point in attached image 2, I have many data points, since this is only a plot of whisker basepoint, while in actuality I have data for the entire whisker length.

phts
  • 3,889
  • 1
  • 19
  • 31
  • After looking at the pattern in image 2, I would suggest you transform the data to a higher dimension, possible 2D or 3D, using, for instance, kernel methods, and perform `k-means` in that higher dimension. To me it seems like a polynomial kernel of degree 2 or 3 should do the trick. And if you know how many whiskers you are looking for, then make that your `k` in `k-means`. If you provide some sample data I can write a working solution for you. Let me know if it makes sense. – Pablo Rivas Jul 16 '15 at 16:16

1 Answers1

0

This is how I would deal with the problem. Assuming that data vectors of different size are in a cell type called dataVectors, and knowing the number of whiskers (nSignals), I would try to extend the data to a second dimension derived from the original data and then perform k-means on two dimensions.

So, first I would get the maximum size of the vectors in order to convert the data to a matrix and do NaN-padding.

maxSize = -Inf;
for k = 1:nSignals
    if length(dataVectors{k}.data) > maxSize
        maxSize = length(dataVectors{k}.data);
    end
end

Now, I would make the data 2D by elevating it to power of two (or three, your choice). This is just a very simple transformation. But you could alternatively use kernel methods here and project each vector against the rest; however, I don't think this is necessary, and if your data is really big, it could be inefficient. For now, raising the data to the power of two should do the trick. The result is stored in a second dimension.

projDegree = 2;
projData = zeros(nSignals, maxSize, 2).*NaN;
for k = 1:nSignals
    vecSize = length(dataVectors{k}.data);
    projData(k, 1:vecSize, 1) = dataVectors{k}.data;
    projData(k, 1:vecSize, 2) = dataVectors{k}.data.*projDegree;
end
projData = reshape(projData, [], 2);

Here, projData will have in row 1 and column 1, the original data of the first whisker (or signal as I call it here), and column 2 will have the new dimension. Let's suppose that you have 8 whiskers in total, then, projData will have the data of the first whisker in row 1, 9, 17, and so on. The data of the second whisker in row 2, 10, 18, and so forth. That is important if you want to work your way back to the original data. Also, you can try with different projDegrees but I doubt it will make a lot of difference.

Now we perform k-means on the 2D data; however, we provide the initial points instead of letting it determine them with k-means++. The initial points, as I propose here, are the first data point of each vector for each whisker. In this manner, k-means will depart from there and will move to clusters means accordingly. We save the results in idxK.

idxK = kmeans(projData,nSignals, 'Start', projData(1:nSignals, :));

And there you have it. The variable idxK will tell you which data point belongs to what cluster.

Below is a working example of my proposed solution. The first part is simply trying to produce data that looks like your data, you can skip it.

rng(9, 'twister')
nSignals = 8;   % number of whiskers
n = 1000;       % number of data points
allData = zeros(nSignals, n);   % all the data will be stored here

% this loop will just generate some data that looks like yours
for k = 1:nSignals
    x = sort(rand(1,n));
    nPeriods = round(rand*9)+1;     % the sin can have between 1-10 periods
    nShiftAmount = round(randn*30);     % shift between ~ -100 to +100
    y = sin(x*2*pi*nPeriods) + (randn(1,n).*0.5);
    y = y + nShiftAmount;
    allData(k, :) = y;
end
nanIdx = round(rand(1, round(n*0.05)*nSignals).*((n*nSignals)-1))+1;   
allData(nanIdx) = NaN;      % about 5% of the data is now missing

figure(1);
for k = 1:nSignals
    nanIdx = ~isnan(allData(k, :));
    dataVectors{k}.data = allData(k, nanIdx);
    plot(dataVectors{k}.data, 'kx'), hold on;
end

enter image description here

% determine the max size 
maxSize = -Inf;
for k = 1:nSignals
    if length(dataVectors{k}.data) > maxSize
        maxSize = length(dataVectors{k}.data);
    end
end

% making the data now into two dimensions and NaN pad
projDegree = 2;
projData = zeros(nSignals, maxSize, 2).*NaN;
for k = 1:nSignals
    vecSize = length(dataVectors{k}.data);
    projData(k, 1:vecSize, 1) = dataVectors{k}.data;
    projData(k, 1:vecSize, 2) = dataVectors{k}.data.*projDegree;
end
projData = reshape(projData, [], 2);
figure(2); plot(projData(:,1), projData(:,2), 'kx');

enter image description here

% run k-means using the first points of all measure as the initial points
idxK = kmeans(projData,nSignals, 'Start', projData(1:nSignals, :));
figure(3);
liColors = [{'yx'},{'mx'},{'cx'},{'bx'},{'kx'},{'gx'},{'rx'},{'gd'}];
for k = 1:nSignals
    plot(projData(idxK==k,1), projData(idxK==k,2), liColors{k}), hold on;
end

enter image description here

% plot results on original data
figure(4);
for k = 1:nSignals
    plot(projData(idxK==k,1), liColors{k}), hold on;
end

enter image description here

Let me know if this helps.

Pablo Rivas
  • 941
  • 9
  • 16