This is how I would deal with the problem. Assuming that data vectors of different size are in a cell
type called dataVectors
, and knowing the number of whiskers (nSignals
), I would try to extend the data to a second dimension derived from the original data and then perform k-means on two dimensions.
So, first I would get the maximum size of the vectors in order to convert the data to a matrix and do NaN
-padding.
maxSize = -Inf;
for k = 1:nSignals
if length(dataVectors{k}.data) > maxSize
maxSize = length(dataVectors{k}.data);
end
end
Now, I would make the data 2D by elevating it to power of two (or three, your choice). This is just a very simple transformation. But you could alternatively use kernel methods here and project each vector against the rest; however, I don't think this is necessary, and if your data is really big, it could be inefficient. For now, raising the data to the power of two should do the trick. The result is stored in a second dimension.
projDegree = 2;
projData = zeros(nSignals, maxSize, 2).*NaN;
for k = 1:nSignals
vecSize = length(dataVectors{k}.data);
projData(k, 1:vecSize, 1) = dataVectors{k}.data;
projData(k, 1:vecSize, 2) = dataVectors{k}.data.*projDegree;
end
projData = reshape(projData, [], 2);
Here, projData
will have in row 1
and column 1
, the original data of the first whisker (or signal as I call it here), and column 2
will have the new dimension. Let's suppose that you have 8
whiskers in total, then, projData
will have the data of the first whisker in row 1
, 9
, 17
, and so on. The data of the second whisker in row 2
, 10
, 18
, and so forth. That is important if you want to work your way back to the original data. Also, you can try with different projDegree
s but I doubt it will make a lot of difference.
Now we perform k-means on the 2D data; however, we provide the initial points instead of letting it determine them with k-means++. The initial points, as I propose here, are the first data point of each vector for each whisker. In this manner, k-means will depart from there and will move to clusters means accordingly. We save the results in idxK
.
idxK = kmeans(projData,nSignals, 'Start', projData(1:nSignals, :));
And there you have it. The variable idxK
will tell you which data point belongs to what cluster.
Below is a working example of my proposed solution. The first part is simply trying to produce data that looks like your data, you can skip it.
rng(9, 'twister')
nSignals = 8; % number of whiskers
n = 1000; % number of data points
allData = zeros(nSignals, n); % all the data will be stored here
% this loop will just generate some data that looks like yours
for k = 1:nSignals
x = sort(rand(1,n));
nPeriods = round(rand*9)+1; % the sin can have between 1-10 periods
nShiftAmount = round(randn*30); % shift between ~ -100 to +100
y = sin(x*2*pi*nPeriods) + (randn(1,n).*0.5);
y = y + nShiftAmount;
allData(k, :) = y;
end
nanIdx = round(rand(1, round(n*0.05)*nSignals).*((n*nSignals)-1))+1;
allData(nanIdx) = NaN; % about 5% of the data is now missing
figure(1);
for k = 1:nSignals
nanIdx = ~isnan(allData(k, :));
dataVectors{k}.data = allData(k, nanIdx);
plot(dataVectors{k}.data, 'kx'), hold on;
end

% determine the max size
maxSize = -Inf;
for k = 1:nSignals
if length(dataVectors{k}.data) > maxSize
maxSize = length(dataVectors{k}.data);
end
end
% making the data now into two dimensions and NaN pad
projDegree = 2;
projData = zeros(nSignals, maxSize, 2).*NaN;
for k = 1:nSignals
vecSize = length(dataVectors{k}.data);
projData(k, 1:vecSize, 1) = dataVectors{k}.data;
projData(k, 1:vecSize, 2) = dataVectors{k}.data.*projDegree;
end
projData = reshape(projData, [], 2);
figure(2); plot(projData(:,1), projData(:,2), 'kx');

% run k-means using the first points of all measure as the initial points
idxK = kmeans(projData,nSignals, 'Start', projData(1:nSignals, :));
figure(3);
liColors = [{'yx'},{'mx'},{'cx'},{'bx'},{'kx'},{'gx'},{'rx'},{'gd'}];
for k = 1:nSignals
plot(projData(idxK==k,1), projData(idxK==k,2), liColors{k}), hold on;
end

% plot results on original data
figure(4);
for k = 1:nSignals
plot(projData(idxK==k,1), liColors{k}), hold on;
end

Let me know if this helps.