I'm trying to classify a testset using GMM. I have a trainset (n*4 matrix) with labels {1,2,3}, n means the number of training examples, which have 4 properties. And I also have a testset (m*4) to be classified.
My goal is to have a probability matrix (m*3) for each testing example giving each label P(x_test|labels)
. Just like soft clustering.
first, I create a GMM with k=9 components over the whole trainset. I know in some papers, the author create a GMM for each label in trainset. But I want to deal with the data from all of the classes.
GMModel = fitgmdist(trainset,k_component,'RegularizationValue',0.1,'Start','plus');
My problem is, I want to confirm the relationship P(component|labels)
between components and labels. So I write a code as below, but not sure if it's right,
idx_ex_of_c1 = find(trainset_label==1);
idx_ex_of_c2 = find(trainset_label==2);
idx_ex_of_c3 = find(trainset_label==3);
[~,~,post] = cluster(GMModel,trainset);
cita_c_k = zeros(3,k_component);
for id_k = 1:k_component
cita_c_k(1,id_k) = sum(post(idx_ex_of_c1,id_k))/numel(idx_ex_of_c1);
cita_c_k(2,id_k) = sum(post(idx_ex_of_c2,id_k))/numel(idx_ex_of_c2);
cita_c_k(3,id_k) = sum(post(idx_ex_of_c3,id_k))/numel(idx_ex_of_c3);
end
cita_c_k
is a (3*9) matrix to store the relationships. idx_ex_of_c1
is the index of examples, whose label is '1' in the trainset.
For the testing process. I first apply the GMModel to testset
[P,~] = posterior(GMModel,testset); % P is a m*9 matrix
And then, sum all components,
P_testset = P*cita_c_k';
[a,b] = max(P_testset,3);
imagesc(b);
The result is ok, But not good enough. Can anyone give me some tips?
Thanks!