Is there a strategy for clustering attributes that are shared within a group, given a condition that is expected to induce a difference between two groups?

A concrete example: say there are 4 individuals in Group A and 4 in Group B. Group A is introduced to StackOverflow, and Group B is left with nothing but their iron will. 30,000 genes are measured for each individual. We expect Group A individuals to be relatively stress-free compared to Group B. Thus, we look for clusters of genes that are highly expressed in Group B but lowly expressed in Group A. Identifying this cluster of genes is useful, because these genes may explain the biological response to stress.

But as it turns out, the two groups are not linearly separable: the PCA shows a great deal of variance within Group A and within Group B, and some individuals in Group A cluster with Group B. There are some genes that are upregulated in Group B, but one or two individuals in Group A also show this upregulation. Is there a strategy to find the cluster of genes whose expression is shared uniformly by all individuals in Group A and differs from that of all individuals in Group B, knowing that the two groups should be different?

Has QUIT--Anony-Mousse
batlike

1 Answer

This is not cluster analysis.

You have two classes: treatment and control.

And you want to identify those features (genes) that help discriminate those two classes.

Look for supervised feature selection methods such as information gain, and study interpretable classifiers such as decision trees and random forests, which will help you identify the most discriminative genes.
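To make the information gain idea concrete, here is a minimal sketch in plain Python. It scores each gene by the best information gain achievable with a single expression threshold and ranks the genes by that score. The function names (`entropy`, `information_gain`, `rank_genes`), the toy data, and the choice of gene 7 as the planted signal are all assumptions made for illustration, not part of the question's data set.

```python
import math
import random


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def information_gain(values, labels):
    """Best information gain over all single-threshold splits of one gene."""
    base = entropy(labels)
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best = 0.0
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold fits between two equal values
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        split = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
        best = max(best, base - split)
    return best


def rank_genes(expression, labels):
    """Rank genes by gain; expression[i][g] is individual i, gene g."""
    n_genes = len(expression[0])
    gains = []
    for g in range(n_genes):
        column = [row[g] for row in expression]
        gains.append((g, information_gain(column, labels)))
    return sorted(gains, key=lambda t: t[1], reverse=True)


# Hypothetical toy data: 4 individuals per group, 200 noise genes.
random.seed(0)
labels = ["A"] * 4 + ["B"] * 4
n_genes = 200
expression = [[random.gauss(0.0, 1.0) for _ in range(n_genes)] for _ in labels]
for i, y in enumerate(labels):
    if y == "B":
        expression[i][7] += 10.0  # assumed signal: gene 7 is up in all of Group B

ranking = rank_genes(expression, labels)
# Genes whose single threshold separates the two groups perfectly (gain of 1 bit).
perfect = [g for g, gain in ranking if abs(gain - 1.0) < 1e-9]
print(perfect)
```

A gain of 1 bit means one threshold on that gene separates every Group A individual from every Group B individual, which is the "uniformly shared" pattern asked about. With only 8 individuals, a pure-noise gene can also reach a perfect split by chance, so the top of such a ranking should be treated as a list of candidates to validate rather than as a result.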

Has QUIT--Anony-Mousse