My work is in genetics and I'm using the Hamming distance (in Matlab) to calculate the genetic distance between genotypes of a virus.

For example: Type 1 has structure 01234 and Type 2 has structure 21304 etc. Obviously there are many genotypes present. Because the genotypes have the same length, I thought using the Hamming distance would be fine.
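To make the setup concrete, here is a minimal sketch of the Hamming distance and the resulting distance matrix. It is written in Python rather than MATLAB, purely as an illustration. The first two genotypes are the ones quoted above; the third one and the names `hamming`, `genotypes`, and `dist` are assumptions added for the example.

```python
def hamming(a, b):
    """Count the positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))


# The first two genotypes come from the question; the third is hypothetical.
genotypes = ["01234", "21304", "01230"]

# Pairwise distance matrix: symmetric, with zeros on the main diagonal.
dist = [[hamming(g, h) for h in genotypes] for g in genotypes]

for row in dist:
    print(row)
```

With these three genotypes, Type 1 and Type 2 differ at three positions, so their Hamming distance is 3.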

My question is this: how can I order the genotypes based on the Hamming distance? Put another way: how can I sort the genotypes into clusters based on the Hamming distance between them?

Thanks

1 Answer

You can use several methods to cluster such data. Based on the distance matrix, you can use UPGMA or neighbor joining.

Single linkage and complete linkage are also distance-based clustering methods.
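As a rough illustration of how these linkage methods work, here is a small agglomerative clustering sketch in plain Python (not MATLAB). It repeatedly merges the two closest clusters, where "closest" is the minimum (single linkage), maximum (complete linkage), or mean (average linkage, the rule UPGMA uses) of the pairwise Hamming distances. The function names, the `n_clusters` stopping rule, and the sample genotypes are assumptions made for the example; neighbor joining is not covered.

```python
def hamming(a, b):
    """Count the positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def agglomerate(items, n_clusters, linkage="average"):
    """Merge the two closest clusters until n_clusters remain."""
    if n_clusters < 1:
        raise ValueError("n_clusters must be at least 1")
    # How to turn all pairwise distances between two clusters into one number.
    combine = {
        "single": min,                            # nearest pair
        "complete": max,                          # farthest pair
        "average": lambda ds: sum(ds) / len(ds),  # mean of all pairs (UPGMA rule)
    }[linkage]
    clusters = [[item] for item in items]
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                ds = [hamming(a, b) for a in clusters[i] for b in clusters[j]]
                d = combine(ds)
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


# Hypothetical genotypes; the first and third are the ones from the question.
genotypes = ["01234", "01230", "21304", "21300"]
result = agglomerate(genotypes, n_clusters=2, linkage="average")
print(result)
```

This brute-force version recomputes distances on every pass, which is fine for a handful of genotypes but slow for large sets. In MATLAB, the Statistics and Machine Learning Toolbox functions `pdist`, `linkage`, and `cluster` should cover the same steps, though that route is not tested here.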

Thargor
  • Thanks for the reply. The distance matrix that I use gives the distance of each genotype, relative to each other genotype. For example, if there are 10 genotypes, my matrix would be 10x10, with zeros on the main diagonal (since the distance between a genotype and itself is 0). – Thinus Viljoen Feb 20 '12 at 14:29
  • That's what distance matrices always look like =) But I don't know how to implement these clustering methods in MATLAB; I have never worked with it. I think Google can help, or somebody has probably implemented them already. – Thargor Feb 20 '12 at 14:36