I am trying to write a Python program that will find, for each sample in a text file I created, the maximum number of alleles at any one locus. Here is an example of the text file I am working with.
Locus1 Locus2 Locus3 Locus4
sample1 102 222 245 111 166 234 111 234
sample2 156 199 111 229 233 289 177 189 227 233 299 303
In this example, I have two samples with genetic data at four loci (my real file contains around 500 samples). The genetic data are the alleles that occur at each locus, and each allele is a three-digit number. For example, at sample1/Locus1 there are three alleles (102, 222, 245); at sample1/Locus2 there are two alleles (111 and 166); at sample1/Locus3 there is one allele (234); and at sample1/Locus4 there are two alleles (111 and 234).
In the next sample, at sample2/Locus1 there are two alleles (156, 199); at sample2/Locus2 there are four alleles (111, 229, 233, 289); at sample2/Locus3 there are two alleles (177, 189); and at sample2/Locus4 there are four alleles (227, 233, 299, 303).
For each sample, I want the program to find the locus with the most alleles and report that count. In sample1, Locus1 has the most alleles (3), while Locus2 and Locus4 have 2 each and Locus3 has only 1, so the output should be 3. In sample2, Locus2 and Locus4 are tied for the most with 4 alleles each, so the output should be 4. Ideally, my final output file would list each sample with its maximum allele count next to it. For example,
sample1 3
sample2 4
etc....
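To make the target concrete, here is a minimal sketch of just the counting step. It assumes the alleles for each locus have already been grouped into lists; the `samples` dictionary is typed in by hand from the example above, and the function name `max_alleles` is hypothetical.

```python
# Hand-typed copy of the two example samples above (illustration only).
# Each inner list holds the alleles observed at one locus.
samples = {
    "sample1": [[102, 222, 245], [111, 166], [234], [111, 234]],
    "sample2": [[156, 199], [111, 229, 233, 289], [177, 189], [227, 233, 299, 303]],
}


def max_alleles(loci):
    """Return the largest number of alleles found at any single locus."""
    # default=0 covers a sample that has no loci at all.
    return max((len(alleles) for alleles in loci), default=0)


for name, loci in samples.items():
    print(name, max_alleles(loci))
# prints:
# sample1 3
# sample2 4
```

A tie between loci (as in sample2) needs no special handling, because only the count is reported, not which locus produced it.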
Also, each locus is separated by 7 tabs, and within each locus the alleles are separated by a tab.
I apologize for any confusion. I just cannot figure out how to walk along a line of the text file, treat the numbers as sets (one set per locus, at multiples of 7 tabs), count the numbers in each set, and find which set has the most. I would appreciate any thoughts.
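For reference, here is a minimal sketch of one possible approach. It rests on an assumption about the layout that may not match the real file: after the sample name, every locus occupies a fixed block of 7 tab-separated columns, and unused columns in a block are left empty. The names `max_alleles_in_line`, `write_max_counts`, and `width` are hypothetical.

```python
def max_alleles_in_line(line, width=7):
    """Return (sample_name, highest allele count at any locus) for one line.

    Assumes each locus fills a block of `width` tab-separated columns
    after the sample name, with empty strings in the unused columns.
    """
    fields = line.rstrip("\r\n").split("\t")
    name, cells = fields[0], fields[1:]
    best = 0
    for start in range(0, len(cells), width):
        block = cells[start:start + width]
        # Count only the non-empty cells in this locus block.
        count = sum(1 for cell in block if cell.strip())
        best = max(best, count)
    return name, best


def write_max_counts(in_path, out_path, width=7):
    """Write one 'sample<TAB>count' line per sample in the input file."""
    with open(in_path) as src, open(out_path, "w") as dst:
        next(src, None)  # skip the header row of locus names
        for line in src:
            if not line.strip():
                continue  # ignore blank lines
            name, best = max_alleles_in_line(line, width)
            dst.write(f"{name}\t{best}\n")


# sample1 from the example, rebuilt under the assumed 7-column layout.
cells = (["102", "222", "245"] + [""] * 4
         + ["111", "166"] + [""] * 5
         + ["234"] + [""] * 6
         + ["111", "234"] + [""] * 5)
line = "\t".join(["sample1"] + cells)
print(max_alleles_in_line(line))  # ('sample1', 3)
```

Under the same assumption, a call such as `write_max_counts("alleles.txt", "max_alleles.txt")` (both file names are placeholders) would produce the sample/count list. If the real file instead uses a run of tabs purely as a divider between loci, the splitting logic would need to change.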