Python program that counts the maximum number of alleles at each locus

Question

I am trying to write a Python program that will count the maximum number of alleles at each locus from a text file I created. Here is an example of the text file I am working with.

          Locus1           Locus2          Locus3           Locus4

sample1   102 222 245      111 166          234              111 234   

sample2   156 199          111 229 233 289  177 189          227 233 299 303

In this example, I have two samples with genetic data at four loci (my file contains around 500 samples). The genetic data are the alleles that occur at each locus. Each allele is a three-digit number. For example, at sample1/Locus1 there are three alleles (102, 222, 245); at sample1/Locus2 there are two alleles (111 and 166); at sample1/Locus3 there is one allele (234); and at sample1/Locus4 there are two alleles (111 and 234).

In the next sample, sample2/Locus1 has two alleles (156, 199); sample2/Locus2 has four alleles (111, 229, 233, 289); sample2/Locus3 has two alleles (177, 189); and sample2/Locus4 has four alleles (227, 233, 299, 303).

For each sample, I want to find the locus with the most alleles and report that maximum count. In sample1, Locus1 has the most alleles (3), while Locus2 and Locus4 have only 2 each and Locus3 has only 1, so the output should be 3. In sample2, Locus2 and Locus4 tie for the most with 4 alleles each, so the output should be 4. Ideally, my final output file would list each sample with its maximum allele count next to it. For example,

sample1 3

sample2 4

etc....

Also, the loci are separated from each other by 7 tabs, and within each locus the alleles are separated by a single tab.

I apologize for any confusion. I just cannot figure out how to count the numbers in each group along a line (the groups being separated by 7 tabs in the text file) and find which group contains the most numbers. I would appreciate any thoughts.
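One possible approach, as a minimal sketch rather than a tested solution for the real file: split each line on the run of 7 tabs to get one chunk per locus, split each chunk on whitespace to count its alleles, and take the largest count. The function name `max_alleles_per_sample` and the in-memory demo data are hypothetical. The sketch assumes the first non-blank line is the locus header, that the first field on every other line is the sample name, and that loci are separated by exactly 7 tabs as described above.

```python
def max_alleles_per_sample(lines, locus_sep="\t" * 7):
    """Yield (sample_name, max_allele_count) for each data line.

    Assumptions (not confirmed by the real file): the first non-blank
    line is a header, the first field of a data line is the sample name,
    and loci are separated by exactly 7 tabs.
    """
    header_skipped = False
    for line in lines:
        if not line.strip():
            continue  # ignore blank lines
        if not header_skipped:
            header_skipped = True  # skip the "Locus1 Locus2 ..." header
            continue
        # Separate the sample name from the rest of the line.
        parts = line.rstrip("\n").split(None, 1)
        sample = parts[0]
        rest = parts[1] if len(parts) > 1 else ""
        # One chunk per locus; alleles inside a chunk are whitespace-separated.
        counts = [len(locus.split()) for locus in rest.split(locus_sep)]
        yield sample, max(counts)


# Demo on the two samples from the question, rebuilt with tab separators.
sep = "\t" * 7
demo = [
    "\tLocus1" + sep + "Locus2" + sep + "Locus3" + sep + "Locus4\n",
    "sample1" + sep + "102\t222\t245" + sep + "111\t166" + sep + "234" + sep + "111\t234\n",
    "sample2" + sep + "156\t199" + sep + "111\t229\t233\t289" + sep + "177\t189" + sep + "227\t233\t299\t303\n",
]
for sample, count in max_alleles_per_sample(demo):
    print(sample, count)
```

For the real data, an open file object can be passed in place of `demo`, since iterating over a file yields its lines, and each `sample count` pair can be written to an output file instead of printed. If a sample has no alleles at some locus, that chunk is empty and simply counts as 0.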

What code do you have so far? Have a look at stackoverflow.com/help/mcve — miltonb, Nov 30 '15 at 03:16
Possible duplicate of [counting sets of numbers in a long list](http://stackoverflow.com/questions/34012713/counting-sets-of-numbers-in-a-long-list) — manlio, Dec 12 '15 at 07:45

0 Answers