I'm trying to take DNA sequences from an input file and count the number of individual A's, T's, C's and G's, using a loop to count them. If a sequence contains a non-"ATCG" letter, I need to print "error". For example, my input file is:

Seq1
AAAGCGT
Seq2
aa tGcGt t
Seq3
af GtgA cCTg

The code I've come up with is:

# counters are initialised once, before the loop over the file
acount = 0
ccount = 0
gcount = 0
tcount = 0
for line in input:
    line = line.strip('\n')
    if line[0] == ">":
        # header line: echo it to the screen and to the output file
        print(line + "\n")
        output.write(line + "\n")
    else:
        line = line.upper()
        for base in line:
            if base == "A":
                acount = acount + 1
                #print acount
            elif base == "C":
                ccount = ccount + 1
                #print ccount
            elif base == "T":
                tcount = tcount + 1
                #print tcount
            elif base == "G":
                gcount = gcount + 1
                #print gcount
            else:
                # any character other than A, C, G or T
                break

I need the totals for each line, but my code is giving me the totals of A's, T's, etc. for the whole file. I want my output to be something like:

Seq1: Total A's: 3 Total C's: and so forth for each sequence.

Any ideas on what I can do to fix my code to accomplish that?

1 Answer

I would suggest something along these lines:

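import re

def countNucleotides(filePath):
    # One entry per sequence line, in file order.
    aCount = []
    gCount = []
    cCount = []
    tCount = []
    with open(filePath, 'r') as data:
        for line in data:
            # Drop whitespace and lower-case, so 'a' and 'A' count alike.
            line = ''.join(line.split()).lower()
            if not line or line.startswith('>'):
                continue  # skip blank lines and header lines
            # Anchor with $ so the whole line must consist of a, g, c, t.
            if not re.match(r'[agct]+$', line):
                print('error')
                continue
            aCount.append(notCount(line, 'a'))
            gCount.append(notCount(line, 'g'))
            cCount.append(notCount(line, 'c'))
            tCount.append(notCount(line, 't'))
    return aCount, gCount, cCount, tCount

def notCount(line, character):
    # Manual replacement for str.count: one pass over the line.
    appearances = 0
    for item in line:
        if item == character:
            appearances += 1
    return appearances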

You can print them however you'd like after that.
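If you want the output in the "Seq1: Total A's: ..." shape from the question, here is a rough sketch of one way to do it. The important part is that the counters are created fresh for every sequence line, so nothing carries over from the previous one. I'm assuming header lines start with ">" (as the question's code checks), that each sequence sits on a single line, and that spaces inside a sequence should be ignored. The name count_per_sequence and the None marker for a bad sequence are made up for this example.

```python
def count_per_sequence(lines):
    """Return (name, counts) pairs; counts is None if the sequence is invalid."""
    results = []
    name = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            name = line[1:].strip()  # remember the header for the next sequence
            continue
        # Fresh counters for every sequence line, so totals never accumulate.
        counts = {"A": 0, "C": 0, "G": 0, "T": 0}
        valid = True
        for base in line.upper():
            if base.isspace():
                continue  # assumption: spaces inside a sequence are ignored
            if base in counts:
                counts[base] += 1
            else:
                valid = False  # any other letter makes the sequence an error
                break
        results.append((name, counts if valid else None))
    return results


sample = [">Seq1", "AAAGCGT", ">Seq2", "aa tGcGt t", ">Seq3", "af GtgA cCTg"]
for name, counts in count_per_sequence(sample):
    if counts is None:
        print("%s: error" % name)
    else:
        print("%s: Total A's: %d Total C's: %d Total G's: %d Total T's: %d"
              % (name, counts["A"], counts["C"], counts["G"], counts["T"]))
```

The membership test `base in counts` also sidesteps the `!= 'A' or 'T' or ...` comparison, which is always true in Python because a non-empty string like 'T' is truthy on its own.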

Slater Victoroff
  • I like what you've got here, @Slater Tyranus. The only problem is (if you couldn't tell) that it's for a school assignment, and I get docked points if I use the .count function. – user2097877 Apr 01 '13 at 04:40
  • Please use the homework tag if the problem is homework. Stack Overflow isn't really for homework, but I'll update the question to not use the count function. – Slater Victoroff Apr 01 '13 at 04:45