
I'm trying to calculate the GC content (in %) of a DNA sequence for a Rosalind question. I have the following code, but it returns 0, or only the number of G's alone or C's alone (no percentage).

x = raw_input("Sequence?:").upper()
total = len(x)
c = x.count("C")
g = x.count("G")

gc_total = g+c

gc_content = gc_total/total

print gc_content

I also tried this, just to get a count of G's and C's, and not the percentage, but it just returns a count of the entire string:

x = raw_input("Sequence?:").upper()
def gc(n):
    count = 0
    for i in n:
        if i == "C" or "G":
            count = count + 1
        else:
            count = count
    return count
gc(x)

EDIT: I fixed the typo in the print statement in the first example of code. That wasn't the problem, I just pasted the wrong snippet of code (there were many attempts...)

jstewartmitchel
  • The first one might be a typo, but you said 'cg_content' instead of 'gc_content'. There is no need for the else statement in the second example. – squiguy Jun 04 '13 at 01:40
  • I fixed it in an edit. That wasn't the root of the problem, I just pasted the wrong block of code from my many, many attempts trying different things. – jstewartmitchel Jun 04 '13 at 01:55

7 Answers

5

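Your problem is that you are performing integer division, not floating-point division. In Python 2, dividing one int by another discards the fractional part, so any ratio below 1 becomes 0.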

Try

gc_content = gc_total / float(total)
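
To illustrate the fix, here is a minimal sketch wrapped in a function. The name `gc_fraction` and the empty-input guard are assumptions added for illustration, not part of the original code, and `print(...)` is written so that it runs on Python 3 as well.

```python
def gc_fraction(seq):
    """Return the fraction of G and C bases in seq (0.0 for an empty string)."""
    seq = seq.upper()
    if not seq:
        return 0.0  # assumption: treat empty input as 0 rather than dividing by zero
    gc_total = seq.count("G") + seq.count("C")
    # float() forces floating-point division in Python 2 as well as Python 3
    return gc_total / float(len(seq))


print(gc_fraction("AGCT"))
```

The result is a fraction between 0 and 1, not yet a percentage.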
Owen
1

Shouldn't:

print cg_content

read

print gc_content?

As for the other snippet of code, your loop says

if i == "C" or "G":

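Python parses this as (i == "C") or "G". A non-empty string such as "G" is always truthy, so the condition is true for every character and the function ends up counting the whole string.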

Instead, it should read

if i == "C" or i=="G":

Also, you don't need that else statement.
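
As a rough sketch of the corrected loop, a membership test expresses the same comparison more compactly. The function name `count_gc` is hypothetical, and the `print(...)` calls assume Python 3 syntax.

```python
def count_gc(seq):
    """Count the characters in seq that are exactly "C" or "G"."""
    count = 0
    for base in seq:
        # Equivalent to: base == "C" or base == "G"
        if base in ("C", "G"):
            count += 1
    return count


# A non-empty string is truthy, which is why `i == "C" or "G"` always passed.
print(bool("G"))
print(count_gc("AACGT"))
```

Comparing against a tuple avoids repeating the variable name for each option.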

Hope this helps. Let us know how it goes.

Abdul Sattar

ASattar
  • Yes, that worked! My if statement was off. As far as the typo in the print statement goes, that was a result of me scrolling through all the various iterations of the code above to paste an example to show you guys. Thank you so much! – jstewartmitchel Jun 04 '13 at 01:49
0

You also need to multiply the answer by 100 to convert it to a percentage.
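
For example, a small sketch that returns the percentage directly might look like this. The helper name `gc_percent` and the handling of empty input are assumptions for illustration only.

```python
def gc_percent(seq):
    """Return the GC content of seq as a percentage between 0 and 100."""
    seq = seq.upper()
    if not seq:
        return 0.0  # assumption: report 0 for empty input
    gc_total = seq.count("G") + seq.count("C")
    # Multiplying by 100.0 first keeps the division in floating point
    return 100.0 * gc_total / len(seq)


# 3 of the 8 bases are G or C
print(gc_percent("AGCTATAG"))
```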

NickB
0
#This works for me.

import sys

filename=sys.argv[1]

fh=open(filename,'r')

file=fh.read()
x=file
c=0
a=0
g=0
t=0

for x in file:
    if "C" in x:
        c+=1    
    elif "G" in x:
        g+=1
    elif "A" in x:
        a+=1    
    elif "T" in x:
        t+=1

print "C=%d, G=%d, A=%d, T=%d" %(c,g,a,t)

# 100.0 forces floating-point division; with the int 100, Python 2 truncates the result
gc_content=(g+c)*100.0/(a+t+g+c)

print "gc_content= %f" %(gc_content)
0
import sys
orignfile = sys.argv[1]
outfile = sys.argv[2]

sequence = ""
with open(orignfile, 'r') as f:
    for line in f:
        if line.startswith('>'):
            seq_id = line.rstrip()[1:]  # drop the leading '>' from the header
        else:
            sequence += line.rstrip()
GC_content = float((sequence.count('G') + sequence.count('C'))) / len(sequence) * 100
with open(outfile, 'a') as file_out:
    file_out.write("The GC content of '%s' is\t %.2f%%" % (seq_id, GC_content))
chtz
0

This may be late, but it is easier to parse the FASTA file with Biopython's SeqIO:

#!/usr/bin/env python

import sys
from Bio import SeqIO

filename=sys.argv[1]

fh= open(filename,'r')

parser = SeqIO.parse(fh, "fasta")

for record in parser:
    c=0
    a=0
    g=0
    t=0
    for x in str(record.seq):
        if "C" in x:
            c+=1    
        elif "G" in x:
            g+=1
        elif "A" in x:
            a+=1    
        elif "T" in x:
            t+=1
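    # Compute and print inside the loop so every record is reported, not just the last one.
    # 100.0 forces floating-point division in Python 2.
    gc_content=(g+c)*100.0/(a+t+g+c)

    print "%s\t%.2f" % (record.id, gc_content)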
F.Lira
0

This may be helpful

import random
dna=''.join(random.choice('ATGCN') for i in range(2048))
print(dna)
print("A count",round((dna.count("A")/2048)*100),"%")
print("T count",round((dna.count("T")/2048)*100),"%")
print("G count",round((dna.count("G")/2048)*100),"%")
print("C count",round((dna.count("C")/2048)*100),"%")
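# count("AT") would only count the two-letter substring "AT", so sum the single-base counts instead
print("AT count",round(((dna.count("A")+dna.count("T"))/2048)*100),"%")
print("GC count",round(((dna.count("G")+dna.count("C"))/2048)*100),"%")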
Mate Mrše