-1

I am learning Python and am using a function to calculate the GC percentage of a DNA sequence that contains the undefined character N or n (e.g. NAAATTTGGGCCCN), but it causes a problem. Is there a way to overcome this?

def gc(sequence) :
    "This function computes the GC percentage of a dna sequence"
    nbases=sequence.count('n')+sequence.count('N')
    gc_count=sequence.count('c')+sequence.count('C')+sequence.count('g')+sequence.count('G')      #total gc count
    gc_percent=float(gc_count)/(len(sequence-nbases))     # TOTAL GC COUNT DIVIDED BY TOTAL LEN OF THE sequence-TOTAL NO. OF N
    return 100 * gc_percent
jasonharper
  • 3
    What is the following problem? Be more clear with your problem statement – taha Jun 16 '20 at 18:16
  • 4
    What problem, exactly? If you received an error message, we need to see the full traceback. If you received an unexpected result, we need to see that result, and what you expected. – jasonharper Jun 16 '20 at 18:16
  • 3
    Oh, I see it now - `len(sequence-nbases)` is trying to subtract a number from a string, you want `len(sequence) - nbases` instead. – jasonharper Jun 16 '20 at 18:23

2 Answers

1

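As @jasonharper said in the comments, the closing parenthesis of len() is in the wrong place: sequence-nbases tries to subtract an integer from a string, which raises a TypeError. Change len(sequence-nbases) to len(sequence)-nbases.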

len(sequence)-nbases
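
For reference, here is a minimal sketch of the question's function with only that change applied. The function and variable names come from the question, and the sample input is the sequence given there. Nothing else about the original logic is altered.

```python
def gc(sequence):
    """Compute the GC percentage of a DNA sequence, ignoring N/n bases."""
    # Count undefined bases in either case
    nbases = sequence.count('n') + sequence.count('N')
    # Total G and C count in either case
    gc_count = (sequence.count('c') + sequence.count('C')
                + sequence.count('g') + sequence.count('G'))
    # Take the length first, then subtract the number of undefined bases
    gc_percent = float(gc_count) / (len(sequence) - nbases)
    return 100 * gc_percent


print(gc("NAAATTTGGGCCCN"))  # 6 G/C out of 12 defined bases -> 50.0
```

Note that this still divides by zero if the sequence is empty or consists only of N.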
Axe319
E. Goldsmi
0
def GC_content(dnaseq):
    percent = round(((dnaseq.count("C") + dnaseq.count("G")) / len(dnaseq)) * 100, 3)
    print(f'GC content: {percent} %')

Here is some code I had lying around for the same thing. I round to 3 decimal places just for consistency in my program. I would also call sequence.upper() first, so you avoid having to hard-code both lower- and upper-case letters.
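
One possible way to combine the upper() idea with the N handling from the question, as a rough sketch. The name gc_content_ignoring_n, the ndigits parameter, and the choice to return 0.0 when no defined bases remain are assumptions made for illustration, not part of either answer.

```python
def gc_content_ignoring_n(dnaseq, ndigits=3):
    """Return the GC percentage of dnaseq, excluding N from the total.

    Hypothetical helper: the name, the ndigits parameter and the 0.0
    fallback for sequences without defined bases are assumptions.
    """
    # Normalise case once instead of counting 'c'/'C' and 'g'/'G' separately
    seq = dnaseq.upper()
    # Number of bases that are not the undefined character N
    defined = len(seq) - seq.count("N")
    if defined == 0:
        # Empty or all-N input: avoid dividing by zero
        return 0.0
    gc_count = seq.count("G") + seq.count("C")
    return round(100 * gc_count / defined, ndigits)


print(gc_content_ignoring_n("NAAATTTGGGCCCN"))
```

Returning the value instead of printing it lets the caller decide how to format or reuse the result.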